Mon 02/24

Seminar @ Cornell Tech: CS Candidate Mark Zhao

Thinking Outside the GPU: Systems for Scalable Machine Learning Pipelines

Scalable and efficient machine learning (ML) systems have been instrumental in fueling recent advancements in ML capabilities. However, further scaling these systems requires more than simply increasing the performance and quantity of accelerators such as GPUs. This is because modern ML deployments rely on complex pipelines composed of many diverse and interconnected systems, not just accelerators.

In this talk, Zhao will emphasize the importance of building scalable systems across the entire ML pipeline. In particular, Zhao will explore how large-scale ML training pipelines, including those deployed at Meta, require distributed data storage and ingestion systems to manage massive training datasets. Optimizing these data systems is essential as data demands continue to grow. To achieve this, Zhao will demonstrate how synergistic optimizations across the training data pipeline can unlock performance and efficiency gains beyond what isolated system optimizations can achieve. While these synergistic optimizations are critical, deploying them requires navigating a large system design space. To address this challenge, Zhao will next introduce cedar, a framework that automates the optimization and orchestration of ML data processing for diverse training workloads. Finally, Zhao will discuss further opportunities in advancing the scalability, security, and capabilities of the hardware and software systems that continue to drive increasingly sophisticated ML training and inference pipelines.

Speaker Bio

Mark is a final-year Ph.D. candidate at Stanford University, where he is advised by Christos Kozyrakis. His research builds systems for end-to-end machine learning deployments by leveraging tools across the computing stack, including computer systems, computer architecture, security, databases, and machine learning. He has received an IEEE S&P Distinguished Practical Paper Award, a Top Pick in Hardware and Embedded Security Award, and an MLCommons ML and Systems Rising Star Award. His work is generously supported by a Stanford Graduate Fellowship and a Meta Ph.D. Fellowship in AI System SW/HW Co-Design. His website is at https://web.stanford.edu/~myzhao/.