In-person + Virtual
October 11-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2021 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: Session times are shown in Pacific Daylight Time (UTC-7). The schedule is subject to change.
Friday, October 15 • 11:00am - 11:35am
Scaling Machine Learning Workflows to Big Data with Fugue - Kevin Kho, Prefect & Han Wang, Lyft

Data scientists often use Pandas for data that fits on a single machine, and Spark or Dask for larger datasets that need distributed computing power. But what happens when the data starts small and then grows beyond what Pandas can handle? Data scientists often find themselves reimplementing the same code to move to Spark, so even code with identical business logic ends up with two separate implementations. Fugue is an open-source abstraction layer that addresses this problem. In this talk, the speakers will show how Fugue lets users port native Python code to Spark or Dask with minimal code changes. With Fugue, data science code is written in a framework-agnostic and scale-agnostic manner, so it can be ported to different execution environments. The talk demonstrates this by scaling a data computation from a single machine to a Spark cluster set up on Kubernetes.
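
To make the "framework-agnostic" idea concrete, here is a minimal sketch using only the Python standard library. It is not Fugue's API and not the speakers' code: the function names (add_fare_per_mile, run_local, run_partitioned), the trip data, and the thread-pool "engine" are all hypothetical stand-ins chosen for illustration. The point is only that the business logic is written once and the execution strategy is swapped around it.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, float]
Logic = Callable[[List[Row]], List[Row]]


def add_fare_per_mile(rows: List[Row]) -> List[Row]:
    """Business logic (hypothetical), written once against plain Python data."""
    return [{**row, "fare_per_mile": row["fare"] / row["miles"]} for row in rows]


def run_local(logic: Logic, rows: List[Row]) -> List[Row]:
    """Toy 'single machine' engine: apply the logic to all rows at once."""
    return logic(rows)


def run_partitioned(logic: Logic, rows: List[Row], partitions: int = 2) -> List[Row]:
    """Toy 'distributed' engine: split rows into contiguous chunks, apply the
    same logic to each chunk in a worker thread, and concatenate in order."""
    size = max(1, math.ceil(len(rows) / partitions))
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = pool.map(logic, chunks)  # preserves chunk order
        return [row for chunk in results for row in chunk]


# Made-up sample data for illustration only.
trips = [
    {"fare": 10.0, "miles": 2.0},
    {"fare": 9.0, "miles": 3.0},
    {"fare": 20.0, "miles": 5.0},
]

local_result = run_local(add_fare_per_mile, trips)
partitioned_result = run_partitioned(add_fare_per_mile, trips, partitions=2)

# The same logic gives the same answer under either execution strategy.
assert local_result == partitioned_result
```

A real abstraction layer also has to handle schemas, partitioning keys, and cluster configuration, which this sketch leaves out entirely.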

Speakers
Kevin Kho

Open Source Community Engineer, Prefect
Kevin Kho is an Open Source Community Engineer at Prefect, an open-source workflow orchestration management framework. Outside of work, he is a contributor to Fugue, an abstraction layer for distributed compute. He also organizes the Orlando Machine Learning and Data Science Meetup...
Han Wang

Machine Learning Engineer, Lyft
Han Wang is a staff Machine Learning Engineer at Lyft and author of the Fugue package.



Friday October 15, 2021 11:00am - 11:35am PDT
411 Theater + Online