Reproducible Machine Learning Workflows for Scientists with Pixi

Abstract

Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open source tools like Pixi make all dependencies reproducible automatically while offering a high-level interface well suited to researchers.

This tutorial will provide a practical introduction to using Pixi to easily create scientific and AI/ML environments that benefit from hardware acceleration, across multiple machines and platforms. The focus will be on applications using the PyTorch and JAX Python machine learning libraries with CUDA enabled, as well as deploying these environments to production settings in Linux container images.

Keywords: reproducible, machine learning, pixi, python, scipy

Taught at SciPy 2025 as a tutorial on Monday, July 7th, 2025.

SciPy Logistical Information

Tutorial name: Reproducible Machine Learning Workflows for Scientists with Pixi
Date: 2025-07-07
Time: 13:30 to 17:30
Location: Ballroom C (Greater Tacoma Convention Center, 3rd Floor, 1500 Commerce St.)

Rough Outline

00:00 – 00:05 (5 min):

Personal Introductions

00:05 – 00:15 (10 min):

Setup instructions, set up your machine for the tutorial.

00:15 – 00:30 (15 min):

Introduction to Philosophy, an overview of the philosophy behind this tutorial.

00:30 – 01:00 (30 min):

Pixi introduction, an overview of Pixi’s features and capabilities.

01:00 – 01:40 (40 min):

Pixi exercises, play around with Pixi and create a reproducible Python environment.

01:40 – 01:55 (15 min):

Break, grab a snack and stretch your legs.

01:55 – 02:35 (40 min):

Introduction to CUDA and CUDA conda packages, the history and overview of how to use CUDA with Pixi and conda packages.

02:35 – 02:45 (10 min):

Break, grab a snack and stretch your legs.

02:45 – 03:10 (25 min):

Intro to Machine Learning applications with Pixi, an overview of how to use Pixi for machine learning applications, including PyTorch.

03:10 – 03:30 (20 min):

General hardware acceleration using CUDA, an overview of how to use CUDA for general hardware acceleration like replacing NumPy with CuPy.

03:30 – 04:00 (30 min):

Deployment using Linux containers, an overview of how to use Pixi to create reproducible Linux CUDA container images.

This tutorial was supported by the US Research Software Sustainability Institute (URSSI) via grant G-2022-19347 from the Sloan Foundation, prefix.dev GmbH, NVIDIA, and the University of Wisconsin–Madison Data Science Institute.

References

Matthew Feickert, Ruben Arts, & John Kirkham. (2025). matthewfeickert-talks/reproducible-ml-for-scientists-with-pixi-scipy-2025: SciPy 2025 Tutorial. Zenodo. 10.5281/ZENODO.16320203
