Machine Learning workflows in a Pixi workspace
Now that we know how to build a CUDA Pixi environment in our example workspace and how to access NVIDIA GPUs on Brev, let’s run an example machine learning workflow using our Pixi workspace.
Training the machine learning model
Let’s write a very standard tutorial example of training a deep neural network on the MNIST dataset with PyTorch and then run it on the GPUs on Brev.
The neural network code
We’ll download Python code that uses a convolutional neural network written in PyTorch to learn to identify the handwritten digits of the MNIST dataset and place it under a src/ directory.
This is a modified example from the PyTorch documentation (mnist/main.py) which is licensed under the BSD 3-Clause license.
curl -sLO https://raw.githubusercontent.com/matthewfeickert/nvidia-gpu-ml-library-test/36c725360b1b1db648d6955c27bd3885b29a3273/torch_MNIST.py
mkdir -p src
mv torch_MNIST.py src/
The Pixi environment
Now let’s think about what we need to use this code.
Looking at the imports of src/torch_MNIST.py we can see that torch and torchvision are the only imported libraries that aren’t part of the Python standard library, so we will need to depend on PyTorch and torchvision.
We also know that we’d like to use CUDA accelerated code, so we’ll need CUDA libraries and versions of PyTorch that support CUDA.
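Whether a given environment actually contains a CUDA-enabled PyTorch build can be checked from Python. This is a small sketch, not part of the tutorial's scripts; the function name `describe_torch_build` is hypothetical, and the import is guarded so the snippet also runs where PyTorch is not installed.

```python
def describe_torch_build() -> dict:
    """Summarize the PyTorch build visible to the current environment."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        # torch.version.cuda is None for CPU-only builds
        "cuda_build": torch.version.cuda,
        # True only if a CUDA build also finds a usable GPU and driver
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

Running this in a CPU-only environment and in a CUDA environment should show the difference in the `cuda_build` and `cuda_available` entries.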
Now that we have the environments solved, let’s do a comparison of training in the CPU environment and the GPU environment.
To validate that things are working with the CPU code, let’s do a short training run for only 2 epochs in the cpu environment.
pixi run --environment cpu python src/torch_MNIST.py --epochs 2 --save-model --data-dir data
100.0%
100.0%
100.0%
100.0%
Train Epoch: 1 [0/60000 (0%)] Loss: 2.329474
Train Epoch: 1 [640/60000 (1%)] Loss: 1.425185
Train Epoch: 1 [1280/60000 (2%)] Loss: 0.826808
Train Epoch: 1 [1920/60000 (3%)] Loss: 0.556883
Train Epoch: 1 [2560/60000 (4%)] Loss: 0.483756
...
Train Epoch: 2 [57600/60000 (96%)] Loss: 0.146226
Train Epoch: 2 [58240/60000 (97%)] Loss: 0.016065
Train Epoch: 2 [58880/60000 (98%)] Loss: 0.003342
Train Epoch: 2 [59520/60000 (99%)] Loss: 0.001542
Test set: Average loss: 0.0351, Accuracy: 9874/10000 (99%)
That took some time and we only got 2 epochs into training, but it ran!
Let’s speed things up using the gpu environment.
pixi run --environment gpu python src/torch_MNIST.py --epochs 14 --save-model --data-dir data
Train Epoch: 1 [0/60000 (0%)] Loss: 2.281690
Train Epoch: 1 [640/60000 (1%)] Loss: 1.459567
Train Epoch: 1 [1280/60000 (2%)] Loss: 0.927929
Train Epoch: 1 [1920/60000 (3%)] Loss: 0.632228
Train Epoch: 1 [2560/60000 (4%)] Loss: 0.384857
...
Train Epoch: 14 [56960/60000 (95%)] Loss: 0.009351
Train Epoch: 14 [57600/60000 (96%)] Loss: 0.001419
Train Epoch: 14 [58240/60000 (97%)] Loss: 0.024142
Train Epoch: 14 [58880/60000 (98%)] Loss: 0.004241
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.003314
Test set: Average loss: 0.0268, Accuracy: 9919/10000 (99%)
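Both summary lines show a rounded 99%, so the difference between the two runs is easier to see from the raw counts. The sketch below parses the two "Test set:" lines copied from the output above; the helper name `parse_summary` is hypothetical.

```python
import re

SUMMARY = re.compile(
    r"Test set: Average loss: (?P<loss>[\d.]+), Accuracy: (?P<correct>\d+)/(?P<total>\d+)"
)


def parse_summary(line: str) -> tuple[float, float]:
    """Return (average loss, accuracy as a fraction) from a 'Test set:' log line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a test-set summary line: {line!r}")
    return float(match["loss"]), int(match["correct"]) / int(match["total"])


# Lines copied from the two training runs shown above
cpu_line = "Test set: Average loss: 0.0351, Accuracy: 9874/10000 (99%)"
gpu_line = "Test set: Average loss: 0.0268, Accuracy: 9919/10000 (99%)"

for name, line in [("cpu, 2 epochs", cpu_line), ("gpu, 14 epochs", gpu_line)]:
    loss, accuracy = parse_summary(line)
    print(f"{name}: loss={loss:.4f} accuracy={accuracy:.2%}")
# cpu, 2 epochs: loss=0.0351 accuracy=98.74%
# gpu, 14 epochs: loss=0.0268 accuracy=99.19%
```

The longer GPU run classifies 45 more test images correctly (9919 versus 9874 out of 10000).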
Add train-cpu and train-gpu tasks to the Pixi workspace
Solution
Let’s add the train-cpu task first with pixi task add
pixi task add --feature cpu --description "Train MNIST on CPU" train-cpu "python src/torch_MNIST.py --epochs 2 --save-model --data-dir data"
✔ Added task `train-cpu`: python src/torch_MNIST.py --epochs 2 --save-model --data-dir data, description = "Train MNIST on CPU"
and then do the same with the train-gpu task
pixi task add --feature gpu --description "Train MNIST on GPU" train-gpu "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data"
✔ Added task `train-gpu`: python src/torch_MNIST.py --epochs 14 --save-model --data-dir data, description = "Train MNIST on GPU"
We can now get a nice summary of the tasks available with pixi task list
pixi task list
Tasks that can run on this machine:
-----------------------------------
train-cpu, train-gpu
Task Description
train-cpu Train MNIST on CPU
train-gpu Train MNIST on GPU
[workspace]
channels = ["conda-forge"]
name = "ml-example"
platforms = ["linux-64", "osx-arm64", "win-64"]
version = "0.1.0"
[tasks]
[dependencies]
python = ">=3.13.5,<3.14"
[feature.cpu.dependencies]
pytorch-cpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"
[feature.cpu.tasks]
train-cpu = { cmd = "python src/torch_MNIST.py --epochs 2 --save-model --data-dir data", description = "Train MNIST on CPU" }
[feature.gpu.system-requirements]
cuda = "12"
[feature.gpu.target.linux-64.dependencies]
pytorch-gpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"
[feature.gpu.tasks]
train-gpu = { cmd = "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data", description = "Train MNIST on GPU" }
[environments]
cpu = ["cpu"]
gpu = ["gpu"]
You’ll note that using the pixi task CLI API adds the tasks to the feature tasks subtable but places all of the task components (e.g. cmd, description) on a single line.
It can sometimes be visually cleaner to give each task its own subtable, where
[feature.gpu.tasks]
train-gpu = { cmd = "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data", description = "Train MNIST on GPU" }
can be rewritten as
[feature.gpu.tasks.train-gpu]
description = "Train MNIST on GPU"
cmd = "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data"
[workspace]
channels = ["conda-forge"]
name = "ml-example"
platforms = ["linux-64", "osx-arm64", "win-64"]
version = "0.1.0"
[tasks]
[dependencies]
python = ">=3.13.5,<3.14"
[feature.cpu.dependencies]
pytorch-cpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"
[feature.cpu.tasks.train-cpu]
description = "Train MNIST on CPU"
cmd = "python src/torch_MNIST.py --epochs 2 --save-model --data-dir data"
[feature.gpu.system-requirements]
cuda = "12"
[feature.gpu.target.linux-64.dependencies]
pytorch-gpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"
[feature.gpu.tasks.train-gpu]
description = "Train MNIST on GPU"
cmd = "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data"
[environments]
cpu = ["cpu"]
gpu = ["gpu"]
When running tasks that only exist in one environment, Pixi will automatically use that environment.
There is no need to specify the --environment flag when running the task.
pixi run train-gpu
✨ Pixi task (train-gpu in gpu): python src/torch_MNIST.py --epochs 14 --save-model --data-dir data: (Train MNIST on GPU)
Instead of:
pixi run --environment gpu train-gpu
Performing inference with the trained model
Now that we’ve trained our model we’d like to be able to use it to perform machine learning inference (model prediction). However, we might want to perform inference in a different software environment than the environment we used for model training.
We’ll download Python code under the src/ directory that uses the same PyTorch convolutional neural network architecture as in torch_MNIST.py to load the model and an image and predict what number the image contains.
This code is licensed under the MIT license.
curl -sLO https://raw.githubusercontent.com/matthewfeickert/nvidia-gpu-ml-library-test/36c725360b1b1db648d6955c27bd3885b29a3273/torch_MNIST_inference.py
mkdir -p src
mv torch_MNIST_inference.py src/
Add an inference environment that uses the gpu feature and an inference feature
Solution
Add additional dependencies that we’ll want for use in inference environments
pixi add --feature inference matplotlib
✔ Added matplotlib
Added these only for feature: inference
and add the gpu and inference features to an inference environment
pixi workspace environment add --feature gpu --feature inference inference
✔ Added environment inference
and then instantiate the feature with specific versions
pixi upgrade --feature inference
[workspace]
channels = ["conda-forge"]
name = "ml-example"
platforms = ["linux-64", "osx-arm64", "win-64"]
version = "0.1.0"
[tasks]
[dependencies]
python = ">=3.13.5,<3.14"
[feature.cpu.dependencies]
pytorch-cpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"
[feature.cpu.tasks.train-cpu]
description = "Train MNIST on CPU"
cmd = "python src/torch_MNIST.py --epochs 2 --save-model --data-dir data"
[feature.gpu.system-requirements]
cuda = "12"
[feature.gpu.target.linux-64.dependencies]
pytorch-gpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"
[feature.gpu.tasks.train-gpu]
description = "Train MNIST on GPU"
cmd = "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data"
[feature.inference.dependencies]
matplotlib = ">=3.10.3,<4"
[environments]
cpu = ["cpu"]
gpu = ["gpu"]
inference = ["gpu", "inference"]
Now that we’ve added an inference environment to the workspace, use it to predict the value of the default image.
pixi run --environment inference python src/torch_MNIST_inference.py --model-path ./mnist_cnn.pt --image-path ./test_image.png
Label: 4, Prediction: 4
Since we didn’t yet have an image to run on, the script loaded one from the MNIST training set, and because the labels are known there, the output includes the label. If we instead load an image from disk, where the label isn’t known, we get only the prediction.
pixi run --environment inference python src/torch_MNIST_inference.py --model-path ./mnist_cnn.pt --image-path ./test_image.png
Prediction: 4
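The final step of such a prediction is small enough to write out by hand. The sketch below assumes the classifier produces one score per digit class and reports the class with the highest log-probability, as the upstream PyTorch MNIST example does with `log_softmax` followed by `argmax`; the internals of the downloaded inference script are not shown here, and the scores are made up for illustration.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over a list of raw scores."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def predict_digit(logits: list[float]) -> int:
    """Return the index (digit 0-9) with the highest log-probability."""
    log_probs = log_softmax(logits)
    return max(range(len(log_probs)), key=log_probs.__getitem__)


# Made-up scores for illustration; a real model would compute these from an image
scores = [-1.2, 0.3, 0.1, -0.5, 4.8, 0.0, -2.1, 1.1, 0.4, -0.3]
print(f"Prediction: {predict_digit(scores)}")  # Prediction: 4
```

The log-softmax does not change which class wins; it only turns the raw scores into normalized log-probabilities.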
In the real world you’re probably not writing a Python script from scratch to perform ML inference, but using a tool like NVIDIA Triton Inference Server. However, for this tutorial a more explicit, if less useful, example is fine.