General Hardware Acceleration with CUDA - Reproducible Machine Learning Workflows for Scientists with Pixi

So far we’ve focused on machine learning examples, but CUDA hardware-accelerated workflows extend far beyond AI/ML.

CuPy Example

Perhaps one of the most well-known CUDA-accelerated array programming libraries is CuPy, which is designed to have APIs that are highly compatible with NumPy and SciPy, so that people can think in common Scientific Python idioms while still leveraging CUDA.

Constructing the workspace

CuPy is distributed on PyPI and on conda-forge, so we can create a Pixi workspace that supports its CUDA requirements and then adds CuPy as well.

Construct the CuPy workspace

pixi.toml

[workspace]
channels = ["conda-forge"]
name = "cupy-example"
platforms = ["linux-64", "win-64"]
version = "0.1.0"

[tasks]

[dependencies]
python = ">=3.13.5,<3.14"
cupy = ">=13.4.1,<14"

[system-requirements]
cuda = "12"

Walkthrough if needed

Initialize the workspace

pixi init ~/reproducible-ml-scipy-2025/cupy-example
cd ~/reproducible-ml-scipy-2025/cupy-example

add all the platforms we’d like people to be able to develop for, even though this will be run on linux-64

pixi workspace platform add linux-64 win-64

and add the CUDA system requirements

pixi workspace system-requirements add cuda 12

Then add the CuPy dependencies

pixi add python cupy

✔ Added python >=3.13.5,<3.14
✔ Added cupy >=13.4.1,<14

and you should now have the workspace.
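Before running anything heavier, it can help to confirm that the new environment actually sees a GPU. The following is a minimal sketch, not part of the original lesson: the file name `check-cuda.py` and the helper `describe_cuda_environment` are hypothetical, while `cupy.cuda.runtime.getDeviceCount` and `cupy.cuda.runtime.runtimeGetVersion` are CuPy's wrappers around the CUDA runtime. It assumes that any failure to import CuPy or query the runtime simply means no usable GPU.

```python
# check-cuda.py (hypothetical helper, not part of the lesson)


def describe_cuda_environment():
    """Summarize what CuPy can see on this machine without raising."""
    info = {"cupy_importable": False, "device_count": 0, "runtime_version": None}
    try:
        import cupy as cp
    except Exception:
        # Assumption: CuPy is missing or its CUDA libraries could not be loaded
        return info

    info["cupy_importable"] = True
    try:
        info["device_count"] = int(cp.cuda.runtime.getDeviceCount())
        info["runtime_version"] = int(cp.cuda.runtime.runtimeGetVersion())
    except Exception:
        # Typically a CUDA runtime error when no driver or GPU is present
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_environment().items():
        print(f"{key}: {value}")
```

Inside the workspace this could be run with `pixi run python check-cuda.py`. A device count of 0 on a machine that has a GPU usually points at the driver rather than at the Pixi environment.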

This workspace gives us access to CuPy’s hardware acceleration, as shown in this example from the CuPy documentation

cupy-example.py

import numpy as np
import cupy as cp

# Array APIs are the same though operating on different hardware devices
x_cpu = np.array([1, 2, 3])
x_gpu = cp.array([1, 2, 3])

# Compute norms for both arrays
l2_cpu = np.linalg.norm(x_cpu)
l2_gpu = cp.linalg.norm(x_gpu)

print(f"NumPy array norm {l2_cpu} on device: {x_cpu.device}")
print(f"CuPy array norm {l2_gpu} on device: {x_gpu.device}")

pixi run python cupy-example.py

NumPy array norm 3.7416573867739413 on device: cpu
CuPy array norm 3.7416573867739413 on device: <CUDA Device 0>
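Because the two APIs match so closely, a function can often be written once and dispatched to whichever library owns its input. The sketch below illustrates that idea under some assumptions: the helpers `array_module`, `l2_norm`, and `to_host` are hypothetical names, and the fallback to NumPy when no GPU is usable is an addition so the script also runs on CPU-only machines. `cp.get_array_module`, `cp.asarray`, and `cp.asnumpy` are existing CuPy functions.

```python
import numpy as np

try:
    import cupy as cp

    # Assumption: treat any failure to find a device as "no GPU available"
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
except Exception:
    cp = None  # fall back to CPU-only execution


def array_module(x):
    """Return the module (numpy or cupy) that owns the array x."""
    if cp is not None:
        return cp.get_array_module(x)
    return np


def l2_norm(x):
    """Compute the L2 norm with whichever library x belongs to."""
    xp = array_module(x)
    return xp.linalg.norm(x)


def to_host(x):
    """Copy a result back to host memory as a NumPy array."""
    return cp.asnumpy(x) if cp is not None else np.asarray(x)


x_cpu = np.array([1, 2, 3])
# Host-to-device copy when a GPU is usable, otherwise keep the NumPy array
x_any = cp.asarray(x_cpu) if cp is not None else x_cpu

print(float(to_host(l2_norm(x_any))))
```

The explicit `to_host` step matters in real workflows: results stay in GPU memory until they are copied back, so it is usually worth keeping data on the device across several operations and transferring only the final result.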

CuDF Example

There are other CUDA-accelerated libraries for scientific Python as well. NVIDIA has created the RAPIDS data science collection of libraries for running end-to-end data science pipelines fully on GPUs with CUDA. One of these libraries is CuDF, a high-level Python library for manipulating DataFrames on the GPU with Pandas-like idioms.

Constructing the workspace

CuDF is not available on conda-forge, but it is available on the Python Package Index (PyPI) as cudf-cu12 and on the rapidsai conda channel on Anaconda.org as cudf. We can install it through either method, but to keep working with conda packages, we’ll create a workspace that installs it from the rapidsai conda channel.

Construct the CuDF workspace

pixi.toml

[workspace]
channels = ["rapidsai", "conda-forge"]
name = "cudf-example"
platforms = ["linux-64"]
version = "0.1.0"

[tasks]

[dependencies]
cudf = ">=25.6.0,<26"
requests = ">=2.32.4,<3"
aiohttp = ">=3.12.13,<4"

[system-requirements]
cuda = "12"

Walkthrough if needed

Initialize the workspace

pixi init ~/reproducible-ml-scipy-2025/cudf-example
cd ~/reproducible-ml-scipy-2025/cudf-example

As CuDF is available as a conda package only for linux-64, we’ll just set that as the platform

pixi workspace platform add linux-64

and add the CUDA system requirements

pixi workspace system-requirements add cuda 12

and then add the rapidsai conda channel. Note that for things to work it needs to have higher priority than conda-forge, which is why it is added with --prepend

pixi workspace channel add --prepend rapidsai

✔ Added rapidsai (https://conda.anaconda.org/rapidsai/)

Then add the CuDF dependencies for the target platform of linux-64 (requests and aiohttp are only needed for the example).

pixi add cudf requests aiohttp

✔ Added cudf >=25.6.0,<26
✔ Added requests >=2.32.4,<3
✔ Added aiohttp >=3.12.13,<4

and you should now have the workspace.

Using this code snippet from an NVIDIA user guide, we can now see that CuDF has semantics and an API very similar to those of Pandas

cudf-example.py

import pandas as pd
import cudf

# 1M Wikipedia pageview counts
data_url = "https://raw.githubusercontent.com/NVIDIA/accelerated-computing-hub/2186298825b85ef38f08e779af7992b8d762289f/gpu-python-tutorial/data/pageviews_small.csv"

# The semantics we know from Pandas
df_cpu = pd.read_csv(data_url, sep=" ")
print(f"Pandas DataFrame:\n {df_cpu.head()}")

# also exist with CuDF
df_gpu = cudf.read_csv(data_url, sep=" ")
print(f"\nCuDF DataFrame:\n {df_gpu.head()}")

# Label columns & drop unused column
df_gpu.columns = ["project", "page", "requests", "x"]
df_gpu = df_gpu.drop("x", axis=1)

# Count number of English pages
print(f"\n# of English:\n {df_gpu[df_gpu.project == 'en'].count()}")

pixi run python cudf-example.py

Pandas DataFrame:
    en.m                   Article_51  1  0
0    ja                       エレファモン  1  0
1   ang                Flocc:Scīrung  1  0
2    en  Panorama_(La_Dispute_album)  1  0
3  fa.m                  جاشوا_جکسون  1  0
4  fa.m                 خانواده_کندی  2  0
[1297577][09:12:17:950636][warning] Auto detection of compression type is supported only for file type buffers. For other buffer types, AUTO compression type assumes uncompressed input.

CuDF DataFrame:
    en.m                   Article_51  1  0
0    ja                       エレファモン  1  0
1   ang                Flocc:Scīrung  1  0
2    en  Panorama_(La_Dispute_album)  1  0
3  fa.m                  جاشوا_جکسون  1  0
4  fa.m                 خانواده_کندی  2  0
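The label, drop, and filter steps at the end of the script can also be tried without downloading the pageviews file. The sketch below is an illustration only: the rows are made-up values in the same four-column layout, and the fallback from `cudf` to `pandas` is an assumption added so that the same code runs on a machine without a GPU, which also shows how interchangeable the two APIs are for these operations.

```python
try:
    import cudf as xdf  # GPU DataFrames when RAPIDS is installed

    # Assumption: a tiny allocation fails early if no GPU is usable
    xdf.DataFrame({"probe": [0]})
except Exception:
    import pandas as xdf  # CPU fallback with the same idioms


# Made-up rows in the same four-column layout as the pageviews file
df = xdf.DataFrame(
    {
        "project": ["en", "ja", "en", "fa.m"],
        "page": ["Page_A", "Page_B", "Page_C", "Page_D"],
        "requests": [1, 1, 3, 2],
        "x": [0, 0, 0, 0],
    }
)

# Drop the unused column, then keep only the English project rows
df = df.drop("x", axis=1)
english = df[df.project == "en"]

n_english = len(english)
total_requests = int(english["requests"].sum())
print(n_english, total_requests)
```

Nothing below the import block depends on which library was loaded, which is the property that makes moving an existing Pandas workflow to the GPU comparatively cheap.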

In the interest of time today, we won’t cover CuDF fully, but NVIDIA provides user guides that explain how to use it.
