# general
a
I'm sorry to ask, but what's the current state of the art for managing torch across architectures? Is there a decent example somewhere? I have some engineers running Linux with or without a GPU, and some engineers running recent Macs.
g
We've been using the same approach for ~2 years now: three resolves - one for CPU, one for GPU with pinned CUDA, and one generic. We've tried alternatives whenever we got tired of the parametrization and lockfile management, but this is the most stable setup for us.
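A minimal sketch of what that three-resolve layout can look like in pants.toml - the lockfile paths and the exact PyTorch index URLs here are illustrative, not quoted from this setup:
```toml
# pants.toml (sketch): three resolves, with `base` as the default.
[python]
enable_resolves = true
default_resolve = "base"

[python.resolves]
base = "3rdparty/python/base.lock"
cpu = "3rdparty/python/cpu.lock"
gpu = "3rdparty/python/gpu.lock"

[python-repos]
# PyTorch's extra wheel indexes (illustrative). Which torch build each resolve
# actually pins (a +cpu wheel vs. a CUDA wheel) is driven by the requirements
# targets assigned to that resolve, not shown here.
indexes.add = [
  "https://download.pytorch.org/whl/cpu",
  "https://download.pytorch.org/whl/cu121",
]
```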
a
Makes sense - how does that work in practice? When you run, e.g., `pants test` or `pants package`, how are you selecting the correct resolve?
g
Most commands just default to `base` (= `[python].default-resolve`), so if you don't specify a parametrization, that's what you get OOTB. It's also the only one that works on Mac, since the `+cpu` wheels don't exist there and GPU doesn't work (bar ROCm, but we don't support that). Most of our serious work happens in our cloud system either way, so the `@parametrization=gpu` variant is mostly used in specific dev flows by our researchers and when building our containers. Our pre-commit etc. also forces `@cpu`, primarily because it's much quicker when you can bypass all the CUDA library packages, torch kernels, etc. Same with CI, those machines don't have GPUs. We also have flag aliases set up, like `--with-cpu = "--python-default-resolve=cpu"`.
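That alias lives in pants.toml's `[cli.alias]` section; roughly like this (only `--with-cpu` is quoted above, the `--with-gpu` counterpart is a hypothetical addition):
```toml
# pants.toml (sketch): flag aliases, so `pants --with-cpu test ::` expands to
# the full option below.
[cli.alias]
--with-cpu = "--python-default-resolve=cpu"
--with-gpu = "--python-default-resolve=gpu"
```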
The major caveat is that it's sometimes funky to generate lockfiles on a Mac... it's gotten better, I think, but we've had issues with torch only declaring its platform dependencies in its platform-specific wheels -- so a Mac user doesn't see a conditional declaration for CUDA at all, and their lockfiles end up broken... not sure if that's fixed; it's been a while since I had to fix it in our repo.
a
Hmmmm. I've just gone through a process of separating resolves, so I now have one for inference, one for data pipelines, one for edge, etc. Seems like what I'd need is multiple inference resolves and then some DX-friendly way to select one.
g
Oof; yeah. I've generally found things work better the fewer resolves we have, and this is the minimum I can get away with for any actual code we develop. We also make this work with aggressive use of parametrization + defaults... that has some spectacularly sharp corners when you use named parametrizations. But this is pretty much the root declaration that makes it all work - it applies to our whole repository, except where we override it.
```python
__defaults__(
    {
        pex_binary: dict(execution_mode="venv", venv_site_packages_copies=True),
        (python_source, python_sources): dict(
            **parametrize("cpu", resolve="cpu", skip_pyright=True),
            **parametrize("gpu", resolve="gpu", skip_pyright=True),
            **parametrize("base", resolve="base"),
        ),
        python_distribution: dict(skip_twine=True),
    }
)
```
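And a hypothetical leaf BUILD file showing the kind of override mentioned above - setting `resolve` explicitly on a target replaces the parametrized default (target names and entry point here are made up):
```python
# Hypothetical BUILD file for a GPU-only training directory: pinning `resolve`
# explicitly gives these targets a single resolve instead of the repo-wide
# cpu/gpu/base parametrization from __defaults__.
python_sources(
    name="lib",
    resolve="gpu",
    skip_pyright=True,
)

pex_binary(
    name="train",
    entry_point="train.py",
    resolve="gpu",
)
```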
a
Yeah, I got to a point where I could no longer generate lockfiles or update things to pick up security patches, and where some weird ML dep was holding back our ability to adopt different tooling elsewhere. Hence the other thread :D
g
Yeah -- I work on a purely RL team and we build almost entirely on torch, so we can avoid that. I know some of the data and inference tools are gnarly: complex to the point of having circular dependencies on each other.