p
I'm doing some work with PyTorch. Sometimes I need to depend on the version of PyTorch that's CPU-only, and sometimes I need GPU support (depending on whether I'm building for a computer with or without a GPU). The dependencies are the same in both cases; the only thing that's different is the --extra-index-url. For example, to run on a CPU you'd do
pip3 install torch --extra-index-url https://download.pytorch.org/whl/cpu
but for a GPU you'd do
pip3 install torch --extra-index-url https://download.pytorch.org/whl/cu116
I know how to add an extra index URL to [python-repos] in pants.toml, but I have no idea how to do that conditionally. Any ideas?
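For reference, the non-conditional version looks something like this in pants.toml (a sketch; the indexes option is documented under [python-repos], but double-check against your Pants version):
# pants.toml: a sketch; assumes the [python-repos].indexes list option,
# where .add appends to the defaults (PyPI) rather than replacing them
[python-repos]
indexes.add = ["https://download.pytorch.org/whl/cpu"]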
b
So this touches on something that's hard (impossible?) to do in Pants, because it's hard/impossible to do in the world of Python + lockfiles. To answer your specific question: every option can be specified in the config, as a flag, or as an env variable. So you could do
--python-repos-foo-bar
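To illustrate those three forms with the indexes option (a sketch; the spellings follow Pants' usual config/flag/env-var conventions, so verify the exact names):
# in pants.toml:
#   [python-repos]
#   indexes.add = ["https://download.pytorch.org/whl/cu116"]
# as a command-line flag, using Pants' +[...] list-append syntax:
./pants --python-repos-indexes="+['https://download.pytorch.org/whl/cu116']" ...
# as an environment variable:
PANTS_PYTHON_REPOS_INDEXES="+['https://download.pytorch.org/whl/cu116']" ./pants ...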
Are you using a lockfile?
I'm going to be in the same boat very soon (by EoY) so by then I promise there'll be some kind of hack 🤪
p
I thought that might be the case. FWIW, it seems really odd to have the differentiator be the index URL and not the package name.
😞
The only thing I can think of is some hacky workaround with a wrapper script that generates a pants.toml, symlinks out the 3rdParty directory so the lockfiles change, etc.
b
Yeah. I like how mxnet changes the package name but keeps the same module name. That makes things so much easier.
You can make it work using multiple lockfiles, each associated with a Pants resolve, and then declaring all your code compatible with both resolves. But that's a lot of plumbing just to switch to GPU.
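A sketch of what that two-resolve setup might look like in pants.toml (the resolve names and lockfile paths here are made up):
# pants.toml: two resolves, one lockfile each; names/paths are hypothetical
[python]
enable_resolves = true
default_resolve = "cpu"

[python.resolves]
cpu = "3rdparty/python/cpu.lock"
gpu = "3rdparty/python/gpu.lock"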
p
Hmm... not clear on how the multiple-resolves bit would work. Let's say I have a library that I wrote that uses PyTorch. That library has to depend on a single resolve, right? If I could change the resolve dynamically via a CLI argument or something, that seems like it'd work.
But I haven't used the multiple-resolves feature, so I'm not too familiar with it...
b
You can use
resolve=parametrize("resolveA", "resolveB")
which then gives your library 2 targets, one for each resolve. Spread that over your whole codebase, and lastly declare your final application with only the resolve you want, and presto!
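In BUILD-file terms, something like this (a sketch; the target names are invented, and it assumes the cpu/gpu resolves from the pants.toml sketch above):
# BUILD: the library builds under both resolves; the app picks exactly one
python_sources(
    name="lib",
    resolve=parametrize("cpu", "gpu"),
)

pex_binary(
    name="app-gpu",
    entry_point="main.py",
    resolve="gpu",
)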
Alternatively, you might be able to avoid all that and instead fiddle with the "default" resolve.
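That would mean leaving targets on the default resolve and overriding it per invocation, roughly (flag name assumed from the [python].default_resolve option):
./pants --python-default-resolve=gpu package path/to/app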
p
Ugh, yeah, that seems like a lot of plumbing... Thanks for the suggestion, though! I might end up having to do that. Or maybe I can just install the CUDA libs on devices that don't have a GPU, always use the GPU version of PyTorch, and simply not use the GPU when one doesn't exist.
b
If you can stomach the giant downloads, that's another option 🙂
p
@bitter-ability-32190 It actually looks like I can just always install the CUDA version; if you never try to use the GPU, you don't need CUDA installed and everything should just work: https://discuss.pytorch.org/t/is-it-required-to-set-up-cuda-on-pc-before-installing-cuda-enabled-pytorch/60181
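The standard PyTorch pattern for that is to pick the device at runtime:
# falls back to CPU when no GPU (or no CUDA runtime) is available
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.zeros(3, 3).to(device)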
b
Yeah, but the GPU-enabled PyTorch wheel is still huge.
p
oh, yeah. That doesn't worry me too much. Not great, but not a huge problem.
Thanks!
b
Might have to go that route though 🫤
Hello from the future. I actually found a great way to solve this! I'll write a blog post in the next month or so
LMK if you want a TLDR
p
Yes please @bitter-ability-32190!!
b
I'll DM tomorrow
p
Thanks!!
b
Hi @bitter-ability-32190. Could you post the TLDR (or a link to the blog post) here, for people who find this thread because they have the same problem, please? I can't find a post by you on Pants' blog matching the date and topic. Or were you talking about environments? Thanks in advance.
b
I never did get to that blog post. I'm actually hoping to turn the whole thing upside down and make this really easy in Pants itself.
So my TL;DR:
• Instead of using Pants to generate the lockfile, I use pex itself. I generate the lockfile using vanilla torch.
• Then I dump that lockfile's pinned requirements and swap the torch version with the CUDA one.
• Relock (because everything is pinned, I know every package will have the same version).
• Now in Pants land, I have a resolve for each lockfile; vanilla torch is the default.
• Whenever you want CUDA torch, I use a cmdline flag to set the default resolve to the CUDA one.
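A rough sketch of that workflow using the pex CLI (the subcommands are pex's lock tooling, but the exact flags, file names, and torch pins here are assumptions; check the pex docs):
# 1. lock against vanilla (CPU) torch
pex3 lock create --style universal -r requirements.txt --output=cpu.lock
# 2. dump the fully pinned requirements and swap torch for the CUDA build
pex3 lock export cpu.lock > pinned.txt
sed -i 's/^torch==1.13.0$/torch==1.13.0+cu116/' pinned.txt
# 3. relock from the pinned list, pointing at the CUDA wheel index
pex3 lock create --style universal \
  --index https://download.pytorch.org/whl/cu116 \
  -r pinned.txt --output=gpu.lock
Because the relock starts from fully pinned requirements, only the torch wheel differs between the two locks, which is what makes the two resolves safely interchangeable.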
(I submitted a talk to SciPy about Pants+ML with the thought that if I get picked I'll be forced to make this nicer)