Hm, thought this was obvious but while I've found ...
# general
e
Hm, thought this was obvious but while I've found a workaround I'd like clarification. We use one package,
jax
, that needs some extras in most contexts (
jax[cpu]
) and in one particular context needs different extras (
jax[mystical_cuda_requirements]
). Honestly this should be true for
torch
as well but the cuda/cpu builds there are hellacious (and this is why we can't have nice things like sub-2GB containers). Anyway. We have pinned versions of packages in a constraints file, but at least a while ago you couldn't put extras in there (haven't tried again under the new resolvers). I generate a
python_requirements
based on that constraints.txt file, but if I have
jax
in the constraints file, it creates a target without any extras which does no one any good. I can't figure out how to have extras in a
python_requirements
rule, so I have a separate requirement as
python_requirement(
    name="jax", requirements=["jax[cpu]==0.3.10"], modules=["jax", "jaxlib"]
)
...and my guess for when I need the cuda version is to create a
name="jaxcuda"
which also provides the same modules and resolve in the necessary target. Maybe. Is there a clean way to handle this sort of situation? I scanned the documentation but got mildly confused... I'd really prefer a "use the CPU unless specifically declared otherwise" rather than two separate "jax" targets which have to be resolved for every client target if we can...
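For reference, a minimal BUILD sketch of the two-target approach described here. The `gpu_job` target is hypothetical, and `mystical_cuda_requirements` is the placeholder extra from the question, not a real jax extra. With both requirements providing the same modules in one resolve, Pants should not be able to infer which one a target wants, so the GPU consumer lists its dependency explicitly:

```python
# BUILD (config sketch, not a standalone script)

# GPU variant; the extra name is a placeholder for whatever CUDA extra applies.
python_requirement(
    name="jaxcuda",
    requirements=["jax[mystical_cuda_requirements]==0.3.10"],
    modules=["jax", "jaxlib"],
)

# Hypothetical consumer that needs the GPU build: name the requirement
# explicitly instead of relying on dependency inference.
python_sources(
    name="gpu_job",
    dependencies=[":jaxcuda"],
)
```

The downside is the one raised in the question: once two targets claim `jax`, targets that only want the CPU build may also need an explicit dependency to avoid the ambiguity warning.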
b
Are you able to upgrade to Pants 2.11 and try out the PEX-based lockfiles? I can confirm they handle extras.
e
Hm... am using 2.11 now actually. Excellent and missed that in the patch notes. Will give it a go shortly.
Yep, that did the trick; many thanks. Any suggestions on the "changed extra" there when I need the cuda requirements in only some cases? Do I make a secondary target with a separate
python_requirement
and list that in the dependencies for the one target which needs cuda?
b
(The PEX lockfiles are also much faster to consume, enjoy the extra perf 😉 )
Any suggestions on the "changed extra" there when I need the cuda requirements in only some cases?
I'm told that's a key feature for
parametrize
but admittedly I haven't put all the pieces together in my head.
e
That performance part has been nicer, yes. We were actually already using the pex resolver, but with a "cleaned" constraints.txt that didn't have the extras. I'll see if I understand parametrize; the description doesn't quite match what I need (although I don't need it yet)
b
Yeah let me ping @hundreds-father-404 they're wiser than me here
e
ah well, damn; I forgot that later on in the cycle we install via
pip ... -c constraints.txt
and so pip still doesn't handle that. Eh, I'll do my workaround for now (I can't use pexes for docker builds because some of my folks use macs and I don't have the bandwidth right now to generate requirements.txt files via pants for each dockerfile to use). Thanks, though, will make notes that this is handled.
b
PEX itself allows you to transform to/from
.txt
pip-style files.
pex3 lock create
I think
e
Fair and good to know!
h
The solution for toggling between GPU vs CPU is not super fun 😞 especially if you want to change everything at a global level, rather than on a per-binary/per-project basis. Naively, this is where "multiple resolves" (aka lockfiles) come in. You will have two resolves that are identical in every way except for CPU vs GPU. Then, every target that should work with both will look like this:
python_sources(
   resolve=parametrize("python-gpu", "python-cpu"),
)
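A slightly fuller BUILD sketch of that layout, assuming `[python].enable_resolves` is on and `[python].resolves` in `pants.toml` defines two resolves named `python-cpu` and `python-gpu` (the names used in the snippet above). The target names are hypothetical, and `mystical_cuda_requirements` is the placeholder extra from earlier in the thread:

```python
# BUILD (config sketch, not a standalone script)

# One jax requirement per resolve. Each resolve has its own lockfile,
# so the two never appear together and inference stays unambiguous.
python_requirement(
    name="jax-cpu",
    requirements=["jax[cpu]==0.3.10"],
    modules=["jax", "jaxlib"],
    resolve="python-cpu",
)

python_requirement(
    name="jax-gpu",
    requirements=["jax[mystical_cuda_requirements]==0.3.10"],
    modules=["jax", "jaxlib"],
    resolve="python-gpu",
)

# Shared code that must work with either build is generated once per resolve.
python_sources(
    name="lib",
    resolve=parametrize("python-cpu", "python-gpu"),
)
```

Targets that only ever run on one build would set a single `resolve` value instead of `parametrize(...)`.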
-- There is another workaround which I honestly might suggest you do...just thought of this one. In your BUILD file, have a
python_requirement
target for the CPU version and a different one for GPU. Comment out whichever one you are not using. You can maintain two lockfiles in a kind of hacky way: update
[python].resolves
to point to
python.cpu.lock
vs
python.gpu.lock
, for example. When you want to change, you'll comment out the target you don't want, and update
[python].resolves
in
pants.toml
. There are some ways we could make that slightly less hacky, like writing a target generator that will inspect the
[python].resolves
option and decide based on that whether to give you the CPU or GPU version. I don't love how hacky this all is, but it might honestly be better than multiple resolves. Multiple resolves are super powerful, but do have additional cognitive overhead: https://www.pantsbuild.org/docs/python-third-party-dependencies#multiple-lockfiles
e
Interesting. Multiple resolves ARE super powerful. I actually don't need a "global switch" -- what I want is a "global default except for this dockerfile which uses the GPU version" idea. Since I'm not actually generating the Dockerfiles through Pants (i.e. not using the pex mechanism) I'm probably going to brute-force this by having a line in the dockerfile which just installs the GPU version, which will work if we need it. Good to know what other options are available. What I was trying to avoid is "two different jax target definitions and now every time something needs jax you need to specify which one"
h
"two different jax target definitions and now every time something needs jax you need to specify which one"
That is a major benefit of multiple resolves: no more "ambiguous dependency" warnings! Pants only infers deps on things that share the same resolve. But yeah, users would need to mark whether something is
resolve=parametrize("cpu", "gpu")
vs just one of those two, which is a pain. (My priority this week is improving the error message for that, at least)