# general
g
Can I use a different set of custom pip indexes for different resolves? Pants/Pex doesn't generate a portable lockfile if we include an index in our primary lockfile, but we need it for other, more niche applications and workflows.
(This is not anything Pex does wrong, to be clear. Just torch wheel publishing breaking PEPs ❤️)
Essentially, what happens is that we want to use multiple torch accelerator variants: CUDA 11.7, and CPU-only for testing. Both use local versions, and local versions are preferred. However, while the index does have a CPU-only wheel for macOS, it lacks the `+cpu` tag, so we cannot install on Mac when the index is added. But for testing and Docker images we want CPU-only wherever possible, as it saves ~800 MB per Docker image.
So we need three resolves: "only PyPI", "CPU-only, but only Linux/Windows" and "with CUDA". And each of them needs its own lockfile and its own set of indexes.
An alternative would be to handle env markers in Pants/Pex, but from what I've read that is intentionally not possible.
b
Oh hey we do something similar
Although my way is complicated.
• I have a Docker image I install Python and friends into
• Install Pex, copy the requirements file
• Use pip to fully resolve the requirements
• Pass the whole list into a `pex3 lock create` command (this gives the CPU lockfile)
• Add the index for torch, `sed` the version to the CUDA one, relock
• 🎉
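A minimal Python sketch of the "`sed` the version to the CUDA one" step, assuming the fully resolved requirements are plain `name==version` lines. `swap_torch_local` is a hypothetical helper, and the pins are illustrative rather than taken from a real resolve.

```python
import re

# Matches "torch==<public>[+<local>][rest]"; the optional group captures an
# existing local label (e.g. "+cpu") so it can be replaced.
TORCH_PIN = re.compile(r"^(torch==)([^\s;+]+)(\+\S+)?(.*)$")


def swap_torch_local(requirements: str, local_label: str) -> str:
    """Return `requirements` with the torch pin moved to `+<local_label>`.

    Every other line passes through unchanged, so the CUDA relock starts
    from exactly the same pins as the CPU lock.
    """
    out = []
    for line in requirements.splitlines():
        match = TORCH_PIN.match(line.strip())
        if match:
            prefix, public, _old_local, rest = match.groups()
            line = f"{prefix}{public}+{local_label}{rest}"
        out.append(line)
    return "\n".join(out)


cpu_reqs = "numpy==1.22.3\ntorch==1.11.0\ntorchvision==0.12.0"
print(swap_torch_local(cpu_reqs, "cu115"))
```

Note that `torchvision==...` is deliberately left alone; if other packages also ship CUDA-specific local builds, they would need the same treatment.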
g
Hmm, but how do you handle local work? I.e., some users on Mac, some users on Linux needing cuda support?
b
I don't. CUDA support is enabled by switching the default resolve to the CUDA one. We also don't support Mac in our repo. You can coerce Pex to lock for both Linux and Mac, and use that as the default.
g
Ah, yeah. Mac support is the crunch here. Locking with anything with a local tag makes the lockfile not work on Mac. And just having those local tags in an index makes them get picked.
b
So it sounds like you have 2 problems then? Linux/Mac and CPU/GPU?
g
They interact. If I only care about Linux, I can have local tags in two different resolves and it works. But that prevents it from ever working on Mac; just having the index is enough to break that. And if I use PyPI instead, it works on Linux/Mac, but then the torch version ends up not supporting the cards we use for training locally (3090/sm86).
b
Wait, then yeah you can follow the steps above. The Linux/Mac resolve uses PyPI. The Linux+GPU resolve uses PyPI+Torch index
g
Hmm, maybe I'm misunderstanding it. Are your steps just instead of `pants generate-lockfiles`?
b
Yup!
Basically, I want two lockfiles. And I want them to be completely identical EXCEPT for the torch package. I think what you could do is:
• Just run `generate-lockfiles` -> Linux/Mac CPU lockfile
• Take the list of packages with pinned versions from that lockfile (choose your favorite way to do this)
• Pass that into a `pex3 lock create` with the `torch` version pinned to the CUDA one, and the torch index as an additional index -> Linux GPU lockfile
The important thing is your GPU lockfile has EXACTLY the same deps of everything (but torch)
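One "favorite way" to pull the pins out, sketched in Python under two assumptions: that the Pants-written header lines all begin with `//`, and that the JSON body uses Pex's `locked_resolves` / `locked_requirements` layout (each requirement carrying `project_name` and `version`, like the torch entry quoted later in this thread). Both helper names are hypothetical.

```python
from __future__ import annotations

import json


def read_lock_pins(lock_text: str) -> dict[str, str]:
    """Return {project_name: version} from a Pants-generated Pex lockfile.

    Assumes the Pants metadata header is made of lines starting with "//".
    If several locked resolves exist, later entries win.
    """
    body = "\n".join(
        line for line in lock_text.splitlines() if not line.lstrip().startswith("//")
    )
    lock = json.loads(body)
    pins: dict[str, str] = {}
    for resolve in lock["locked_resolves"]:
        for requirement in resolve["locked_requirements"]:
            pins[requirement["project_name"]] = requirement["version"]
    return pins


def pins_to_requirements(pins: dict[str, str], skip: tuple[str, ...] = ("torch",)) -> list[str]:
    """Render pins as sorted "name==version" lines, leaving out `skip` projects."""
    return [f"{name}=={version}" for name, version in sorted(pins.items()) if name not in skip]


SAMPLE_LOCK = """\
// This lockfile was autogenerated by Pants.
{
  "locked_resolves": [
    {
      "locked_requirements": [
        {"project_name": "torch", "version": "1.11.0"},
        {"project_name": "typing-extensions", "version": "4.1.1"}
      ]
    }
  ]
}
"""

print(pins_to_requirements(read_lock_pins(SAMPLE_LOCK)))
```

The resulting lines, plus a CUDA torch pin, would then be the input to the second lock.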
g
I wonder if this would work...
```shell
pants generate-lockfiles --resolve=cpu
pants generate-lockfiles --python-repos-indexes='["https://pypi.org/simple/", "https://download.pytorch.org/whl/cu117"]' --resolve=gpu
```
b
`pants generate-lockfiles` tries to do a universal lock 🤮
g
And having those be the same reqs file, but with two different `torch==...` statements
b
Also if a package gets updated on PyPI in between those calls, your versions might differ
> The important thing is your GPU lockfile has EXACTLY the same deps of everything (but torch)
☝️
Pants could do a better job of both of these things. You're welcome to open ticket(s)! 🙂
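A small Python check for the drift described above: compare the two lockfiles' pins (for example, as `{project: version}` dicts) and report every project other than torch whose version differs. `pin_drift` is a hypothetical helper and the versions are illustrative.

```python
from __future__ import annotations


def pin_drift(
    cpu: dict[str, str],
    gpu: dict[str, str],
    ignore: frozenset[str] = frozenset({"torch"}),
) -> dict[str, tuple[str | None, str | None]]:
    """Map each non-ignored project to (cpu_version, gpu_version) when they differ.

    A project missing from one lock shows up with None on that side.
    """
    names = (cpu.keys() | gpu.keys()) - ignore
    return {
        name: (cpu.get(name), gpu.get(name))
        for name in sorted(names)
        if cpu.get(name) != gpu.get(name)
    }


cpu_lock = {"torch": "1.11.0", "cloudpickle": "0.13.1", "numpy": "1.22.3"}
gpu_lock = {"torch": "1.11.0+cu115", "cloudpickle": "0.13.2", "numpy": "1.22.3"}
print(pin_drift(cpu_lock, gpu_lock))
```

Failing CI whenever this returns a non-empty dict is one way to keep the two locks from floating apart.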
g
It's only important from a "sanity" standpoint, right? As in, I could potentially be running different code for CPU and GPU, but in a good scenario it doesn't matter if cloudpickle 0.13.1 vs 0.13.2 is used. But a pain to debug if there is a minor regression.
b
Yeah. I can't in good conscience let them float for my org. Either way I think you'll get bit by the universal lockfiles for the second command. Try it though and see what happens
Also, to not use 2 requirements files, you could `cat` the one and then `sed` it
g
I use one already, the other requirements I add via build file. 😛
a
@gorgeous-winter-99296, hey, you could take a look how I handle it: https://github.com/pantsbuild/pants/issues/18293#issuecomment-1439217661
oh, I see, you participated in that thread 🙂
g
Yeah; I've considered something like that. Just a pain to manage private mirrors... one more step for onboarding, etc. Permissions. Maintenance.
This thread is related too: https://pantsbuild.slack.com/archives/C046T6T9U/p1661326739624639. Unfortunately, since the `+cpu` (or some other!) build gets picked for local tags, it still doesn't work for Mac. Otherwise, using markers would be interesting.
I.e.,
```
torch==1.11.0+cpu ; sys_platform != "darwin"
torch==1.11.0 ; sys_platform == "darwin"
```
will always use `+cpu` anyway, since they're meant to be compatible. It's just that there's no 1.11.0+cpu for Mac.
(tbh this seems like a bug to me since +cpu doesn't even exist on darwin, but my experience shows that this is what Pex/pants does. Likely because of the universal style of pex lock; cc @enough-analyst-54434 can maybe confirm)
b
IIRC the python spec says that local tags are just sugar and can be ignored. Or something to that effect
Nope, I'm mistaken. I was thinking of this:
> Local version identifiers SHOULD NOT be used when publishing upstream projects to a public index server,
g
Yep, but also:
> Local versions sort differently, this PEP requires that they sort as greater than the same version without a local
Which is why `+cpu` gets picked.
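A toy Python model of that ordering rule: for the same public version, the one carrying a local label sorts higher, so taking the highest candidate lands on `+cpu`. It only handles plain `N.N.N[+label]` strings and compares labels as strings; real tools use full PEP 440 parsing (for example `packaging.version.Version`).

```python
def toy_sort_key(version: str) -> tuple:
    """Sort key for 'N.N.N[+label]' versions (simplified, not full PEP 440)."""
    public, _, local = version.partition("+")
    release = tuple(int(part) for part in public.split("."))
    # (release, has_local, label): a local build sorts above its public version.
    return (release, local != "", local)


candidates = ["1.10.2", "1.11.0", "1.11.0+cpu"]
print(sorted(candidates, key=toy_sort_key))
print(max(candidates, key=toy_sort_key))
```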
b
e
@gorgeous-winter-99296 to your question: if there's a bug there, it's in Pex not refusing to lock. A fundamental constraint of any style of lock that Pex generates is that it must pick exactly 1 version for each project in the lock. This is a limitation of the lock mechanism, which delegates to Pip: Pip is just trying to pick a single version like it always does.
So, Pex definitely supports environment markers, it does not support version bifurcation.
FWICT torch should be using extras, say [CPU] or [GPU] to accomplish this. That would require they change how they package a bit and it's likely a sailed ship.
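A rough Python illustration of "markers yes, bifurcation no": a lock that stores exactly one version per project can carry marker-guarded requirements, but not a project that needs a different version on each side of a marker. `find_bifurcations` is a hypothetical helper; the torch entries mirror the marker example earlier in the thread.

```python
from collections import defaultdict


def find_bifurcations(requirements):
    """Return {project: sorted versions} for projects pinned to more than one version.

    `requirements` is an iterable of (project, version, marker) tuples. Markers
    alone are fine; needing two versions of one project is what a
    one-version-per-project lock cannot express.
    """
    versions = defaultdict(set)
    for project, version, _marker in requirements:
        versions[project].add(version)
    return {project: sorted(vs) for project, vs in versions.items() if len(vs) > 1}


reqs = [
    ("torch", "1.11.0+cpu", 'sys_platform != "darwin"'),
    ("torch", "1.11.0", 'sys_platform == "darwin"'),
    ("pywin32", "303", 'sys_platform == "win32"'),
]
print(find_bifurcations(reqs))
```

Here `pywin32` is fine despite its marker, while `torch` is flagged because the two branches need different versions.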
b
> FWICT torch should be using extras, say [CPU] or [GPU] to accomplish this. That would require they change how they package a bit and it's likely a sailed ship.
Yeah, that's my hot take as well. I forgot to mention it to the Meta folks I met at PyCon 😕
g
Yeah. It's not like we have much leverage as external users. It's not like tensorflow has much better packaging...
b
`mxnet` uses different package names. So that's a pro here, but then there are other cons 😛
g
@enough-analyst-54434 to expand a bit on "is there a bug here": I specify a requirement like
```
torch==1.11.0
```
and repos like:
```toml
[python-repos]
indexes = [
    "https://pypi.org/simple/",
    "https://download.pytorch.org/whl/cpu/"
]
```
and do a lock via Pants and get
```json
{
  "artifacts": [
    {
      "algorithm": "sha256",
      "hash": "544c13ef120531ec2f28a3c858c06e600d514a6dfe09b4dd6fd0262088dd2fa3",
      "url": "https://download.pytorch.org/whl/cpu/torch-1.11.0%2Bcpu-cp39-cp39-linux_x86_64.whl"
    }
  ],
  "project_name": "torch",
  "requires_dists": [
    "typing-extensions"
  ],
  "requires_python": ">=3.7.0",
  "version": "1.11.0+cpu"
}
```
Which seems... like the wrong thing to do. It's locking a different, actually incompatible version than the one specified. If it were truly universal, I'd expect it to fall back to the PyPI version, which would work for both.
OK, so. I think I have a working solution. I ended up with three locks: default, cpu, gpu. They all generate successfully from a `pants generate-lockfiles`. The indexes are always in the pants.toml. The way I did it was to add three extra requirements for the torch dependencies, one per resolve.
```python
python_requirement(
    name="torch",
    requirements=["torch==1.11.0,!=1.11.0+cpu,!=1.11.0+cu115"],
    resolve="reqs",
)

python_requirement(
    name="torch_cpu",
    requirements=["torch==1.11.0+cpu"],
    resolve="cpu",
)

python_requirement(
    name="torch_gpu",
    requirements=["torch==1.11.0+cu115"],
    resolve="gpu",
)
```
I've not dug deeper but this at least lets me get three correct resolves and at least one that works on Mac.
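Why the `!=` clauses work, sketched as a toy Python matcher for single `==`/`!=` clauses on `N.N.N[+label]` versions (a simplification of PEP 440 version matching, which also covers zero-padding, prefix matching and more): a public `==1.11.0` accepts every local build, while a `!=` naming a local version excludes exactly that build. The helper names are hypothetical.

```python
def clause_matches(clause: str, candidate: str) -> bool:
    """Toy check of one '==X' or '!=X' clause against a candidate version."""
    op, target = clause[:2], clause[2:]
    if op not in ("==", "!=") or target.startswith("="):
        raise ValueError(f"unsupported clause: {clause!r}")
    if "+" in target:
        # The clause names a local version: compare the full version.
        equal = candidate == target
    else:
        # Public clause: the candidate's local label is ignored.
        equal = candidate.partition("+")[0] == target
    return equal if op == "==" else not equal


def allowed(specifier: str, candidates: list) -> list:
    """Keep candidates that satisfy every comma-separated clause."""
    clauses = [c.strip() for c in specifier.split(",")]
    return [v for v in candidates if all(clause_matches(c, v) for c in clauses)]


candidates = ["1.11.0", "1.11.0+cpu", "1.11.0+cu115"]
print(allowed("==1.11.0", candidates))
print(allowed("==1.11.0,!=1.11.0+cpu,!=1.11.0+cu115", candidates))
```

With all three clauses, only the bare `1.11.0` survives, which is what lets the default resolve stay installable on Mac.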
b
Ah yeah, clever!
a
@gorgeous-winter-99296 what’s next with these resolves, how do you use them? For example, I have a resolve `data-science` for ML-specific code, and it includes `torch`. How do you make sure the right resolve is used in CI, and the right one for Mac/Linux users?
g
That's tomorrow's problem. With lots of pain, I imagine. I'll report back!
a
Thank you very much, if it works for you, then I think it would be great to see your report in that GitHub issue
e
@gorgeous-winter-99296 re `torch==1.11.0` selecting `1.11.0+X`, this is just Pip following the rules: https://peps.python.org/pep-0440/#version-matching. Towards the bottom you'll find: "If the specified version identifier is a public version identifier (no local version label), then the local version label of any candidate versions MUST be ignored when matching versions."
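Putting the matching rule together with the sorting rule, a toy Python walk-through of the `torch==1.11.0` report: candidates from PyPI and the CPU index are pooled, the public pin ignores local labels when matching, and the highest match (the local build) wins. This is a simplified model, not Pip's resolver; `pick` is a hypothetical helper.

```python
def pick(pin: str, *indexes: list) -> str:
    """Toy resolver step for a public '==<pin>' requirement.

    Pools candidates from all indexes, keeps those whose public part equals
    `pin` (local labels ignored), then returns the highest, where a local
    build sorts above the bare version.
    """
    pooled = {version for index in indexes for version in index}
    matching = [v for v in pooled if v.partition("+")[0] == pin]
    if not matching:
        raise LookupError(f"no candidate for =={pin}")
    # Every match shares the same public version, so the local label decides.
    return max(matching, key=lambda v: (v.partition("+")[2] != "", v))


pypi = ["1.10.2", "1.11.0"]
cpu_index = ["1.10.2+cpu", "1.11.0+cpu"]
print(pick("1.11.0", pypi))
print(pick("1.11.0", pypi, cpu_index))
```

Under this model, merely adding the CPU index is enough to switch the result from `1.11.0` to `1.11.0+cpu`.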
This is why I say the torch people seem to have misjudged when adopting re-use of local version identifiers for selecting between different underlying implementations. Not a good choice for several reasons.
g
But isn't the requirement of a universal lock that it works on all targeted platforms, i.e. Linux and Mac when using Pants? So the choice isn't valid.
From this quote in the other thread:
> The universal mode creates 1 lock that must work for the complete range of interpreters and machines implied by `--interpreter-constraint` and any `--target-system` specified (Pants always passes `--target-system mac --target-system linux`).
e
Ah, I was just focused on the version not the artifact. I'm AFK through the 8th but can try to repro your example and discover more. What Pip version are you using? Basically if you could provide the lock from one of the 3 successful examples above, that would help gather relevant details in 1 place.
g
I'll see if I can split this out in a separate repository!
e
Ok, I think the issue here is my mis-portrayal of `--target-system`. That just limits what artifacts Pex locks, i.e., it does not try to lock Windows artifacts. It does lock any Linux and Mac artifacts available, though. This does not mean it somehow tests that the lock will work on Linux and Mac. In fact, universal locks are conceptually broken this way in general: the "universal" claim is best effort. If your universe includes, say, Mac arm and there are no wheels released for Mac arm for some set of your transitive requirements, the end result is yolo. Who knows, some of those sdists may never build on Mac arm and the lock may never work there. So, if a lock grabs some pin (all this torch shenanigans with local versions aside) and that pin has just, say, a Linux x86_64 wheel available with no sdist, the universal lock will succeed.
Right now the only way to "test" a universal lock is to pass it a list of `--platform` or `--complete-platform`. In that case Pex would still perform the lock as-is, but then attempt to resolve a subset of the lock for each supplied platform and fail if it could not.
At that point though, you've enumerated all your target platforms. And, at that point, you probably shouldn't be using a universal lock, but a multi-platform lock, which Pex supports but Pants does not.
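A rough Python model of the distinction above, not Pex's actual implementation: `--target-system` acts as a filter on which artifacts may enter the lock, while only a per-platform check reveals that some target has nothing to install. The wheel-tag parsing is simplified (it reads only the last dash-separated field of a `.whl` name), and all helper names are hypothetical.

```python
def platform_tag(filename: str) -> str:
    """Platform tag of a wheel filename, or 'sdist' for anything else."""
    if not filename.endswith(".whl"):
        return "sdist"
    return filename[: -len(".whl")].split("-")[-1]


PREFIXES = {"linux": ("linux", "manylinux", "musllinux"), "mac": ("macosx",)}


def usable_on(system: str, filename: str) -> bool:
    tag = platform_tag(filename)
    if tag in ("sdist", "any"):
        # Pure-Python wheels work anywhere; an sdist merely *might* build there.
        return True
    return tag.startswith(PREFIXES[system])


def uncovered(artifacts: list, systems=("linux", "mac")) -> list:
    """Target systems for which no locked artifact is usable."""
    return [s for s in systems if not any(usable_on(s, a) for a in artifacts)]


torch_artifacts = ["torch-1.11.0%2Bcpu-cp39-cp39-linux_x86_64.whl"]
print(uncovered(torch_artifacts))
```

In this model the single Linux-only torch wheel passes the Linux/Mac artifact filter, yet the per-system check reports Mac as uncovered, consistent with the behavior reported in the thread.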
@gorgeous-winter-99296 when you get back on line, let me know what you think about all this. It is as it is, but you seem to have a good grasp of the playing field; so your feedback is useful.
g
Right. So it's a maximum set, not a minimum set. That makes sense, as otherwise a Linux-only `+cpu` wheel wouldn't be lockable at all. If I understand correctly, adding `--platform` or its `--complete-platform` counterpart would still only be validation, right? It wouldn't actually constrain the resolve process. I know from attempting to work around this issue in both PDM and Poetry that it's equally borked everywhere, and the hacks are equally ugly. If I understand you correctly, if Pants supported multi-platform locks this would be solvable (albeit maybe quite complex). Is there a philosophical reason Pants does not support this, or has no one been hurt enough by it to fix it? It doesn't seem like an insurmountable contribution to make if it'd be accepted. OTOH, I now have a documented workable approach that'll continue working for the foreseeable future. I expect, though, that when we upgrade torch to 1.13 or 2.0, when it starts pulling in CUDA, we'll have another fun set of problems to solve... Which might necessitate the multi-platform lock anyway.
e
I'm not really sure. I just did the Pex work to make locking possible. Others added Pants support and I can't represent their logic. You should definitely ask around more widely.
b
I'm almost certain it's possible to do. Feel free to open an issue
g
It's a bit orthogonal to the specific question of multi-platform locks, though.
e
FWIW @gorgeous-winter-99296, a `===` may work for your non-local-version resolve instead of having to add the `!=` clauses. But sticking to what you came up with may be your best bet. I think either way takes a book to explain.
b
I agree @gorgeous-winter-99296, related but not the same. I think this more dedicated issue could address some of the larger one
g
👍 Sounds good, will try to do a write-up.