#general

straight-action-80318

07/05/2022, 3:05 PM
What is best practice for adding PyTorch to my requirements in Pants? Right now, in order to get the latest CUDA build, I have to include

```toml
[python-repos]
indexes.add = ["https://download.pytorch.org/whl/cu116"]
```

in my `pants.toml` file, but this makes my `generate-lockfiles` time much, much longer (probably 2-3x). I can't figure out if there's a way to include this repo for only the single `pytorch` requirement.

hundreds-father-404

07/05/2022, 3:08 PM
> I can't figure out if there's a way to only include this repo for the single pytorch requirement.

There is not.

> but this makes my generate-lockfiles time much much longer (probably 2-3x)

Hm, is it using a prebuilt wheel, or are you now building the wheel from an sdist? If the latter, that would explain the slowdown.

bitter-ability-32190

07/05/2022, 3:10 PM
The CUDA-based `pytorch` package is HUGE, FWIW, so I'm not terribly surprised. The crux of the issue lies in Python packaging: in Python there is no way to get the metadata for a package without downloading the entire package. For packages like `pytorch`, that forces lockfile generation to download the entire thing just to parse a simple text file 😞
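The metadata in question is just a text file (`METADATA`) inside the wheel archive. A hypothetical stdlib-only illustration of pulling it out of an already-downloaded wheel (the function name is made up for this sketch):

```python
import zipfile


def read_wheel_metadata(wheel_path: str) -> str:
    """Return the contents of the METADATA file inside a wheel.

    Wheels are zip archives containing a <name>-<version>.dist-info/METADATA
    member; resolvers need this file to learn a package's dependencies.
    """
    with zipfile.ZipFile(wheel_path) as whl:
        for member in whl.namelist():
            if member.endswith(".dist-info/METADATA"):
                return whl.read(member).decode("utf-8")
    raise ValueError(f"no METADATA found in {wheel_path}")
```

The frustrating part is that this small file can only be read after the full archive (1.8 GB, in the CUDA torch case) has been fetched.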

straight-action-80318

07/05/2022, 3:14 PM
Is there no way to pass `--extra-index-url https://download.pytorch.org/whl/cu116` to the requirements file? Normally you can add the `-i` flag, but it doesn't seem to work for Pants.

hundreds-father-404

07/05/2022, 3:17 PM
Indeed, and given the way Pex/pip is invoked, it would not work, because we declare `-i` for the whole lockfile-generation process. But I'm not convinced that adding the extra index is specifically slowing down resolution of the non-PyTorch requirements. To test that, you could set up a simple `[python].resolves` with only requirements like `ansicolors` and `requests` in the resolve. Leave out PyTorch. Then time how long generating that lockfile takes with and without the index.
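One way to set up that timing experiment (a minimal sketch; the resolve name and lockfile path here are made up for illustration):

```toml
# pants.toml -- hypothetical minimal setup for the timing experiment
[python]
enable_resolves = true

[python.resolves]
timing-test = "3rdparty/timing-test.lock"

[python-repos]
# Comment this line out for the "no extra index" runs:
indexes.add = ["https://download.pytorch.org/whl/cu116"]
```

With a requirements target listing only `ansicolors` and `requests` assigned to that resolve, you could then compare `time ./pants generate-lockfiles --resolve=timing-test` with and without the `indexes.add` line.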

straight-action-80318

07/05/2022, 3:42 PM
I see, so you think it's just the fact that it's parsing the package or something? PyTorch installs an old version if I don't specify the index file, and it's still fast. I set up a fresh repo for some testing; this is what I got:

- No PyTorch, no extra index: 3.523s
- No PyTorch, yes extra index: 6.323s
- Yes PyTorch, no extra index: 5.122s
- Yes PyTorch, yes extra index: 412.619s

so it's actually like 100x longer 😞
So you're saying that it has to download the entire whl and parse it, which is what takes so long? Is that info normally cached, so that when I don't include the index it's fast?

hundreds-father-404

07/05/2022, 3:44 PM
Those numbers with no PyTorch look reasonable to me. They suggest that the issue is specifically related to how PyTorch is getting resolved from the new index; it's not about using more indexes in general.
It would probably be informative to use Pex directly and cut Pants out. See https://pex.readthedocs.io/en/v2.1.94/. You can run Pants with `-ldebug generate-lockfiles` to see the argv that Pants uses when running Pex, or recreate it yourself with something like `pex3 -vvv lock create pytorch -i ...`

bitter-ability-32190

07/05/2022, 3:45 PM
Not the parsing; more likely the downloading of the GPU-compiled package (1.8 GB). Note that without the extra index, you get the vanilla CPU-only package, which is much smaller: ~180 MB.
👍 1

straight-action-80318

07/05/2022, 3:52 PM
ohh I see, I didn't realize the CPU package was so much smaller, hmmmm
maybe one fix is to simply cache the GPU package on my own network or machine and point the index at that

enough-analyst-54434

07/05/2022, 4:00 PM
> is there no way to pass `--extra-index-url https://download.pytorch.org/whl/cu116` to the requirements file?

FWIW, even if Pants allowed you to do this, it wouldn't help: Pip applies an extra index globally, not just to the requirement it appears next to in a requirements file. See: https://pip.pypa.io/en/stable/reference/requirements-file-format/#global-options
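A hypothetical requirements file illustrating the point (the pins are made up): per pip's format, index options are global options, so the line below affects every requirement in the file:

```text
# requirements.txt
# --extra-index-url is a *global* option: it changes where pip looks for
# every requirement in this file, not just the one it sits next to.
--extra-index-url https://download.pytorch.org/whl/cu116
torch
ansicolors  # also resolved against the extra index above
```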

straight-action-80318

07/05/2022, 4:01 PM
I was able to create a package index locally and just point at `file:///path/to/index`
❤️ 1
this takes about 20 seconds
🚀 1
still longer but much faster than downloading
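For reference, a local index just needs the PEP 503 "simple" layout. Here is a hypothetical stdlib-only sketch that arranges already-downloaded wheels into that layout (the function name and directory conventions are made up; after running it, you would point `[python-repos].indexes.add` at `file://.../simple/`):

```python
import html
import pathlib
import shutil


def build_simple_index(wheel_dir: str) -> pathlib.Path:
    """Arrange *.whl files from wheel_dir into a PEP 503 'simple' index layout.

    Note: a production index would also need URL-encoding for filenames
    containing '+' (common in CUDA builds like torch-1.12.0+cu116-...).
    """
    root = pathlib.Path(wheel_dir)
    simple = root / "simple"
    simple.mkdir(exist_ok=True)

    # Group wheels by project name (the first dash-separated filename field),
    # normalized per PEP 503: lowercase, underscores -> dashes.
    projects: dict = {}
    for wheel in root.glob("*.whl"):
        name = wheel.name.split("-")[0].lower().replace("_", "-")
        projects.setdefault(name, []).append(wheel)

    links = []
    for name, wheels in sorted(projects.items()):
        project_dir = simple / name
        project_dir.mkdir(exist_ok=True)
        anchors = []
        for wheel in wheels:
            shutil.copy(wheel, project_dir / wheel.name)
            escaped = html.escape(wheel.name)
            anchors.append(f'<a href="{escaped}">{escaped}</a><br/>')
        # Per-project page listing that project's files.
        (project_dir / "index.html").write_text(
            "<!DOCTYPE html><html><body>\n" + "\n".join(anchors) + "\n</body></html>\n"
        )
        links.append(f'<a href="{name}/">{name}</a><br/>')

    # Root page listing every project, as the simple API requires.
    (simple / "index.html").write_text(
        "<!DOCTYPE html><html><body>\n" + "\n".join(links) + "\n</body></html>\n"
    )
    return simple
```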

bitter-ability-32190

07/05/2022, 4:02 PM
Sometimes I muse that, if I had the AWS budget or time, I'd make a metadata-only PyPI mirror. I suppose in this case it'd need to mirror other indexes as well
🙌 1

enough-analyst-54434

07/05/2022, 4:03 PM
That's a musing I share, but you'd need a PEP to do this right, and the AWS budget would be a donation to the PyPA. I think it would also take some long-term dedication - the payoff would be 5 or more years out, if the push from egg to whl is any indicator.
❤️ 1

straight-action-80318

07/05/2022, 4:04 PM
so what is the additional 12ish seconds of overhead, just parsing the package itself? does it attempt to copy the package from the filesystem to another location?

enough-analyst-54434

07/05/2022, 4:06 PM
It does do a copy, yes.
There may be two copies, one by Pip and one by Pex - I'd need to review some code in each to be sure.
Circling back to extra indexes: I think each added index scales the resolve time roughly linearly (2 indexes = 2x as long, 3 = 3x, etc.) if each added index is ~a mirror of PyPI. You can eliminate the use of a full extra index by using the URLs of fixed wheels as requirements, possibly tacking a `; ...` environment marker on the end to make sure each wheel is only downloaded for the appropriate Python versions and platform. I'm not sure we've had anyone use that trick.
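Such a pinned-URL requirement might look like this (a hypothetical sketch; the exact wheel filename and markers would depend on your interpreter and platform):

```text
# Illustrative only: CPython 3.9 on Linux pulls this CUDA wheel directly,
# while other environments skip the line entirely.
torch @ https://download.pytorch.org/whl/cu116/torch-1.12.0%2Bcu116-cp39-cp39-linux_x86_64.whl ; python_version == "3.9" and sys_platform == "linux"
```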

straight-action-80318

07/05/2022, 4:14 PM
hmmmm I see, okay, I don't wanna hardcode paths just yet, but this basically fixes things; I can wait another 12 seconds
thanks for the help!