# general
h
When pex is fetching requirements, does it use
pip install
or does it do manual downloads with the file urls listed in the lockfile?
e
It uses a single
pip download
to create a lockfile (perform the resolve) in the first place. When later resolving from a lockfile on a new machine or with a cold cache, it uses parallel
pip download --no-deps
for each non-
file://
URL needed.
The use of
pip download
for the individual URLs was only added later to make auth consistent between resolve (lock) and later retrieval. The initial cut used urllib directly.
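A minimal sketch of the two phases described above, for illustration only. It just builds the Pip command lines; the function names, the `dest` directory and the thread pool are assumptions and not Pex's actual code.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def lock_command(requirements, dest):
    """One `pip download` covering the whole resolve (the lock step)."""
    return [sys.executable, "-m", "pip", "download", "--dest", dest, *requirements]


def fetch_commands(urls, dest):
    """One `pip download --no-deps` per URL, skipping local file:// URLs."""
    return [
        [sys.executable, "-m", "pip", "download", "--no-deps", "--dest", dest, url]
        for url in urls
        if not url.startswith("file://")
    ]


def fetch_all(urls, dest, max_workers=4):
    """Run the per-URL downloads in parallel (hypothetical helper)."""
    commands = fetch_commands(urls, dest)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker runs one pip process and reports its exit code.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Because every fetch goes through Pip, whatever auth Pip is given applies equally to the lock step and to later retrieval.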
h
Thanks for the info. I had tried playing around with proxying our dependencies through a private repository. That repository caches dependencies for us whenever someone with proper permissions runs a
pip install
. I would assume it also caches if one does
pip download
, but I'm not sure.
e
Yeah, Pex never does an internet-connected
pip install
. It uses
pip download
to resolve,
pip wheel
to turn any resolved sdists into wheels, and then
pip install --no-deps
on those wheels individually, each in its own isolated chroot, to form PYTHONPATH elements that can be composed later when a PEX file boots up.
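A rough sketch of the wheel and install steps, under stated assumptions: `--target` stands in for Pex's per-wheel chroots, and the helper names are made up for illustration. Only the Pip subcommands and the `--no-deps` / `--isolated` flags come from the discussion.

```python
import os
import sys

# `--isolated` is a general Pip option, so it goes before the subcommand.
PIP = [sys.executable, "-m", "pip", "--isolated"]


def wheel_command(sdist, wheel_dir):
    """Build a wheel from one resolved sdist, without touching its deps."""
    return PIP + ["wheel", "--no-deps", "--wheel-dir", wheel_dir, sdist]


def install_command(wheel, target_dir):
    """Install one wheel into its own directory (assumed stand-in for a chroot)."""
    return PIP + ["install", "--no-deps", "--target", target_dir, wheel]


def compose_pythonpath(target_dirs):
    """Join the per-wheel install directories into a PYTHONPATH value."""
    return os.pathsep.join(target_dirs)
```

Neither command needs the network: the inputs are files that `pip download` already fetched.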
h
pip install
is just sending HTTP requests to the repository though, right? Unless the proxy is interpreting the request headers or some other shenanigans, why would
pip install
do stuff that other HTTP requests for the same URLs wouldn't?
e
Well, because of pip.conf, Pip might be sending a bunch of custom headers Pex or other fetchers would not be clued in on. Pex in particular runs
pip --isolated ...
in all cases, so it can't see that config if present.
That would be a sane difference. Agreed that if some proxy specifically does different things for a
pip
request - maybe by user agent snooping? - that would be a bit crazy.
But Pex uses Pip here unfettered, so the UA is Pip's; the only difference I can see is the pip.conf isolation.
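To see what isolation could be dropping, here is a small sketch that lists the `PIP_*` environment variables in play. Pip's `--isolated` mode ignores environment variables and user configuration; the helper name is hypothetical, and this only covers the environment half, not `pip.conf` files.

```python
import os


def pip_env_settings(environ=None):
    """Return the PIP_* environment variables that `pip --isolated` would ignore."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items() if key.startswith("PIP_")}


if __name__ == "__main__":
    # Print only the names: values such as PIP_INDEX_URL may embed credentials.
    for name in sorted(pip_env_settings()):
        print(name)
```

If an index URL with credentials shows up here or in `pip.conf`, a plain `pip install` would send them while an isolated run would not.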
h
Yeah, we agree that the only way to do pip-specific stuff is to inspect the headers, and it seems that this proxy is doing so. I'm just confused as to why it would be implemented thus...
e
Well, but
pip download
sends no different headers from
pip install
. This is why the only difference I can think of here is the removal of
pip.conf
- which Pex does. My guess is that this removes auth headers of some sort.
h
I need to do some more experimentation on what works and all. I would hope pip download would work the same. Just all their docs demonstrate using pip install since that’s what folks commonly use.
e
@high-yak-85899 I really do think there is no way to detect, from the server side, the difference between `install` and `download`, and I encourage you to look at security considerations /
pip.conf
if you haven't already. That sort of thing will require different plumbing than `pip.conf` to work with Pex through pip as things stand today.
h
Oh yeah I’m not interested in treating them differently. I just want to make sure the tool we are using isn’t doing something fishy. When I plumbed up credentials and a private registry in Pants configuration, things didn’t work out of the box. So I need to chase down a combo of me potentially doing something wrong and the expectations of the proxy service.
e
"I just want to make sure the tool we are using isn’t doing something fishy."
Yeah - my weak claim is it can't be. A tcpdump will not reveal the difference between an
install
and a
download
.
h
Is lock file generation done with pip download or are package needs gathered by some other mechanism?
e
Lock generation uses
pip download
.
... and - this may be getting there - a PyPI API in certain cases. Just a sec ...
PEP-691: https://peps.python.org/pep-0691/ Code here: https://github.com/pantsbuild/pex/blob/93e904ade594654da4455fd61dacb968d56d5fd8/pex/resolve/pep_691/api.py Basically, if the
pip download
logs indicate PEP-691 was used by Pip to fetch metadata, then Pex must do the same in a post-processing step to get sdist and wheel hashes without downloading them all and hashing them: https://github.com/pantsbuild/pex/blob/93e904ade594654da4455fd61dacb968d56d5fd8/pex/resolve/locker.py#L409-L421 Before PEP-691 / with older Pip, the hashes were listed in the
pip download
log as adornments to the download URLs. Now, though, Pip no longer logs the fetched data; thus the need to call the API ourselves in a post-processing step. So I expect those calls are missing auth info in some way, or your service doesn't like the Pex UA, which is
pex/2.1.133
(for example) here: https://github.com/pantsbuild/pex/blob/8c14e563493b8f4a85b4a3c735596dd55586a52b/pex/fetcher.py#L32-L33
But, to be clear @high-yak-85899, this is only the case for creating a lock file and nowhere else in Pex code. Just using a lock file won't trigger this PEP-691 API use since hashes are all known and written down in the lock file being read.
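For reference, a minimal sketch of what a PEP-691 lookup involves, using only the standard library. This is not the Pex code linked above: the function names and the index URL layout are assumptions. The `Accept` media type and the `files[].hashes` layout are from PEP-691 itself.

```python
import json
import urllib.request

# JSON media type defined by PEP-691 for the simple repository API.
PEP_691_JSON = "application/vnd.pypi.simple.v1+json"


def project_request(index_url, project, user_agent):
    """Build (but do not send) a PEP-691 request for one project page."""
    url = "{}/{}/".format(index_url.rstrip("/"), project)
    return urllib.request.Request(
        url, headers={"Accept": PEP_691_JSON, "User-Agent": user_agent}
    )


def file_hashes(payload, algorithm="sha256"):
    """Map filename -> hex digest from a PEP-691 project detail response."""
    doc = json.loads(payload)
    return {
        entry["filename"]: entry["hashes"][algorithm]
        for entry in doc["files"]
        if algorithm in entry.get("hashes", {})
    }
```

Note that the request built here carries no credentials at all, which is the kind of gap a private index or proxy could reject.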
h
pip download
does work with the proxy service as you expected
Figured out that I was doing env var expansion wrong
I was going
$ENV_VAR
like the example in https://www.pantsbuild.org/docs/python-third-party-dependencies#authenticating-to-custom-repos but needed to be doing
%(env.ENV_VAR)s
A very odd thing, though. Almost completed and then I got this error
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    apache-flink-libraries<1.16.2,>=1.16.1 from <https://artifactory.corp.astranis.space/artifactory/api/pypi/general-pypi-remote/packages/packages/13/e4/0d9082016be1983ff28bebb771f0c6859ddcffb850015b37f758fc2eea1e/apache-flink-libraries-1.16.1.tar.gz#sha256=f35f5bf0fe903c2e2f9f9617a5b8f4a9ed98ed5f12ad6d00b1e2341dfea55c11> (from apache-flink<1.17,>=1.16.1):
        Expected sha256 f35f5bf0fe903c2e2f9f9617a5b8f4a9ed98ed5f12ad6d00b1e2341dfea55c11
             Got        1906005054a583fd8685cb26496d3fce019632bdcae8b7022a31c4ab8dcca2f7
But when I check what's cached in my proxy, I see
f35f5bf0fe903c2e2f9f9617a5b8f4a9ed98ed5f12ad6d00b1e2341dfea55c11
correctly as the checksum
oddly enough a second run worked fine
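If it happens again, one way to narrow it down is to hash the bytes you actually received and compare them with the `#sha256=` fragment in the URL. A small sketch of that check; the helper names are made up, and it says nothing about why the first download differed.

```python
import hashlib
from urllib.parse import urlsplit


def expected_hash(url):
    """Split a `#sha256=<hex>` URL fragment into (algorithm, hex digest)."""
    algorithm, _, digest = urlsplit(url).fragment.partition("=")
    return algorithm, digest


def matches(url, content):
    """Check downloaded bytes against the hash pinned in the URL fragment."""
    algorithm, digest = expected_hash(url)
    return hashlib.new(algorithm, content).hexdigest() == digest
```

Running `matches(url, open(path, "rb").read())` against a saved copy of the download would show whether the proxy served different bytes than it has cached.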
h
cosmic rays? 😉