# pex
b
So @enough-analyst-54434 I'm looking at adding support to Pex to create a lockfile from an existing venv (or any way to produce a `pip inspect` JSON output). Do you see this as an extension of `pex3 lock create` or a new subcommand (`pex3 lock pip-report`)? I'm thinking in good UNIX style it should be something new and dedicated, that way it can be composed with the other commands if needed. And from what I can tell the Pex lockfile is just a subset of the information in the report, so the code should be dead simple.
My hope/plan is super speedy lockfile creation by leveraging `pip` + PEP 658 support (to avoid downloads) with `pip install --dry-run --report`, then feed that into Pex's lockfile creation so we get 0-download installs (where possible)
> And for what I can tell the Pex lockfile is just a subset of information in the report, so the code should be dead simple.
Correction: The `locked_resolves` information seems to be a subset, the other metadata keys are not so easily populated
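A minimal sketch of that mapping, assuming the documented shape of pip's installation report (`install[].download_info` and `install[].metadata`) and the `locked_requirements` keys visible in the lock output later in this thread. The function name and sample data are illustrative and not part of Pex.

```python
def report_to_locked_requirements(report):
    """Map a parsed pip installation report to Pex-style locked_requirements."""
    locked = []
    for item in report.get("install", []):
        metadata = item["metadata"]
        download_info = item["download_info"]
        archive_info = download_info.get("archive_info", {})
        # Newer reports carry a "hashes" mapping; older ones only a single
        # "hash" string of the form "sha256=<hex digest>".
        hashes = dict(archive_info.get("hashes", {}))
        if not hashes and "=" in archive_info.get("hash", ""):
            algorithm, _, digest = archive_info["hash"].partition("=")
            hashes[algorithm] = digest
        locked.append(
            {
                "artifacts": [
                    {"algorithm": algorithm, "hash": digest, "url": download_info["url"]}
                    for algorithm, digest in sorted(hashes.items())
                ],
                "project_name": metadata["name"],
                "requires_dists": metadata.get("requires_dist", []),
                "requires_python": metadata.get("requires_python"),
                "version": metadata["version"],
            }
        )
    return locked


# Hand-written sample in the report's shape (values taken from this thread).
sample_report = {
    "version": "1",
    "install": [
        {
            "download_info": {
                "url": "https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl",
                "archive_info": {
                    "hashes": {
                        "sha256": "58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f"
                    }
                },
            },
            "metadata": {
                "name": "requests",
                "version": "2.31.0",
                "requires_python": ">=3.7",
                "requires_dist": ["idna<4,>=2.5", "urllib3<3,>=1.21.1"],
            },
        }
    ],
}

locked = report_to_locked_requirements(sample_report)
print(locked[0]["project_name"], locked[0]["version"])
```

The remaining top-level lock keys (`style`, `target_systems`, `platform_tag` and so on) are not derivable from the report alone, which is the point of the correction above.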
So I guess we could use the vendored `pip`, and split support into the two use-cases:
• `pip install --dry-run --report <temppath> --quiet <args>`
  ◦ This way Pex should have ~0 overhead over what `pip` would do. I'm sure there's thorns
• `pip inspect --path <venv>`
Really I think I'm most interested in #1, but #2 support follows, I think, once Pex can parse a report
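Use-case #1 could be driven roughly like this. This is a sketch that shells out to whatever `pip` the given interpreter has, not how Pex invokes its vendored Pip; the helper names are hypothetical, and `--ignore-installed` is an addition not mentioned above, included so the report does not depend on what is already installed.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def build_report_command(requirements, report_path, python=sys.executable):
    """Build the pip command line that resolves without installing anything."""
    return [
        python, "-m", "pip", "install",
        "--dry-run",
        "--ignore-installed",  # assumption: resolve as if the environment were empty
        "--quiet",
        "--report", str(report_path),
        *requirements,
    ]


def resolve_report(requirements):
    """Run pip and return the parsed installation report (needs network access)."""
    with tempfile.TemporaryDirectory() as tmp:
        report_path = Path(tmp) / "report.json"
        subprocess.run(build_report_command(requirements, report_path), check=True)
        return json.loads(report_path.read_text())


command = build_report_command(["requests"], "report.json", python="python3")
print(" ".join(command))
```

Calling `resolve_report(["requests"])` would then hand back the dict that a report-to-lock converter consumes.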
Actually, now I'm wondering why Pex uses `pip download` and not `pip install --dry-run --report <file>`
e
I'm driving today, but:
+ That functionality is newer than locks
+ That functionality is local to the venv (say CPython 3.9.6 exactly)
+ Speeding up locking has already been experimentally vetted and issued for quite some time by seeding a venv then installing in that, etc. Have you reviewed and rejected that?
b
> That functionality is newer than locks
I see that. `pip` 23 is when it went stable. We should be able to detect that, I think.
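Detection could be as simple as a major-version check. A sketch, taking the "stable as of `pip` 23" statement above as the threshold; the function name is hypothetical, and dev/pre-release builds of 23.x are treated as supported here, which may or may not be what Pex would want.

```python
import re


def pip_supports_stable_report(pip_version):
    """Return True if the pip version string is 23 or newer."""
    match = re.match(r"(\d+)", pip_version)
    if match is None:
        raise ValueError(f"Unparseable pip version: {pip_version!r}")
    return int(match.group(1)) >= 23


for candidate in ("22.3.1", "23.0", "23.2.dev0"):
    print(candidate, pip_supports_stable_report(candidate))
```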
> That functionality is local to the venv (say CPython 3.9.6 exactly)
In the case of `inspect`, yes. My current thinking is not supporting `inspect` and instead doing the `--dry-run` thing in Pex. That leaves the door open for `inspect` support later
> Speeding up locking has already been experimentally vetted and issued for quite some time by seeding a venv then installing in that, etc. Have you reviewed and rejected that?
Yes and no? This is one step further, and possibly one step sideways? IIUC asking `pip` for a report would be semantically equivalent to making the venv, but without any time spent on actually installing. So, you get the benefits and then some?
With "and then some" being:
• not actually spending the disk space installing (or even downloading in a PEP 658 world)
• Converting from report JSON to Pex JSON is simple
I'm doing a PoC today. Hopefully I can get something here by end-of-day since it's a very long weekend.
OK, so not possible until the report contains the hash for vcs reqs, but I started a discussion for that
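To make the gap concrete: a VCS entry in the report carries `vcs_info` in place of `archive_info`, so there is no archive hash to copy into a lock. A sketch that flags such entries, assuming the direct-URL layout used in the report's `download_info`; the commit id below is a made-up placeholder and the function name is illustrative.

```python
def entries_missing_hashes(report):
    """Return (project name, url) pairs for report entries with no archive hash."""
    missing = []
    for item in report.get("install", []):
        download_info = item["download_info"]
        archive_info = download_info.get("archive_info", {})
        if not archive_info.get("hashes") and not archive_info.get("hash"):
            missing.append((item["metadata"]["name"], download_info["url"]))
    return missing


sample_report = {
    "install": [
        {
            "download_info": {
                "url": "https://example.invalid/requests-2.31.0-py3-none-any.whl",
                "archive_info": {"hashes": {"sha256": "58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f"}},
            },
            "metadata": {"name": "requests", "version": "2.31.0"},
        },
        {
            # VCS requirement: a commit id is recorded, but no artifact hash.
            "download_info": {
                "url": "https://example.invalid/some/project.git",
                "vcs_info": {"vcs": "git", "commit_id": "0" * 40},  # placeholder commit
            },
            "metadata": {"name": "project", "version": "0.1"},
        },
    ]
}

unhashed = entries_missing_hashes(sample_report)
print(unhashed)
```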
e
You're factually misinformed here. Pex installs nothing when locking.
b
Ok so here's my findings:
• It's certainly possible, assuming hashes for VCS reqs get added to the report
• The benefit would be less code in this code path (but more code overall). We just throw the work over the fence
• We benefit from newer features (like PEP 658 support) without having to actually implement them ourselves
• PyPI hasn't backfilled metadata support, so it isn't obviously faster
• It's actually slower than the normal download code because PEX is highly parallel it seems, and pip isn't? I might be doing something obviously wrong
e
Pex already benefits from PEP 658 since it aggressively stays up2date with Pip.
b
Yes I'm very aware. PEX doesn't install, I'm comparing with the venv case
e
Ok, we can talk when you have real numbers.
b
PEP 658 AIUI would avoid the need for downloading at all so staying up to date doesn't help if we're still downloading
I have the ability to get numbers, but I'm not sure what parts of the cache to wipe to make them comparable. I'll try and get my branch pushed later
e
Sure, but it's unclear how helpful that is vs the early cutoff a venv adds
If you weren't aware, Pip resolves differently (and faster) when it can look at a populated venv as baseline.
I ran the experiments. They are on the issue iirc
b
A venv requires download and install? I must be missing something obvious here. How would avoiding download and installation ever be slower than not?
e
A venv hits cache
b
We're talking lockfiles creation, right?
e
Then you resolve
Correct
b
If you have the issue link handy that'd be great. I'm absolutely befuddled how download+install could be faster than "just ask for the metadata from PyPI"
Especially for requirements like torch, which the download alone is >GB transfer
Oh and that reminds me of my last finding: • We could always just download the metadata if it exists so we don't bifurcate our code (and just never use --report)
I think to compare apples to apples with the new code, we'd need our own little cheese shop serving metadata files. I'll have to try out how easy that is with pip (and probably http.server)
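One way such a cheese shop might look: a static directory laid out as `simple/<project>/` holding wheels plus `<wheel>.metadata` files, with a generated index page that advertises the metadata via the PEP 658 `data-dist-info-metadata` attribute (and its PEP 714 rename, `data-core-metadata`). This is an untested-against-pip sketch; the helper name and layout are assumptions.

```python
import hashlib
import html
import tempfile
from pathlib import Path


def render_project_page(project, project_dir):
    """Render a simple-index HTML page for the wheels found in project_dir."""
    links = []
    for wheel in sorted(Path(project_dir).glob("*.whl")):
        attrs = ""
        metadata = wheel.with_name(wheel.name + ".metadata")
        if metadata.is_file():
            digest = hashlib.sha256(metadata.read_bytes()).hexdigest()
            # Emit both spellings so older and newer clients can find the file.
            attrs = (
                f' data-dist-info-metadata="sha256={digest}"'
                f' data-core-metadata="sha256={digest}"'
            )
        name = html.escape(wheel.name)
        links.append(f'<a href="{name}"{attrs}>{name}</a><br/>')
    body = "\n".join(links)
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        f"<h1>Links for {html.escape(project)}</h1>\n{body}\n</body></html>\n"
    )


# Demo with stand-in files (not a real wheel).
with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp) / "simple" / "requests"
    project_dir.mkdir(parents=True)
    wheel_name = "requests-2.31.0-py3-none-any.whl"
    (project_dir / wheel_name).write_bytes(b"not a real wheel")
    (project_dir / (wheel_name + ".metadata")).write_bytes(
        b"Metadata-Version: 2.1\nName: requests\nVersion: 2.31.0\n"
    )
    page = render_project_page("requests", project_dir)

print(page)
```

Writing the page to `simple/requests/index.html` and running `python -m http.server` from the parent directory should then give something to point `--index-url http://localhost:8000/simple/` at.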
Are you talking about https://github.com/pantsbuild/pex/issues/2044? Because that talks about lockfiles updating, or creating from an existing lockfile. Using --report with PEP 658 would help speed up from-scratch (and likely still beat from-lockfile, but that's less obvious)
OK https://github.com/thejcannon/pex/tree/jcannon/pip-report is the PoC branch. I think it pairs well with a PEP 658-compliant cheeseshop, of which PyPI is ready-but-not-yet-backfilled
e
I have a limited time between climbing trips; so I won't be reading the branch, but if you can come up with numbers that use the example pointed at by the existing issue I have that show some significant win in that case or in similar useful cases vs the mechanism I sketched, then it will be worth finding time to look / take on new code, etc.
b
Will do. My own gut says the code simplification might not be worth the new code path. The real jewel is swapping the `pip download` with something that just grabs the metadata if possible
Microbenchmark: First run uses existing codepath, second run uses `pip install --dry-run --report`
josh@cephandrius:~/work/pex$ rm -rf ~/.pex/pip_cache && rm -rf ~/.pex/fingerprints.db
josh@cephandrius:~/work/pex$ time PEX_VERBOSE=1 python -m pex.cli lock create --style strict --pip-version 23.2.dev0 --resolver-version pip-2020-resolver --intransitive requests 
pex: Resolving for:
pex: Hashing pex: 29.5ms                
pex: Isolating pex: 0.0ms
pex: Resolving for:
pex: Found 0 fingerprints cached in database.                                       
pex: Resolving for:
pex: Fetching PEP-691 index metadata from <https://pypi.org/simple/requests/> for application/vnd.pypi.simple.v1+json: 154.8ms
pex: Resolving for:
pex: Resolving for:8 :: Caching 1 fingerprints in database
  /usr/bin/python3.8: 909.2ms                             
pex:   Searching for 1 fingerprints in database: 8.2ms
pex:   Making 1 PEP-691 JSON API requests across 1 threads to fingerprint 1 artifacts: 165.4ms
pex:   Caching 1 fingerprints in database: 11.8ms
pex: Creating lock from resolve: 15.1ms                                                        
pex:   Building 0 source distributions to gather metadata for lock.: 8.5ms
pex: Using cached artifact at /home/josh/.pex/downloads/58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f for FileArtifact(url='<https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl>', fingerprint=Fingerprint(algorithm='sha256', hash='58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f'), verified=False, filename='requests-2.31.0-py3-none-any.whl')
pex: Indexing downloads: 0.3ms
{"allow_builds": true, "allow_prereleases": false, "allow_wheels": true, "build_isolation": true, "constraints": [], "locked_resolves": [{"locked_requirements": [{"artifacts": [{"algorithm": "sha256", "hash": "58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f", "url": "<https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl>"}], "project_name": "requests", "requires_dists": ["PySocks!=1.5.7,>=1.5.6; extra == \"socks\"", "certifi>=2017.4.17", "chardet<6,>=3.0.2; extra == \"use_chardet_on_py3\"", "charset-normalizer<4,>=2", "idna<4,>=2.5", "urllib3<3,>=1.21.1"], "requires_python": ">=3.7", "version": "2.31.0"}], "platform_tag": ["cp38", "cp38", "manylinux_2_31_x86_64"]}], "path_mappings": {}, "pex_version": "2.1.137", "pip_version": "23.2.dev0", "prefer_older_binary": false, "requirements": ["requests"], "requires_python": [], "resolver_version": "pip-2020-resolver", "style": "strict", "target_systems": [], "transitive": false, "use_pep517": null}

real    0m1.114s
user    0m0.605s
sys     0m0.116s
josh@cephandrius:~/work/pex$ rm -rf ~/.pex/pip_cache && rm -rf ~/.pex/fingerprints.db
josh@cephandrius:~/work/pex$ time PEX_VERBOSE=1 python -m pex.cli lock create --style strict --pip-version 23.2.dev0 --resolver-version pip-2020-resolver --intransitive requests 
pex: Resolving for:
pex: Hashing pex: 30.3ms                
pex: Isolating pex: 0.0ms
pex: Resolving for:
  /usr/bin/python3.8: 756.4ms
{"allow_builds": true, "allow_prereleases": false, "allow_wheels": true, "build_isolation": true, "constraints": [], "locked_resolves": [{"locked_requirements": [{"artifacts": [{"algorithm": "sha256", "hash": "58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f", "url": "<https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl>"}], "project_name": "requests", "requires_dists": ["PySocks!=1.5.7,>=1.5.6; extra == \"socks\"", "certifi>=2017.4.17", "chardet<6,>=3.0.2; extra == \"use_chardet_on_py3\"", "charset-normalizer<4,>=2", "idna<4,>=2.5", "urllib3<3,>=1.21.1"], "requires_python": null, "version": "2.31.0"}], "platform_tag": ["cp38", "cp38", "manylinux_2_31_x86_64"]}], "path_mappings": {}, "pex_version": "2.1.137", "pip_version": "23.2.dev0", "prefer_older_binary": false, "requirements": ["requests"], "requires_python": [], "resolver_version": "pip-2020-resolver", "style": "strict", "target_systems": [], "transitive": false, "use_pep517": null}

real    0m0.956s
user    0m0.584s
sys     0m0.114s
This is because `pip` of the latest version now supports the PEP 658 metadata, and PyPI currently is serving it for packages uploaded as of ~some date. So for `requests` `2.31.0`: https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl.metadata
So `pip` is seeing that the index supports PEP 658, and instead of downloading `requests` for the metadata, just downloads the metadata directly. They plan on backfilling at some point, but haven't yet, so it's a microbenchmark with one package
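The convention is just the artifact URL with `.metadata` appended, as in the link above. A small sketch for probing whether an index serves it for a given artifact; the function names are illustrative, and the probe makes a network request, so it is not called here.

```python
import urllib.error
import urllib.request


def metadata_url(artifact_url):
    """Return the PEP 658 metadata URL for a distribution artifact URL."""
    base, _, _fragment = artifact_url.partition("#")  # drop any #sha256=... fragment
    return base + ".metadata"


def has_pep658_metadata(artifact_url, timeout=10.0):
    """HEAD the metadata URL; True on HTTP 200, False on an HTTP error status."""
    request = urllib.request.Request(metadata_url(artifact_url), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


wheel_url = (
    "https://files.pythonhosted.org/packages/70/8e/"
    "0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/"
    "requests-2.31.0-py3-none-any.whl"
)
print(metadata_url(wheel_url))
```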
e
Yeah - this needs a better test since Pex will use Pip-latest as it does here already: https://github.com/pantsbuild/pex/pull/2168 So, for a `--style universal` lock, which might visit 10k nodes that all have the new metadata, Pex will only actually download the final solution set. So you save on that final download, but only that. You also need to backfill all the missing hashes using the dry run (universal needs to include artifacts beyond those for the current venv) and then there is the technical point to confirm whether the runtime patching used to get the universal resolve result correct in the 1st place works with inspect. It may need to be adapted.
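For the hash backfill, the PEP 691 JSON index (the same API that shows up in the verbose lock output earlier) lists every file for a project with its hashes, so the artifacts for one pinned version can be collected without downloading them. A rough sketch over an already-parsed response; the filename parsing is naive (no version normalization), the helper names are hypothetical, and the sample hashes other than the `requests` wheel's are placeholders.

```python
def filename_version(filename):
    """Best-effort version extraction from a wheel or sdist filename."""
    if filename.endswith(".whl"):
        parts = filename.split("-")
        return parts[1] if len(parts) >= 2 else None
    for suffix in (".tar.gz", ".zip"):
        if filename.endswith(suffix):
            parts = filename[: -len(suffix)].rsplit("-", 1)
            return parts[1] if len(parts) == 2 else None
    return None


def artifacts_for_version(index_json, version, algorithm="sha256"):
    """Collect lock-style artifact entries for every file of the given version."""
    return [
        {"algorithm": algorithm, "hash": file["hashes"][algorithm], "url": file["url"]}
        for file in index_json.get("files", [])
        if filename_version(file["filename"]) == version
        and algorithm in file.get("hashes", {})
    ]


# Hand-written stand-in for a PEP 691 response (placeholder URLs and hashes,
# except the wheel hash, which is the one shown in this thread).
sample_index = {
    "name": "requests",
    "files": [
        {
            "filename": "requests-2.31.0-py3-none-any.whl",
            "url": "https://example.invalid/requests-2.31.0-py3-none-any.whl",
            "hashes": {"sha256": "58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f"},
        },
        {
            "filename": "requests-2.31.0.tar.gz",
            "url": "https://example.invalid/requests-2.31.0.tar.gz",
            "hashes": {"sha256": "0" * 64},  # placeholder
        },
        {
            "filename": "requests-2.30.0-py3-none-any.whl",
            "url": "https://example.invalid/requests-2.30.0-py3-none-any.whl",
            "hashes": {"sha256": "1" * 64},  # placeholder
        },
    ],
}

artifacts = artifacts_for_version(sample_index, "2.31.0")
print(len(artifacts))
```

This only covers the "more artifacts than the current venv would pick" part; whether the universal-resolve patching carries over is a separate question.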
I welcome your thorough investigation of all those corners / details.
Another thing to keep in mind, is - say the savings on the final solution set download is big - what are the cases when you won't need to download that set later? Are they important or common? Most users will actually need the artifacts!
And today they hit from cache.
All of this is to say obviously not downloading is a win, but with all there is to do, is it important enough to spend time on / complexify code with, etc. That's the meat of the thing to answer.
b
That's a very weak argument. That locking now is great because the same machine might use the dep later. I lock in a docker image. Whole GHA bots are dedicated to re-locking. C'mon
e
Josh, I'm done with you. I'm not arguing, I'm telling you the real matrix of things in play here.
b
I'm happy to bow out. I get the sense you're resistant to change this to begin with
It's not prohibitively hard to just `jq` myself into a Pex lockfile and reap the benefits for myself
e
I'm only resistant to slapdash. I have unfortunately convinced myself you're moving fast and breaking things and not putting in the time. I lack trust is all.
If all the details are sorted and it works and is solid, I'm happy.
b
Well that all has manifested as a perception of being resistant to change, and requiring passage through the gauntlet, instead of collaboration. So, I'm happy to bow out now, and let this get picked up by whoever later