witty-crayon-22786

08/16/2021, 8:20 PM
@enough-analyst-54434: re: #12548: i just realized that there is a facility that we didn’t discuss on friday, that at least superficially looks like it supports composing python artifacts the same way you would compose jars… PEX_PATH

enough-analyst-54434

08/16/2021, 8:21 PM
I don't follow. What Pex are you adding using that?
Don't forget - python artifacts are not composable without an install step - unless they are eggs.

witty-crayon-22786

08/16/2021, 8:24 PM
understood, and i haven’t been able to track down how the pex_path is consumed at PEX runtime (sorry, should have done that first).
but i’m essentially suggesting 1 PEX file per wheel, composed together in a graph aware way at runtime using the PEX_PATH
(i recognize that these would not be valid self-contained PEX files)
if the execution mode is --venv, that series of 1-wheel PEXes would essentially be used as a cache-key for the venv rather than actually being cracked open at all

enough-analyst-54434

08/16/2021, 8:39 PM
This sounds super roundabout. If you are just worried about optimization, that's probably easier to attack separately. I raised the concern given current Pex features, but there are lots of ways to solve it.

witty-crayon-22786

08/16/2021, 8:44 PM
it’s just optimization, yea. but it’s potentially interesting because it’s end to end… we avoid monolithic artifacts all the way from the initial resolve and up-to-and-including execution time. can’t do the same with wheels quite as efficiently, because the step to build a PEX from wheels doesn’t cache-hit into a venv directory the same way the --venv pex does.

enough-analyst-54434

08/16/2021, 9:34 PM
You can't do that today. Pex could be taught to contain wheels instead of or in addition to installed wheels for one example. That said - the composition you suggest is equivalent to just adding an installed-wheel zip to sys.path.
Python already knows how to import an installed-wheel zip.
So for your scheme you basically need a pex-tools <PEX> repository export --egg.
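For reference, the "installed-wheel zip on sys.path" composition described here can be sketched with only the standard library. The module name `hello_dist` and the zip layout are made up for illustration; this shows Python's built-in zip import, not anything Pex does internally.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def make_installed_zip(directory, module_name, source):
    """Write a zip holding one top-level module, standing in for a zipped installed wheel."""
    zip_path = os.path.join(directory, module_name + ".zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(module_name + ".py", source)
    return zip_path


tmp_dir = tempfile.mkdtemp()
# "hello_dist" is a hypothetical distribution used only for this sketch.
zip_path = make_installed_zip(tmp_dir, "hello_dist", "GREETING = 'hi from a zip'\n")

# Composition is just a sys.path entry; the zipimport machinery handles the rest.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
hello_dist = importlib.import_module("hello_dist")
print(hello_dist.GREETING)
```

One known limit of this style of composition: C extension modules cannot be imported directly from a zip, so it only covers pure-Python distributions.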

witty-crayon-22786

08/16/2021, 9:37 PM
> You can’t do that today. Pex could be taught to contain wheels instead of or in addition to installed wheels for one example.
yea, sorry. i meant installed wheels.
> That said - the composition you suggest is equivalent to just adding an installed-wheel zip to sys.path.
ok… i think that some of my confusion has been that “installed-wheel zip” is a mouthful relative to “jar”… but that does sound like what i’m looking for.

enough-analyst-54434

08/16/2021, 9:37 PM
Yeah - that was the abandoned egg format.

witty-crayon-22786

08/16/2021, 9:37 PM
got it.

enough-analyst-54434

08/16/2021, 9:37 PM
The repository extract command could create the spiritual cousin which is the zipped up installed wheel chroot.
So, to summarize the impedance mismatch with JVM, jar == egg, egg is dead.

witty-crayon-22786

08/16/2021, 9:39 PM
is it dead only as a distribution model on pypi? or are there good reasons to avoid it for this usecase too?

enough-analyst-54434

08/16/2021, 9:40 PM
It's dead as a supported model. I think modern wheel has dropped its conversion support tool, for example.
Pex's 1st two? years of life included support. That was the Wickman era though, so he learned all about egg and why it was dropped, but I did not.
I know there were ~PEP-hashed-through reasons but I don't know what they were.
Aha - it's a one-way conversion now from egg to wheel in latest wheel:
$ wheel -h
usage: wheel [-h] {unpack,pack,convert,version,help} ...

positional arguments:
  {unpack,pack,convert,version,help}
                        commands
    unpack              Unpack wheel
    pack                Repack wheel
    convert             Convert egg or wininst to wheel
...

witty-crayon-22786

08/16/2021, 9:44 PM
interesting. and i suppose that i shouldn’t assume that what is true of the JVM (horrendously slow classpath loading from loose files) necessarily applies in Python land.
could be the opposite.

enough-analyst-54434

08/16/2021, 9:45 PM
Could be - probably the same though since the slow thing is all about FSes and not about language most likely.

witty-crayon-22786

08/16/2021, 9:46 PM
part of the appeal of the PEX_PATH approach was that it sidestepped actually changing how we invoke: we’re still using PEX to construct the virtualenv (just with different source data), and it can still hit its cache

enough-analyst-54434

08/16/2021, 9:47 PM
Right.
I think - from scratch - I'd go with Pex supporting containing wheel files. That would also give you a normal (loose) PEX, but the new PEX machinery that supported wheel files would do all the same things it does today with subsetting, venvs etc.
The only cost is a bigger PEX runtime since it would need to include wheel installing code - which is currently housed canonically in Pip in the PEX buildtime codebase only.

witty-crayon-22786

08/16/2021, 9:51 PM
hm

enough-analyst-54434

08/16/2021, 9:51 PM
The overhead is then reduced to the time it takes to install a wheel file in the pex cache the 1st time a given wheel file is used, but not the 2nd, etc.

witty-crayon-22786

08/16/2021, 9:53 PM
even across multiple venvs consuming that wheel?

enough-analyst-54434

08/16/2021, 10:07 PM
Yes.
Pex does that today too.
It only extracts the pre-installed wheel once.
It lives in a hash dir.
The venvs all hard-link the wheel in by default.
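The extract-once, hard-link-everywhere behavior described above can be illustrated with a small sketch. The directory layout, the `install_once` and `link_into_venv` helpers, and the use of SHA-1 over the payload are all assumptions made for the example; Pex's real cache layout and hashing are not shown here.

```python
import hashlib
import os
import tempfile


def install_once(cache_root, wheel_name, payload):
    """Materialize payload under a content-hash directory, skipping work on a cache hit."""
    digest = hashlib.sha1(payload).hexdigest()
    installed = os.path.join(cache_root, digest, wheel_name, "module.py")
    if not os.path.exists(installed):
        os.makedirs(os.path.dirname(installed), exist_ok=True)
        with open(installed, "wb") as fp:
            fp.write(payload)
    return installed


def link_into_venv(installed, venv_dir):
    """Hard-link the cached file into a directory standing in for a venv's site-packages."""
    os.makedirs(venv_dir, exist_ok=True)
    dest = os.path.join(venv_dir, os.path.basename(installed))
    if not os.path.exists(dest):
        os.link(installed, dest)
    return dest


root = tempfile.mkdtemp()
cached = install_once(
    os.path.join(root, "cache"), "six-1.16.0-py2.py3-none-any.whl", b"x = 1\n"
)
venv_a = link_into_venv(cached, os.path.join(root, "venv_a"))
venv_b = link_into_venv(cached, os.path.join(root, "venv_b"))

# Both venv entries refer to the same file on disk as the cache entry.
print(os.path.samefile(cached, venv_a), os.path.samefile(cached, venv_b))
```

Because the venv entries are hard links, a second venv that consumes the same wheel costs a directory entry rather than another copy of the installed files.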

witty-crayon-22786

08/16/2021, 10:08 PM
got it… that was the bit i was missing

enough-analyst-54434

08/16/2021, 10:10 PM
That's the significance of the values in this map:
$ pex-tools pants.pex info | jq .distributions
{
  "PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl": "bee146b7b338f215cf12e6c28c8ece8c798ec0e5",
  "ansicolors-1.1.8-py2.py3-none-any.whl": "c7b5d77e89855f9b02c99e84cd863c5bdc2be329",
  "certifi-2021.5.30-py2.py3-none-any.whl": "c1b63440cdd5c303c7268c98b3454996a456e0cd",
  "charset_normalizer-2.0.4-py3-none-any.whl": "d7b15d73088da1a67477fd8592d5352642d7a11a",
  "fasteners-0.16-py2.py3-none-any.whl": "ef97c8bf1c7ece677c203ea6c1a3580e4dbe0a34",
  "humbug-0.2.6-py3-none-any.whl": "e772468167e997e051e8c20d71551a11bba19f26",
  "idna-3.2-py3-none-any.whl": "f6f18646cd1fcc3b8d439bd113b7cc9ad4ebc3c0",
  "packaging-20.9-py2.py3-none-any.whl": "e0dc9f7afe5402ed634e303124285950c3bb1409",
  "pantsbuild.pants-2.6.0-cp39-cp39-manylinux2014_x86_64.whl": "ada4098a7fdbda946fae7ca683167b2a2c558787",
  "pex-2.1.42-py2.py3-none-any.whl": "d64dcfa09d02f9f2e483de2de86a8f90f326eca4",
  "psutil-5.8.0-cp39-cp39-manylinux2010_x86_64.whl": "5b9f49da190ee0271cddbc9395b9b74f8f28095b",
  "pyparsing-2.4.7-py2.py3-none-any.whl": "401ee8e4d4a08b2e87749a109db29991c85a67c6",
  "requests-2.26.0-py2.py3-none-any.whl": "fb70e1b8449a6408d983fb8c56f60f1e827890d8",
  "setproctitle-1.2.2-cp39-cp39-manylinux1_x86_64.whl": "d4f45d34e8477c16887ba446f339215a919687c7",
  "setuptools-56.2.0-py3-none-any.whl": "4c58b9c155902d9c63742e1862a7ef4ba4886751",
  "six-1.16.0-py2.py3-none-any.whl": "035d7c208925c1832def39b592f3477ca36397bf",
  "toml-0.10.2-py2.py3-none-any.whl": "941913d720ad4816a848c11a218c9110f1978120",
  "types_PyYAML-5.4.3-py2.py3-none-any.whl": "ff59bd9da4c781186fecb96e9717c458a98db032",
  "types_setuptools-57.0.0-py3-none-any.whl": "376ce3ec246abbf148b0edc6861c92f368daf4a8",
  "types_toml-0.1.3-py2.py3-none-any.whl": "8599aba9d6d109a743e579345fb4b23b4957c9d7",
  "typing_extensions-3.7.4.3-py3-none-any.whl": "eedab11cf76ead911adc7311c50c4f75212a0fa4",
  "urllib3-1.26.6-py2.py3-none-any.whl": "d58fa3fcc249a15487bc44b2a9afd9b065720684"
}
I think it's probably horribly confusing that the keys look like wheel file names when they're PEX zip .deps/ sub-directory names. It just made matching wheel tags straightforward at runtime resolution.
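To show why wheel-file-style names make tag matching straightforward, here is a minimal sketch that splits such a key into its parts following the wheel file name convention (name, version, optional build tag, python tag, abi tag, platform tag). The `WheelTags` type and `parse_wheel_key` helper are hypothetical names for this example and are not Pex's own code.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_key(key):
    """Split a wheel-style name into its dash-separated components."""
    if not key.endswith(".whl"):
        raise ValueError("not a wheel-style name: %r" % key)
    parts = key[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected number of components in %r" % key)
    return WheelTags(name, version, build, python, abi, platform)


# Keys taken from the distributions map above.
print(parse_wheel_key("PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl"))
print(parse_wheel_key("six-1.16.0-py2.py3-none-any.whl"))
```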

witty-crayon-22786

08/16/2021, 10:12 PM
yea, definitely a little bit. i always assumed that they were “just” unzipped, as opposed to installed.

enough-analyst-54434

08/16/2021, 10:13 PM
Yeah. There was a lot of noise and error when this first went in. Mark Chu-Carroll learned all this the hard way, but that is lost to history.

witty-crayon-22786

08/24/2021, 4:37 PM
hey, so i’ve come back around to this idea, based on the outcome of the last two comments on https://github.com/pantsbuild/pants/issues/12548#issuecomment-902134524
essentially: i do actually need to decompose the “requirements.pex” to avoid putting a bunch of monolithic PEXes in the cache
i’ve parsed out the pex-tools .. graph and pex-tools .. repository info to get the dependency graph, and i’m thinking of building single entry PEXes (essentially, eggs), and then composing them with the PEX_PATH
we talked briefly about it at the top of this thread: but i’m wondering if you see any issues with it before i dive in further
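A rough sketch of the composition being proposed: walk a dependency graph to find a requirement's transitive dependencies, then join the matching single-entry PEX files into a PEX_PATH value. The graph below is hand-written for the example rather than parsed from pex-tools output, and the `pexes/` directory and `<name>.pex` naming are assumptions. The colon separator follows the usual description of PEX_PATH as a colon-separated list of PEX files.

```python
def transitive_deps(graph, root):
    """Return root followed by everything reachable from it, in a deterministic order."""
    seen = []
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.append(name)
        # Reverse-sorted so that pops come out in alphabetical order.
        stack.extend(sorted(graph.get(name, ()), reverse=True))
    return seen


def pex_path_for(graph, root, pex_dir):
    """Build a PEX_PATH value naming one hypothetical single-entry PEX per dependency."""
    deps = [dep for dep in transitive_deps(graph, root) if dep != root]
    return ":".join("{}/{}.pex".format(pex_dir, dep) for dep in deps)


# Hand-written graph for illustration only.
graph = {
    "requests": ["certifi", "charset_normalizer", "idna", "urllib3"],
    "certifi": [],
    "charset_normalizer": [],
    "idna": [],
    "urllib3": [],
}

print(pex_path_for(graph, "requests", "pexes"))
```

The resulting string would be exported as the PEX_PATH environment variable when running the root requirement's PEX.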

enough-analyst-54434

08/24/2021, 8:18 PM
It's a decent hack, but if this proves effective, we should grow Pex a loose wheel mode 1st class instead of composition via PEX_PATH, since Pex already has the graph and does the walk, etc.
👍 1
One sign of the hack that jumps out is you miss the cache on a Pex upgrade since the PEX .bootstrap/ code will almost certainly have changed. That code is just along for the ride in a PEX_PATH'd PEX, but it will force a miss on the actual cared about thing, the single installed wheel inside.
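The cache-miss concern can be made concrete with a small sketch: two zips that mimic a PEX layout, identical under .deps/ but different under .bootstrap/, hash differently as whole files while a key computed only over .deps/ entries stays stable. The file names inside the zips, the version strings, and both key functions are assumptions for illustration and do not describe how Pants or Pex compute cache keys.

```python
import hashlib
import io
import zipfile


def build_fake_pex(bootstrap_source, wheel_source):
    """Return the bytes of a zip with one .bootstrap/ entry and one .deps/ entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ZipInfo defaults to a fixed 1980 timestamp, which keeps the bytes deterministic.
        zf.writestr(zipfile.ZipInfo(".bootstrap/pex/version.py"), bootstrap_source)
        zf.writestr(
            zipfile.ZipInfo(".deps/six-1.16.0-py2.py3-none-any.whl/six.py"), wheel_source
        )
    return buf.getvalue()


def whole_file_key(pex_bytes):
    """Key over every byte of the file, bootstrap code included."""
    return hashlib.sha256(pex_bytes).hexdigest()


def deps_only_key(pex_bytes):
    """Key over only the names and contents of entries under .deps/."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(pex_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if name.startswith(".deps/"):
                digest.update(name.encode("utf-8"))
                digest.update(zf.read(name))
    return digest.hexdigest()


# Same installed wheel, different (hypothetical) bootstrap versions.
old = build_fake_pex("__version__ = '2.1.42'\n", "wheel contents\n")
new = build_fake_pex("__version__ = '2.1.43'\n", "wheel contents\n")

print(whole_file_key(old) == whole_file_key(new))
print(deps_only_key(old) == deps_only_key(new))
```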
Happy to help with that if your experiment looks good.

witty-crayon-22786

08/24/2021, 8:23 PM
yea, sounds good: thanks!
so, i think that i’ve hit a medium sized blocker with this: it does not appear that the PEX_PATH is consumed in all cases: https://github.com/pantsbuild/pex/issues/1423
i can work around this by having each requirement pex actually contain its transitive deps, but that’s fairly redundant: 50MB of wheels become 105MB of intransitive pexes, or 244MB of transitive pexes

hundreds-father-404

08/27/2021, 2:43 AM
that's the status quo, isn't it?

witty-crayon-22786

08/27/2021, 2:43 AM
putting this down for tonight, but will probably pursue the transitive angle tomorrow… it’s much easier anyway
> that’s the status quo, isn’t it?
no: the status quo with subsetting is N different subsets, each of which might be up to 50MB (in this example)… for large N, that might be N * 50MB >= 4GB
the 244MB is the total size of all permutations
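The arithmetic behind that comparison, worked through with the figures quoted in this thread. Treating 4GB as 4000MB is an assumption, and since each subset is "up to" 50MB these are worst-case numbers rather than measurements.

```python
import math

SUBSET_MAX_MB = 50          # upper bound on one subset PEX, from the thread
TRANSITIVE_TOTAL_MB = 244   # total size of all transitive per-requirement PEXes
BUDGET_MB = 4 * 1000        # assumption: 4GB counted as 4000MB

# Number of distinct worst-case subsets needed to reach 4GB of cache.
n_to_reach_budget = math.ceil(BUDGET_MB / SUBSET_MAX_MB)

# Number of distinct worst-case subsets after which subsetting costs more
# than the fixed total for transitive PEXes.
break_even_n = math.ceil(TRANSITIVE_TOTAL_MB / SUBSET_MAX_MB)

print(n_to_reach_budget, break_even_n)
```

So under these assumptions the subsetting cost grows linearly with N, while the per-requirement approach stays at a fixed total however many subsets are used.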