# development
w
@enough-analyst-54434: re: #12548: i just realized that there is a facility that we didn’t discuss on friday, that at least superficially looks like it supports composing python artifacts the same way you would compose jars… PEX_PATH
e
I don't follow. What Pex are you adding using that?
Don't forget - python artifacts are not composable without an install step - unless they are eggs.
w
understood, and i haven’t been able to track down how the pex_path is consumed at PEX runtime (sorry, should have done that first).
but i’m essentially suggesting 1 PEX file per wheel, composed together in a graph aware way at runtime using the PEX_PATH
(i recognize that these would not be valid self-contained PEX files)
if the execution mode is `--venv`, that series of 1-wheel PEXes would essentially be used as a cache-key for the venv rather than actually being cracked open at all
e
This sounds super roundabout. If you're just worried about optimization, that's probably easier to attack separately. I raised the concern given current Pex features, but there are lots of ways to solve it.
w
it’s just optimization, yea. but it’s potentially interesting because it’s end to end… we avoid monolithic artifacts all the way from the initial resolve and up-to-and-including execution time. can’t do the same with wheels quite as efficiently, because the step to build a PEX from wheels doesn’t cache-hit into a venv directory the same way the `--venv` pex does.
e
You can't do that today. Pex could be taught to contain wheels instead of or in addition to installed wheels for one example. That said - the composition you suggest is equivalent to just adding an installed-wheel zip to sys.path.
Python already knows how to import an installed-wheel zip.
So for your scheme you basically need a `pex-tools <PEX> repository export --egg`.
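To make the "installed-wheel zip on sys.path" point concrete, here is a minimal sketch using only the standard library. The zip and the module name `hello_zip_mod` are made up for illustration; the zip stands in for a zipped-up installed wheel. Note that this only works for pure-Python code, since the stdlib zip importer cannot load native extension modules.

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip holding one pure-Python module. This stands in for a
# zipped-up installed wheel; `hello_zip_mod` is a hypothetical module name.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "installed_wheel.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("hello_zip_mod.py", "GREETING = 'imported from a zip'\n")

# Putting the zip file itself on sys.path is enough: the stdlib zipimport
# machinery treats the archive like a directory of modules.
sys.path.insert(0, zip_path)
import hello_zip_mod  # noqa: E402

print(hello_zip_mod.GREETING)
print(hello_zip_mod.__file__)  # a path inside the zip archive
```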
w
> You can’t do that today. Pex could be taught to contain wheels instead of or in addition to installed wheels for one example.
yea, sorry. i meant installed wheels.
> That said - the composition you suggest is equivalent to just adding an installed-wheel zip to sys.path.
ok… i think that some of my confusion has been that “installed-wheel zip” is a mouthful relative to “jar”… but that does sound like what i’m looking for.
e
Yeah - that was the abandoned egg format.
w
got it.
e
The repository extract command could create the spiritual cousin which is the zipped up installed wheel chroot.
So, to summarize the impedance mismatch with JVM, jar == egg, egg is dead.
w
is it dead only as a distribution model on pypi? or are there good reasons to avoid it for this usecase too?
e
It's dead as a supported model. I think modern wheel has dropped its conversion support tool for example.
Pex's 1st two? years of life included support. That was the Wickman era though so he learned all about egg and why it was dropped but I did not.
I know there were ~PEP-hashed-through reasons but I don't know what they were.
Aha - it's a one-way conversion now from egg to wheel in latest wheel:
$ wheel -h
usage: wheel [-h] {unpack,pack,convert,version,help} ...

positional arguments:
  {unpack,pack,convert,version,help}
                        commands
    unpack              Unpack wheel
    pack                Repack wheel
    convert             Convert egg or wininst to wheel
...
w
interesting. and i suppose that i shouldn’t assume that what is true of the JVM (horrendously slow classpath loading from loose files) necessarily applies in Python land.
could be the opposite.
e
Could be - probably the same though since the slow thing is all about FSes and not about language most likely.
w
part of the appeal of the PEX_PATH approach was that it sidestepped actually changing how we invoke: we’re still using PEX to construct the virtualenv (just with different source data), and it can still hit its cache
e
Right.
I think - from scratch - I'd go with Pex supporting containing wheel files. That would also give you a normal (loose) PEX, but the new PEX machinery that supported wheel files would do all the same things it does today with subsetting, venvs etc.
The only cost is a bigger PEX runtime since it would need to include wheel installing code - which is currently housed canonically in Pip in the PEX buildtime codebase only.
w
hm
e
The overhead is then reduced to the time it takes to install a wheel file in the pex cache the 1st time a given wheel file is used, but not the 2nd, etc.
w
even across multiple venvs consuming that wheel?
e
Yes.
Pex does that today too.
It only extracts the pre-installed wheel once.
It lives in a hash dir
The venvs all hard-link the wheel in by default.
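A rough illustration of the extract-once, hard-link-many idea described above. This is not Pex's code: the cache layout, the function names, and the use of SHA-1 over the payload are all assumptions made for the sketch.

```python
import hashlib
import os
import tempfile


def install_once(cache_root: str, wheel_name: str, payload: bytes) -> str:
    """Write the payload under a content-hash directory unless already there."""
    digest = hashlib.sha1(payload).hexdigest()
    target_dir = os.path.join(cache_root, digest, wheel_name)
    target = os.path.join(target_dir, "module.py")
    if not os.path.exists(target):  # second and later uses are cache hits
        os.makedirs(target_dir, exist_ok=True)
        with open(target, "wb") as fp:
            fp.write(payload)
    return target


def link_into_venv(cached_file: str, site_packages: str) -> str:
    """Hard-link the cached file into a venv instead of copying it."""
    os.makedirs(site_packages, exist_ok=True)
    dest = os.path.join(site_packages, os.path.basename(cached_file))
    if not os.path.exists(dest):
        os.link(cached_file, dest)
    return dest


root = tempfile.mkdtemp()
cached = install_once(
    os.path.join(root, "cache"), "six-1.16.0-py2.py3-none-any.whl", b"x = 1\n"
)
venv_a = link_into_venv(cached, os.path.join(root, "venv_a", "site-packages"))
venv_b = link_into_venv(cached, os.path.join(root, "venv_b", "site-packages"))

# All three paths share one inode, so the bytes are stored only once.
print(os.stat(cached).st_nlink)
```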
w
got it… that was the bit i was missing
e
That's the significance of the values in this map:
$ pex-tools pants.pex info | jq .distributions
{
  "PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl": "bee146b7b338f215cf12e6c28c8ece8c798ec0e5",
  "ansicolors-1.1.8-py2.py3-none-any.whl": "c7b5d77e89855f9b02c99e84cd863c5bdc2be329",
  "certifi-2021.5.30-py2.py3-none-any.whl": "c1b63440cdd5c303c7268c98b3454996a456e0cd",
  "charset_normalizer-2.0.4-py3-none-any.whl": "d7b15d73088da1a67477fd8592d5352642d7a11a",
  "fasteners-0.16-py2.py3-none-any.whl": "ef97c8bf1c7ece677c203ea6c1a3580e4dbe0a34",
  "humbug-0.2.6-py3-none-any.whl": "e772468167e997e051e8c20d71551a11bba19f26",
  "idna-3.2-py3-none-any.whl": "f6f18646cd1fcc3b8d439bd113b7cc9ad4ebc3c0",
  "packaging-20.9-py2.py3-none-any.whl": "e0dc9f7afe5402ed634e303124285950c3bb1409",
  "pantsbuild.pants-2.6.0-cp39-cp39-manylinux2014_x86_64.whl": "ada4098a7fdbda946fae7ca683167b2a2c558787",
  "pex-2.1.42-py2.py3-none-any.whl": "d64dcfa09d02f9f2e483de2de86a8f90f326eca4",
  "psutil-5.8.0-cp39-cp39-manylinux2010_x86_64.whl": "5b9f49da190ee0271cddbc9395b9b74f8f28095b",
  "pyparsing-2.4.7-py2.py3-none-any.whl": "401ee8e4d4a08b2e87749a109db29991c85a67c6",
  "requests-2.26.0-py2.py3-none-any.whl": "fb70e1b8449a6408d983fb8c56f60f1e827890d8",
  "setproctitle-1.2.2-cp39-cp39-manylinux1_x86_64.whl": "d4f45d34e8477c16887ba446f339215a919687c7",
  "setuptools-56.2.0-py3-none-any.whl": "4c58b9c155902d9c63742e1862a7ef4ba4886751",
  "six-1.16.0-py2.py3-none-any.whl": "035d7c208925c1832def39b592f3477ca36397bf",
  "toml-0.10.2-py2.py3-none-any.whl": "941913d720ad4816a848c11a218c9110f1978120",
  "types_PyYAML-5.4.3-py2.py3-none-any.whl": "ff59bd9da4c781186fecb96e9717c458a98db032",
  "types_setuptools-57.0.0-py3-none-any.whl": "376ce3ec246abbf148b0edc6861c92f368daf4a8",
  "types_toml-0.1.3-py2.py3-none-any.whl": "8599aba9d6d109a743e579345fb4b23b4957c9d7",
  "typing_extensions-3.7.4.3-py3-none-any.whl": "eedab11cf76ead911adc7311c50c4f75212a0fa4",
  "urllib3-1.26.6-py2.py3-none-any.whl": "d58fa3fcc249a15487bc44b2a9afd9b065720684"
}
I think it's probably horribly confusing that the keys look like wheel file names when they're PEX zip `.deps/` sub-directory names. It just made matching wheel tags straightforward at runtime resolution.
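Because the keys follow the standard wheel file name layout (name, version, optional build tag, then python, abi, and platform tags), the tags can be read straight off the key. A minimal sketch of that split; `WheelKey` and `parse_wheel_key` are hypothetical names, not Pex APIs:

```python
from typing import NamedTuple


class WheelKey(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_key(key: str) -> WheelKey:
    """Split a wheel-style name into its name, version, and tag fields."""
    stem = key[: -len(".whl")] if key.endswith(".whl") else key
    parts = stem.split("-")
    # The last three fields are always the python, abi, and platform tags;
    # an optional build tag may sit between the version and the python tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelKey(parts[0], parts[1], python_tag, abi_tag, platform_tag)


key = parse_wheel_key("PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl")
print(key)
```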
w
yea, definitely a little bit. i always assumed that they were “just” unzipped, as opposed to installed.
e
Yeah. There was a lot of noise and error when this first went in. Mark Chu-Carroll learned all this the hard way but that is lost to history.
w
hey, so i’ve come back around to this idea, based on the outcome of the last two comments on https://github.com/pantsbuild/pants/issues/12548#issuecomment-902134524
essentially: i do actually need to decompose the “requirements.pex” to avoid putting a bunch of monolithic PEXes in the cache
i’ve parsed out the `pex-tools .. graph` and `pex-tools .. repository info` to get the dependency graph, and i’m thinking of building single entry PEXes (essentially, eggs), and then composing them with the PEX_PATH
we talked briefly about it at the top of this thread: but i’m wondering if you see any issues with it before i dive in further
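A sketch of what the graph-aware composition could look like. The graph literal, the `pexes/` directory, and the one-PEX-per-requirement naming are hypothetical; the only thing assumed about Pex itself is that PEX_PATH is a colon-separated list of PEX files.

```python
from typing import Dict, List


def transitive_deps(graph: Dict[str, List[str]], root: str) -> List[str]:
    """Return every requirement reachable from root, excluding root itself."""
    seen: List[str] = []
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen or node == root:
            continue
        seen.append(node)
        stack.extend(graph.get(node, []))
    return sorted(seen)


def pex_path_for(graph: Dict[str, List[str]], root: str, pex_dir: str) -> str:
    """Join the single-wheel PEXes for root's dependencies into a PEX_PATH."""
    # Assumption: PEX_PATH entries are separated by ":".
    return ":".join(f"{pex_dir}/{dep}.pex" for dep in transitive_deps(graph, root))


# Hypothetical dependency graph, as might be parsed from the tool output.
graph = {
    "requests": ["certifi", "charset_normalizer", "idna", "urllib3"],
    "certifi": [],
    "charset_normalizer": [],
    "idna": [],
    "urllib3": [],
}
pex_path = pex_path_for(graph, "requests", "pexes")
print(pex_path)
```

The root's own single-wheel PEX would then be run with that value exported as PEX_PATH.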
e
It's a decent hack but if this proves effective, we should grow Pex a loose wheel mode 1st class instead of composition via PEX_PATH since Pex already has the graph and does the walk, etc.
One sign of the hack that jumps out is you miss the cache on a Pex upgrade since the PEX .bootstrap/ code will almost certainly have changed. That code is just along for the ride in a PEX_PATH'd PEX, but it will force a miss on the actual cared about thing, the single installed wheel inside.
Happy to help with that if your experiment looks good.
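The cache-miss concern can be shown with a toy example: two PEX-like zips that differ only under `.bootstrap/` get different keys when everything is hashed, but the same key when only `.deps/` is hashed. The file names inside the zips and the `content_key` helper are made up for the sketch; this is not how any real cache computes its keys.

```python
import hashlib
import io
import zipfile


def build_pex_like_zip(bootstrap_code: bytes, wheel_code: bytes) -> bytes:
    """Build an in-memory zip with a .bootstrap/ entry and a .deps/ entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(".bootstrap/pex/__init__.py", bootstrap_code)
        zf.writestr(".deps/six-1.16.0-py2.py3-none-any.whl/six.py", wheel_code)
    return buf.getvalue()


def content_key(zip_bytes: bytes, prefix: str = "") -> str:
    """Hash the names and contents of all entries that start with prefix."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if name.startswith(prefix):
                digest.update(name.encode("utf-8"))
                digest.update(zf.read(name))
    return digest.hexdigest()


old_pex = build_pex_like_zip(b"VERSION = 'old'\n", b"x = 1\n")
new_pex = build_pex_like_zip(b"VERSION = 'new'\n", b"x = 1\n")

# Hashing everything misses after a bootstrap change...
print(content_key(old_pex) == content_key(new_pex))
# ...while hashing only the installed wheel still hits.
print(content_key(old_pex, ".deps/") == content_key(new_pex, ".deps/"))
```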
w
yea, sounds good: thanks!
so, i think that i’ve hit a medium sized blocker with this: it does not appear that the PEX_PATH is consumed in all cases: https://github.com/pantsbuild/pex/issues/1423
i can work around this by having each requirement pex actually contain its transitive deps, but that’s fairly redundant: 50MB of wheels become 105MB of intransitive pexes, or 244MB of transitive pexes
h
that's the status quo, isn't it?
w
putting this down for tonight, but will probably pursue the transitive angle tomorrow… it’s much easier anyway
> that’s the status quo, isn’t it?
no: the status quo with subsetting is N different subsets, each of which might be up to 50MB (in this example)… for large N, that might be `N * 50MB >= 4GB`
the 244MB is the total size of all permutations
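The arithmetic behind that comparison, worked through under the worst-case assumption that every subset is the full 50MB (real subsets would usually be smaller) and using decimal units (1GB = 1000MB):

```python
import math

SUBSET_MB = 50             # upper bound on one subset PEX, from the thread
INTRANSITIVE_TOTAL_MB = 105
TRANSITIVE_TOTAL_MB = 244  # total for all per-requirement transitive PEXes
BUDGET_MB = 4 * 1000       # 4GB, decimal units assumed

# Number of worst-case subsets needed to reach 4GB of cache.
subsets_to_hit_budget = math.ceil(BUDGET_MB / SUBSET_MB)

# Number of worst-case subsets after which subsetting costs more than the
# fixed 244MB of transitive PEXes.
break_even_subsets = math.ceil(TRANSITIVE_TOTAL_MB / SUBSET_MB)

# Size overhead of each decomposition relative to the 50MB of raw wheels.
intransitive_ratio = INTRANSITIVE_TOTAL_MB / SUBSET_MB
transitive_ratio = TRANSITIVE_TOTAL_MB / SUBSET_MB

print(subsets_to_hit_budget, break_even_subsets)
print(intransitive_ratio, transitive_ratio)
```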