# pex
a
thinking about whether a metadata-only resolve could be possible (idea from @rough-minister-58256): download massive wheels with great parallelism by sending an http byte range header in the request for the zip. i found that most wheels have the METADATA file at the bottom of the zip, so it should be possible to extract the dependencies with a very small range request (an http HEAD beforehand gives the size of the whole zip), then start downloading those wheels in parallel, recursively, to saturate the network interface. i've been reading the zip file spec to see whether the content of a zip file is specified with purely local information, so i can just request the last however many bytes of a wheel, parse them to find the METADATA file content, extract it, and read out the requirements. currently just following along with emacs in hexl-mode
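A minimal sketch of that flow, assuming the `requests` library: HEAD for the total size, one range request for the tail, parse the EOCD and central directory records from the zip spec, then one more range request for the METADATA bytes. The `fetch_metadata` name, the 64 KiB tail size, and the assumption that the central directory lands inside the tail are all guesses, not anything pinned down in the chat.

```python
import struct
import zlib

import requests

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory record signature
CDFH_SIG = b"PK\x01\x02"  # Central Directory file header signature
TAIL = 64 * 1024          # guess: EOCD + central directory fit in 64 KiB


def fetch_metadata(wheel_url):
    # HEAD gives the total size, so tail offsets can be made absolute.
    size = int(requests.head(wheel_url, allow_redirects=True).headers["Content-Length"])

    # Range request #1: the tail of the zip.
    tail_start = max(0, size - TAIL)
    tail = requests.get(wheel_url, headers={"Range": f"bytes={tail_start}-"}).content

    # Find the EOCD record (rfind, in case a zip comment follows it).
    eocd = tail.rfind(EOCD_SIG)
    _, _, _, _, n_entries, _, cd_offset, _ = struct.unpack(
        "<4s4H2LH", tail[eocd : eocd + 22]
    )

    # Walk the central directory for *.dist-info/METADATA. This assumes the
    # whole directory landed inside our tail; a real tool would refetch if not.
    pos = cd_offset - tail_start
    for _ in range(n_entries):
        assert tail[pos : pos + 4] == CDFH_SIG, "walked off the central directory"
        (_, _, _, _, method, _, _, _, csize, _, nlen, xlen, clen,
         _, _, _, lho) = struct.unpack("<4s6H3L5H2L", tail[pos : pos + 46])
        name = tail[pos + 46 : pos + 46 + nlen].decode("utf-8")
        pos += 46 + nlen + xlen + clen
        if name.endswith(".dist-info/METADATA"):
            break
    else:
        raise ValueError("no METADATA entry found")

    # Range request #2: the local file header plus the compressed METADATA
    # bytes (1 KiB of slack for the local extra field, which can differ from
    # the central directory's copy of it).
    want = 30 + nlen + 1024 + csize
    blob = requests.get(
        wheel_url, headers={"Range": f"bytes={lho}-{lho + want - 1}"}
    ).content
    lnlen, lxlen = struct.unpack("<2H", blob[26:30])
    data = blob[30 + lnlen + lxlen : 30 + lnlen + lxlen + csize]
    # METADATA is either stored (0) or deflated (8); -15 means raw deflate.
    return zlib.decompress(data, -15) if method == 8 else data
```

Decoding the result and pulling out the `Requires-Dist:` lines would give the same list that `try.py` prints below, for a few tens of KiB transferred instead of the full 35.5M wheel.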
e
You may be able to hack this together, but getting PyPA / PyPI to buy in to formally splitting (dependency) metadata out from other artifacts would have a much wider benefit.
a
the path to doing that is fully unclear to me, except by way of hacking something together, upstreaming the changes to pip, and motivating a subsequent PEP to formalize the notion, or something
e
The path would start with a PEP
a
ok
i can read up on how to do that
ok, i'm at https://github.com/python/peps/blob/master/CONTRIBUTING.rst now and i'm clear on the next steps. but i still want to waste some time on this today, because it may again be relevant to the other thing in the back of my head that i mentioned earlier in this channel: shipping this open-source, highly parallel scala compiler, since that handles jars, which are also zips
but the plan in my head is: 1. waste some time on that today 2. search/propose on python-ideas
e
Mildly related, I'll submit https://github.com/pantsbuild/pants/pull/8704 on green by end of day, but would love your review if you have time. It includes a good deal of change in `unpack_wheels`.
a
will do!
i had already started this last night but didn't get very far -- thank you for doing this
got my horrifying wish:
```
curl -L "https://files.pythonhosted.org/packages/90/77/15d6ebee3fd7ad53581ef9aacb680d6dadadf63c964c8e482547e1a5b493/pantsbuild.pants-1.23.0rc0-cp36-abi3-manylinux1_x86_64.whl#sha256=5eb64ef8f1a7ef91d685c4f8895a548970a3db1c1321a3548b773d3a7986eff1" > 'pantsbuild.pants-1.23.0rc0-cp36-abi3-manylinux1_x86_64.whl' && python try.py 'pantsbuild.pants-1.23.0rc0-cp36-abi3-manylinux1_x86_64.whl'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 35.5M  100 35.5M    0     0  12.6M      0  0:00:02  0:00:02 --:--:-- 12.6M
all_requirements from pantsbuild.pants-1.23.0rc0-cp36-abi3-manylinux1_x86_64.whl were:
twitter.common.collections<0.4,>=0.3.11
setproctitle==1.1.10
ansicolors==1.0.2
typing-extensions==3.7.4
dataclasses==0.6
packaging==16.8
psutil==5.6.3
contextlib2==0.5.5
fasteners==0.14.1
python-Levenshtein==0.12.0
PyYAML==5.1.2
setuptools==40.6.3
twitter.common.dirutil<0.4,>=0.3.11
pathspec==0.5.9
asttokens==1.1.13
requests[security]>=2.20.1
cffi==1.13.2
pex==1.6.12
pyopenssl==17.3.0
pystache==0.5.3
py-zipkin==0.18.4
www-authenticate==0.9.2
docutils==0.14
Markdown==2.1.1
Pygments==2.3.1
twitter.common.confluence<0.4,>=0.3.11
wheel==0.31.1
pywatchman==1.4.1
```
where `try.py` does the hacky nonsense i mentioned
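`try.py` itself never appears in the chat, but a plausible stdlib-only reconstruction that produces the output above (the `all_requirements` function name and CLI shape are guesses) is:

```python
# plausible try.py: read METADATA out of a local wheel with the stdlib
# and print its Requires-Dist entries (the wheel's dependency specifiers).
import sys
import zipfile
from email.parser import Parser


def all_requirements(wheel_path):
    with zipfile.ZipFile(wheel_path) as whl:
        name = next(n for n in whl.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(whl.read(name).decode("utf-8"))
    # core metadata is RFC 822-style, so the email parser handles it;
    # there is one Requires-Dist header per dependency.
    return metadata.get_all("Requires-Dist") or []


if __name__ == "__main__":
    path = sys.argv[1]
    print(f"all_requirements from {path} were:")
    for req in all_requirements(path):
        print(req)
```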
now that that's settled