# general
f
Hi folks! Is there documentation of the pants lockfile format?
e
Nope. It comes from Pex and is purposely opaque / undocumented currently. What's your use case?
f
We need to generate a software bill of materials (SBOM) to satisfy regulatory requirements, so we’d like to write software that will consume the python lockfile as well as lock files from other languages to produce a combined SBOM.
e
Gotcha. For now you just have to consume as is and hope it doesn't break. Pants itself does this to print out pretty diffs when you run `pants generate-lockfiles`.
Is there some SBOM standard? I'd feel much more comfortable adding a feature to Pex to emit that than defining and committing to a standard for its own lock file format.
SBOM seems to scream standard.
f
There is an open standard, SPDX. https://spdx.github.io/spdx-spec/v2.3/
e
Ok, I just found two others as well - my god. Taking SPDX though, if I provide a `pex3 lock export` (an existing command) `--format` for that, would that suffice?
f
That would be amazing!
Yes, unfortunately there are a lot of standards. SPDX seems to be the one that is widely used, and it is supported by the Linux Foundation.
e
Ok. Since you know something about this would you be willing to file a Pex issue? I need to page in that spec to see how much work this entails, but clearly it's ~exactly the amount of work you are preparing to do anyway.
f
Absolutely! Happy to.
And if I can, I’d be happy to contribute to the feature. I would need to learn how.
e
Ok, that would be great. I'll use the issue to seed some notes on where this goes and where the underlying lock data model is, etc.
This is Pex for completeness: https://github.com/pantsbuild/pex
c
thanks for bringing this to our attention @few-arm-93065! /me follows this 🙂
e
The checksum thing is pretty bad. Sometimes there are 10s of files per locked project and 100s of locked projects in 1 lock file. Pulling down ~1000 wheels just to re-fingerprint down from sha256 to sha1 is pretty horrible. Maybe I read the spec wrong?
f
I’m just diving into this spec as well. Frustrating. It looks like allowing people to use any checksum function has been suggested, and may make it into the v3 version of the standard. https://github.com/spdx/spdx-spec/issues/106
e
Ok. I mean, that seems like a blocker to me. Poor PyPI for one.
It certainly could be done, just more than a bit crazy since it gives you less surety in your SBOM!
So, @few-arm-93065, come hell or high water you must produce SPDX across software ecosystems anyhow?
Presumably this is a problem elsewhere too?
f
No worries, we can always consume the pants/pex lockfile in the interim
e
So you will be spamming PyPI?
f
Internally, we actually don’t care about perfect adherence to SPDX. We’ll be producing a human-readable regulatory document. So in the short term we can make a somewhat hacky attempt at it.
e
Aha.
Yeah, I'd definitely want to emit a standard if it is to be a Pex feature.
f
I wonder if there’s a more lightweight standard we could export locks into. CycloneDX https://cyclonedx.org is another one that seems to offer a lot more flexibility.
e
That was one of the other two I found. I'll take a look. Back on SPDX, there is https://github.com/pantsbuild/pex/blob/main/pex/resolve/pep_691/fingerprint_service.py which Pex uses. I'm not sure what actual hashes PyPI tends to present. I'll check that now.
The thing is, that's ~only supported for PyPI. If using custom indexes in addition, downloading all the things from those indexes will still be needed.
f
my knowledge of custom indexes isn’t great, are they guaranteed to provide at least one kind of hash, even if it’s not sha1 or sha256?
ah, the code you linked to answers my question. If that were true pex wouldn’t need to download packages to fingerprint them…
e
They are not. Pex falls back to downloading and hashing.
Ok, tried out PyPI PEP-691 on a project and it only returns sha256 hashes.
$ curl -sSL -H "Accept: application/vnd.pypi.simple.v1+json" https://pypi.org/simple/p537/ | jq .
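A rough Python sketch of the same check, for anyone who wants to script it against several projects or indexes. It assumes the index speaks the PEP 691 JSON format, where each entry under `files` carries a `hashes` mapping; the function names `fetch_project_index` and `hash_algorithms` are made up for illustration and are not part of Pex.

```python
import json
import urllib.request

PEP_691_JSON = "application/vnd.pypi.simple.v1+json"


def fetch_project_index(project, index_url="https://pypi.org/simple"):
    """Fetch the PEP 691 JSON listing for one project (needs network access)."""
    request = urllib.request.Request(
        f"{index_url}/{project}/", headers={"Accept": PEP_691_JSON}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def hash_algorithms(index_json):
    """Return the set of hash algorithm names advertised across all files."""
    algorithms = set()
    for file_entry in index_json.get("files", []):
        # "hashes" maps algorithm name -> hex digest and may be empty.
        algorithms.update(file_entry.get("hashes", {}))
    return algorithms
```

Calling `hash_algorithms(fetch_project_index("p537"))` would report which algorithms an index offers without downloading any artifact; an empty set means falling back to download-and-hash.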
f
And that’s just an implementation detail of pypi... I’m curious what you think of cyclonedx. It seems to allow for many kinds of fingerprint.
e
I can't even see it requiring 1. Components seem to be 0 or more, and the same goes for hashes: https://cyclonedx.org/docs/1.4/json/#components_items_hashes
This spec is easier to read, if looser. And that draws my attention to licenses. Presumably you need those? (they are also 0 or more).
For that, Pex will again need to download the file to extract the license. You could maybe cheat and just download 1 and assume the license is the same in each wheel variant (and sdist) published for that version.
f
yes, licenses are going to be important. Doesn’t the pypi JSON API have a license field?
e
Not at that endpoint.
f
I see “license” under the “info” object in the main /pypi/<project>/json endpoint. I assume pex does hit that as well?
e
Pex does not, no.
f
how about classifiers?
(that’s a long shot admittedly)
e
Yeah - none of that. Ok, here: https://warehouse.pypa.io/api-reference/json.html I thought that API was deprecated, occasionally blacked out, etc though. Let me check.
OK, I guess not - looks legit. So that would be the primary way to get the extra SBOM data, with, again, fallback to downloading artifacts and cracking them open.
f
Even a “lazy” approach would be ok with me - if the authors didn’t set the license in pypi, just don’t include anything in the BOM, indicating we made an attempt but manual investigation is needed
(I assume we’ll be chasing down things like this regardless)
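A minimal sketch of that "lazy" lookup, assuming the input is the `info` object returned by the PyPI JSON API for a release and that it carries `license` and `classifiers` keys. The helper name `license_from_info` is hypothetical; returning `None` is the "we tried, needs manual investigation" signal.

```python
LICENSE_CLASSIFIER_PREFIX = "License :: "


def license_from_info(info):
    """Best-effort license string from a PyPI JSON API "info" object.

    Returns None when the authors declared nothing usable.
    """
    declared = (info.get("license") or "").strip()
    if declared:
        return declared

    # Fall back to trove classifiers such as
    # "License :: OSI Approved :: MIT License".
    from_classifiers = [
        classifier.split(" :: ")[-1]
        for classifier in info.get("classifiers") or []
        if classifier.startswith(LICENSE_CLASSIFIER_PREFIX)
    ]
    if from_classifiers:
        return "; ".join(from_classifiers)

    return None
```

Note the `license` field is free-form text set by the package author, so the result is not guaranteed to be an SPDX identifier.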
e
If you can find a lazy enough spec then I'm happy to support that directly in Pex. That's crucial though for Pex support.
I don't want Pex to emit invalid spec X.
f
of course - and understood. Cyclonedx does seem to support an array of 0 or more licenses. I’ll need to do some more detailed research on this.
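For a feel of how loose CycloneDX is, here is a small sketch that builds a 1.4-style JSON document where hashes and licenses are simply left out when unknown. The function names are illustrative only, the `purl` normalization (lowercase, underscores to dashes) is my reading of the package URL convention for PyPI, and the free-form `name` license form is used because PyPI license strings are not necessarily SPDX ids.

```python
import json


def cyclonedx_component(name, version, sha256=None, license_name=None):
    """Build one CycloneDX component; hashes and licenses are optional."""
    component = {
        "type": "library",
        "name": name,
        "version": version,
        # Assumed purl normalization for PyPI packages.
        "purl": f"pkg:pypi/{name.lower().replace('_', '-')}@{version}",
    }
    if sha256:
        component["hashes"] = [{"alg": "SHA-256", "content": sha256}]
    if license_name:
        component["licenses"] = [{"license": {"name": license_name}}]
    return component


def cyclonedx_bom(components):
    """Wrap components in a minimal CycloneDX 1.4 JSON document."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": list(components),
    }
```

Something like `json.dumps(cyclonedx_bom([cyclonedx_component("six", "1.16.0")]), indent=2)` would then give a document to run through a CycloneDX validator before trusting the shape.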
e
Ok, in the meantime I'll add the code pointers to the issue in case this ends up being feasible as a Pex feature pending your spec investigation.
f
thank you!
e
You're welcome. One last question from my end - ignorant of SBOMs - is it intended that an SBOM actually contains unused software? Pex `--style universal` lock files - which is what Pants uses - lock the artifacts needed to form a PEX across Python versions and target systems (Linux & Mac). Presumably though you only actually build software for some of those. IOW you produce a PEX file that just contains a sub-slice of the lock file and you never actually ship or use - say - 90% of the artifacts in the lock. Is this as intended?
f
ah, I didn’t realize that. That is not intended. We use a single platform for all the PEXs we ship (devs on macs but building docker to x86 linux). The regulators only care about the packages that are in the actual product.
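To make the "sub-slice of the lock" point concrete, here is a deliberately simplified sketch that filters wheel filenames down to one interpreter and Linux x86_64. It only splits the tags out of the wheel filename; it is not real tag compatibility logic (the `packaging.tags` module does that properly), and all function names are hypothetical.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into (python, abi, platform) tag sets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Compressed tag sets such as "py2.py3" are dot-separated.
    return (
        set(python_tag.split(".")),
        set(abi_tag.split(".")),
        set(platform_tag.split(".")),
    )


def linux_x86_64(platform_tag):
    """Crude platform check: pure-Python wheels or any Linux x86_64 tag."""
    return platform_tag == "any" or (
        "linux" in platform_tag and platform_tag.endswith("x86_64")
    )


def ships_on(filename, python_tags, is_target_platform=linux_x86_64):
    """True if the wheel plausibly ends up in a PEX built for the target."""
    pythons, _abis, platforms = parse_wheel_tags(filename)
    return bool(pythons & set(python_tags)) and any(
        is_target_platform(tag) for tag in platforms
    )
```

With `python_tags={"cp39", "py3"}`, a universal lock's cp37, cp38 and macOS wheels all drop out, which is the part regulators would not care about.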
e
Right. And this is why Pex itself defaults to `--style strict` locks.
Pants is getting in your way here and you probably don't actually mean to SBOM a lock file.
f
thanks for the heads up on that one. Can I call pex directly to generate a strict style lock? or is there a way I can deconstruct a pex?
e
So, this whole SBOM export could instead be a pex-tool added to PEXes, i.e.:
PEX_TOOLS=1 my.pex generate-sbom here.json
That would be easy to add to Pex. The actual wheels you really use are local at that point and re-hashing is cheap and eco-sensitive, etc.
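As a sketch of why local re-hashing is cheap: once the bytes are on disk, any number of digests can be computed in a single pass with `hashlib`. This is not Pex's code, just an illustration; `fingerprint` and `fingerprint_file` are made-up names.

```python
import hashlib


def fingerprint(stream, algorithms=("sha1", "sha256"), chunk_size=1024 * 1024):
    """Compute several digests of a binary stream in one read pass."""
    digests = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def fingerprint_file(path, algorithms=("sha1", "sha256")):
    """Convenience wrapper for a file on disk."""
    with open(path, "rb") as stream:
        return fingerprint(stream, algorithms)
```

The catch discussed earlier still holds: a sha1 cannot be derived from a sha256, so without local bytes every artifact would have to be downloaded first.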
Getting Pants to `--style strict` will be a bear I think.
f
That’s actually a great solution - and much more airtight when it comes to proving that the sbom is complete and accurate
e
Ok, great. That's much more sane.
Let me update the ticket notes with new code pointers for pex-tools.
f
thank you, I really appreciate it!
e
And, 1 more. Since a PEX is a single file - in some sense you can just SBOM that and be done. That does leave out the licenses and versions of all the included installed wheels though which you probably want.
For example, the Pants PEX (which few people use), has:
$ zipinfo ~/Downloads/pants.2.16.0.dev3.pex | grep LICENSE
-rw-r--r--  2.0 unx     1078 b- defN 80-Jan-01 00:00 .bootstrap/pex/vendor/_vendored/setuptools/setuptools-44.0.0+3acb925dd708430aeaf197ea53ac8a752f7c1863.dist-info/LICENSE
-rw-r--r--  2.0 unx     1125 b- defN 80-Jan-01 00:00 .bootstrap/pex/vendor/_vendored/wheel/wheel-0.37.1.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1101 b- defN 80-Jan-01 00:00 .deps/PyYAML-6.0-cp37-cp37m-macosx_10_9_x86_64.whl/PyYAML-6.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1101 b- defN 80-Jan-01 00:00 .deps/PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl/PyYAML-6.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1101 b- defN 80-Jan-01 00:00 .deps/PyYAML-6.0-cp38-cp38-macosx_10_9_x86_64.whl/PyYAML-6.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1101 b- defN 80-Jan-01 00:00 .deps/PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl/PyYAML-6.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1101 b- defN 80-Jan-01 00:00 .deps/PyYAML-6.0-cp39-cp39-macosx_10_9_x86_64.whl/PyYAML-6.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1101 b- defN 80-Jan-01 00:00 .deps/PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl/PyYAML-6.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1052 b- defN 80-Jan-01 00:00 .deps/certifi-2022.12.7-py3-none-any.whl/certifi-2022.12.7.dist-info/LICENSE
-rw-rw----  2.0 unx     1070 b- defN 80-Jan-01 00:00 .deps/charset_normalizer-2.1.1-py3-none-any.whl/charset_normalizer-2.1.1.dist-info/LICENSE
-rw-rw----  2.0 unx     1081 b- defN 80-Jan-01 00:00 .deps/chevron-0.14.0-py3-none-any.whl/chevron-0.14.0.dist-info/LICENSE
-rw-rw----  2.0 unx    10143 b- defN 80-Jan-01 00:00 .deps/fasteners-0.16.3-py2.py3-none-any.whl/fasteners-0.16.3.dist-info/LICENSE
-rw-rw----  2.0 unx     1523 b- defN 80-Jan-01 00:00 .deps/idna-3.4-py3-none-any.whl/idna-3.4.dist-info/LICENSE.md
-rw-rw----  2.0 unx     2265 b- defN 80-Jan-01 00:00 .deps/ijson-3.1.4-cp37-cp37m-macosx_10_9_x86_64.whl/ijson-3.1.4.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     2265 b- defN 80-Jan-01 00:00 .deps/ijson-3.1.4-cp37-cp37m-manylinux2010_x86_64.whl/ijson-3.1.4.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     2265 b- defN 80-Jan-01 00:00 .deps/ijson-3.1.4-cp38-cp38-macosx_10_9_x86_64.whl/ijson-3.1.4.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     2265 b- defN 80-Jan-01 00:00 .deps/ijson-3.1.4-cp38-cp38-manylinux2010_x86_64.whl/ijson-3.1.4.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     2265 b- defN 80-Jan-01 00:00 .deps/ijson-3.1.4-cp39-cp39-macosx_10_9_x86_64.whl/ijson-3.1.4.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     2265 b- defN 80-Jan-01 00:00 .deps/ijson-3.1.4-cp39-cp39-manylinux2010_x86_64.whl/ijson-3.1.4.dist-info/LICENSE.txt
-rw-rw----  2.0 unx      568 b- defN 80-Jan-01 00:00 .deps/importlib_resources-5.0.7-py3-none-any.whl/importlib_resources-5.0.7.dist-info/LICENSE
-rw-rw----  2.0 unx      197 b- defN 80-Jan-01 00:00 .deps/packaging-21.3-py3-none-any.whl/packaging-21.3.dist-info/LICENSE
-rw-rw----  2.0 unx    10174 b- defN 80-Jan-01 00:00 .deps/packaging-21.3-py3-none-any.whl/packaging-21.3.dist-info/LICENSE.APACHE
-rw-rw----  2.0 unx     1344 b- defN 80-Jan-01 00:00 .deps/packaging-21.3-py3-none-any.whl/packaging-21.3.dist-info/LICENSE.BSD
-rw-rw----  2.0 unx     1252 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/toml/toml-0.10.2.dist-info/LICENSE
-rw-rw----  2.0 unx     1082 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/attrs/attrs-21.5.0.dev0.dist-info/LICENSE
-rw-rw----  2.0 unx     1125 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/wheel/wheel-0.37.1.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1090 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/pip/pip-20.3.4.dist-info/LICENSE.txt
-rw-rw----  2.0 unx      197 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_20_9/packaging-20.9.dist-info/LICENSE
-rw-rw----  2.0 unx    10174 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_20_9/packaging-20.9.dist-info/LICENSE.APACHE
-rw-rw----  2.0 unx     1344 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_20_9/packaging-20.9.dist-info/LICENSE.BSD
-rw-rw----  2.0 unx     1023 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_20_9/pyparsing-2.4.7.dist-info/LICENSE
-rw-rw----  2.0 unx     1078 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/setuptools/setuptools-44.0.0+3acb925dd708430aeaf197ea53ac8a752f7c1863.dist-info/LICENSE
-rw-rw----  2.0 unx     1023 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_21_3/pyparsing-2.4.7.dist-info/LICENSE
-rw-rw----  2.0 unx      197 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_21_3/packaging-21.3.dist-info/LICENSE
-rw-rw----  2.0 unx    10174 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_21_3/packaging-21.3.dist-info/LICENSE.APACHE
-rw-rw----  2.0 unx     1344 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex/vendor/_vendored/packaging_21_3/packaging-21.3.dist-info/LICENSE.BSD
-rw-rw----  2.0 unx    11323 b- defN 80-Jan-01 00:00 .deps/pex-2.1.116-py2.py3-none-any.whl/pex-2.1.116.dist-info/LICENSE
-rw-rw----  2.0 unx     1549 b- defN 80-Jan-01 00:00 .deps/psutil-5.9.0-cp37-cp37m-macosx_10_9_x86_64.whl/psutil-5.9.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1549 b- defN 80-Jan-01 00:00 .deps/psutil-5.9.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/psutil-5.9.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1549 b- defN 80-Jan-01 00:00 .deps/psutil-5.9.0-cp38-cp38-macosx_10_9_x86_64.whl/psutil-5.9.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1549 b- defN 80-Jan-01 00:00 .deps/psutil-5.9.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/psutil-5.9.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1549 b- defN 80-Jan-01 00:00 .deps/psutil-5.9.0-cp39-cp39-macosx_10_9_x86_64.whl/psutil-5.9.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1549 b- defN 80-Jan-01 00:00 .deps/psutil-5.9.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/psutil-5.9.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1023 b- defN 80-Jan-01 00:00 .deps/pyparsing-3.0.9-py3-none-any.whl/pyparsing-3.0.9.dist-info/LICENSE
-rw-rw----  2.0 unx     1147 b- defN 80-Jan-01 00:00 .deps/python_lsp_jsonrpc-1.0.0-py3-none-any.whl/python_lsp_jsonrpc-1.0.0.dist-info/LICENSE
-rw-rw----  2.0 unx    10142 b- defN 80-Jan-01 00:00 .deps/requests-2.28.1-py3-none-any.whl/requests-2.28.1.dist-info/LICENSE
-rw-rw----  2.0 unx     1050 b- defN 80-Jan-01 00:00 .deps/setuptools-63.4.3-py3-none-any.whl/setuptools-63.4.3.dist-info/LICENSE
-rw-rw----  2.0 unx     1066 b- defN 80-Jan-01 00:00 .deps/six-1.16.0-py2.py3-none-any.whl/six-1.16.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1252 b- defN 80-Jan-01 00:00 .deps/toml-0.10.2-py2.py3-none-any.whl/toml-0.10.2.dist-info/LICENSE
-rw-rw----  2.0 unx    12755 b- defN 80-Jan-01 00:00 .deps/typing_extensions-4.3.0-py3-none-any.whl/typing_extensions-4.3.0.dist-info/LICENSE
-rw-rw----  2.0 unx     1959 b- defN 80-Jan-01 00:00 .deps/ujson-5.6.0-cp37-cp37m-macosx_10_9_x86_64.whl/ujson-5.6.0.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1959 b- defN 80-Jan-01 00:00 .deps/ujson-5.6.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl/ujson-5.6.0.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1959 b- defN 80-Jan-01 00:00 .deps/ujson-5.6.0-cp38-cp38-macosx_10_9_x86_64.whl/ujson-5.6.0.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1959 b- defN 80-Jan-01 00:00 .deps/ujson-5.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl/ujson-5.6.0.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1959 b- defN 80-Jan-01 00:00 .deps/ujson-5.6.0-cp39-cp39-macosx_10_9_x86_64.whl/ujson-5.6.0.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1959 b- defN 80-Jan-01 00:00 .deps/ujson-5.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl/ujson-5.6.0.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1115 b- defN 80-Jan-01 00:00 .deps/urllib3-1.26.13-py2.py3-none-any.whl/urllib3-1.26.13.dist-info/LICENSE.txt
-rw-rw----  2.0 unx     1050 b- defN 80-Jan-01 00:00 .deps/zipp-3.11.0-py3-none-any.whl/zipp-3.11.0.dist-info/LICENSE
That's a weird PEX though that contains 6 platforms worth of wheels, Python 3.{7,8,9} x Linux/Mac.
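The listing above could be gathered programmatically with something like the sketch below. It assumes the zip layout seen in that output (`.deps/<wheel>/<name>-<version>.dist-info/LICENSE*`), which is an observed detail of this particular PEX and not a documented contract; vendored copies nested deeper inside a wheel are skipped on purpose, and `license_files_by_wheel` is a hypothetical name.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def license_files_by_wheel(pex):
    """Map each wheel under .deps/ to its top-level dist-info LICENSE files.

    `pex` may be a path or a binary file object for a zip-format PEX.
    """
    found = defaultdict(list)
    with zipfile.ZipFile(pex) as zf:
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            if (
                len(parts) == 4
                and parts[0] == ".deps"
                and parts[1].endswith(".whl")
                and parts[2].endswith(".dist-info")
                and parts[3].upper().startswith("LICENSE")
            ):
                found[parts[1]].append(name)
    return dict(found)
```

The wheel directory names double as an inventory of exactly what shipped, which is the list an SBOM for the PEX needs.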
f
yes, the pex we generate is considered the “medical device” in our use case - the dependencies are what we need to report on. If I’m following your question correctly.
e
Right, gotcha. Makes sense.