# general
f
Hey folks. I have a monorepo with ~40 PEX executables being built by Pants 2.10 on GitHub Actions in separate, concurrent workflows. We’ve been seeing intermittent but consistent “Could not find a version that satisfies the requirement” errors when PEX tries to locate and download Python packages. These seem to occur at random, affecting a variety of common Python packages. We’re using Artifactory to mirror PyPI, and I’ve ensured that I’m locally caching the following paths. I’m also ensuring that the cache is populated with common libraries before the 40 builds start running in parallel.
~/.cache/pants/lmdb_store
~/.cache/pants/named_caches
~/.cache/pip
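For reference, the caching step in each workflow looks roughly like this (the action version and cache key are simplified, not our exact config):
# illustrative sketch of the actions/cache step; our real key and restore-keys differ
- uses: actions/cache@v3
  with:
    path: |
      ~/.cache/pants/lmdb_store
      ~/.cache/pants/named_caches
      ~/.cache/pip
    key: pants-cache-${{ runner.os }}-${{ hashFiles('pants.toml', '**/BUILD') }}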
Yet I still see these errors:
ProcessExecutionFailure: Process 'Building src.stages.fragment_counts/exe.pex with 9 requirements: boto3==1.21.45, confluent-kafka[avro]==1.8.2, cryptography==3.4.8, overrides==6.1.0, redis[hiredis]==4.3.4, toml==0.10.2, typeguard==2.13.3, types-redis==4.3.20, types-toml==0.10.1' failed with exit code 1.
stdout:

stderr:
ERROR: Could not find a version that satisfies the requirement boto3==1.21.45
ERROR: No matching distribution found for boto3==1.21.45
pid 1910 -> /home/runner/.cache/pants/named_caches/pex_root/venvs/97c49a45855c8337ed6c7437a8f9bce7e5f12e07/15ba0a31f835e32f40e40a07c519d52e41260cdc/pex --disable-pip-version-check --no-python-version-warning --exists-action a --isolated -q --cache-dir /home/runner/.cache/pants/named_caches/pex_root --log /tmp/process-execution74ifxC/.tmp/tmpa2vp83vp/pip.log download --dest /tmp/process-execution74ifxC/.tmp/tmpddxwdbv9/opt.hostedtoolcache.Python.3.9.15.x64.bin.python3.9 boto3==1.21.45 confluent-kafka[avro]==1.8.2 cryptography==3.4.8 overrides==6.1.0 redis[hiredis]==4.3.4 toml==0.10.2 typeguard==2.13.3 types-redis==4.3.20 types-toml==0.10.1 --index-url ***((our-artifactory-redacted))/artifactory/api/pypi/sw-pypi/simple --retries 5 --timeout 15 exited with 1 and STDERR:
None
Any advice on how I could go about fixing this?
I’ve already tried increasing PEX verbosity and running Pants with -ldebug, but I’m not seeing any new information beyond what is above.
h
Hmm, we used to see these in Pants’s own CI on occasion, IIRC
In “I’m also ensuring that the cache is populated with common libraries”, which cache are you referring to?
And what happens if you cut the concurrency down from 40 to 1, just as an experiment?
e
Yeah, and along those lines, what is new here? Is the 40-way parallelism new? Is Artifactory new? Is the Pants or Pex version new?
f
Thanks for the great questions, all.
• Cache — the GitHub Actions cache. Each of the ~40 jobs is an independent workflow, so each Pants instance is running in its own Docker image.
• Populating the cache — I have an initial workflow that runs before the 40 others. It runs ./pants package on a single executable target in our build, which forces PEX/pip to download a bunch of commonly used wheels. After that step, GitHub copies the contents of those three .cache directories (in my original message) into its cache. Later, inside the ~40 workflows that follow, those folders are copied from the GitHub cache back into the filesystem before Pants is called.
• What’s new — this is where it gets interesting... We’ve been using this approach for a long time and seeing this error sporadically (once every 10 or 20 builds), but it was always rare enough that rerunning the failed part of the build was sufficient. Over the last few months it’s gotten much worse, to the point where every build has 2 or 3 examples of this failure case, and sometimes it fails even on re-running.
• We’ve also seen some cases where individual users run into the same error while building a single PEX on their laptops, but that’s still rare.
(also, we’ve been on this version of pants for the whole time)
e
So, you're saying nothing has changed? Except maybe cache size growth?
f
The only thing that has changed is the number of concurrent builds, which has grown over time: we used to have ~20 concurrent builds, and we’re now at ~40.
e
Ok. Well one angle is Artifactory flakes under concurrent load. Has that been ruled out?
👆 1
That used to happen at Twitter for a private Artifactory maven repo for example.
It would not 404 or 5xx, it would return no pom silently or a pom but no jar, can't remember which.
f
That’s pretty much exactly what we’ve been seeing, and I feel like I’ve been going crazy trying to debug it! I saw a few reports of something like this online, but they were never well resolved.
Do you know what the solution was at Twitter?
e
I don't recall if there was a solution even.
f
That’s too bad. Regardless of Artifactory’s flakiness, the main point of my question is: I’m reasonably sure that the .cache folders already have the wheels for these packages that keep failing, and I’m trying to learn why PEX would keep going out to Artifactory despite the cached wheel.
So far, all of the python packages that I’ve seen this happen with are packages that are being installed in my initial step to populate the cache.
e
If you have any ranged requirements and aren't using lock files, Pex / Pip sort of have to go look for new things showing up in the range.
Newer Pants has solid lock file support. Maybe a good reason to upgrade.
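Roughly, in newer Pants the setup lives in pants.toml (illustrative sketch; exact option names and paths depend on the version you land on):
# pants.toml (illustrative, not a drop-in config)
[python]
enable_resolves = true
resolves = { python-default = "3rdparty/python/default.lock" }

# then generate / refresh the lock with:
#   ./pants generate-lockfiles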
f
ah, thanks. Well, time for me to move that ticket out of our backlog 🙂.
Appreciate it.
e
You're welcome. So that's the case then? Not fully pinned down?
f
You know what, these are all == requirements - is that what you mean by pinned down?
for example,
boto3==1.21.45
e
That's what I mean. But for zero searching they also need to be wheels. Any sdist builds can again reach out for build deps.
Even w/o sdists, IIRC, if a pinned requirement itself has dependency metadata that is ranged, Pip searches.
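For example, going from memory (the ranges here are illustrative, not boto3's exact metadata), a pinned boto3 still declares ranged deps of its own:
boto3==1.21.45 -> Requires-Dist: botocore (>=1.24.45,<1.25.0)
                  Requires-Dist: jmespath (>=0.7.1,<1.0.0)
                  Requires-Dist: s3transfer (>=0.5.0,<0.6.0)
So even with boto3 itself pinned, Pip still has to ask the index which versions inside those ranges exist.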
f
I can actually guarantee that these are all wheels, because we have a platform specified in our pex_binary target, so we were forced to compile our own in some cases. I could understand transitive dependencies still not being fully pinned, since that’s up to the author of those packages, but wouldn’t we see the issue on those transitive deps instead of, for example, boto3 (which is our most common failure mode)?
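Our targets look roughly like this (names and the platform string are simplified, not our real BUILD file):
pex_binary(
    name="exe",
    entry_point="main.py",
    # illustrative platform string; ours pins a specific manylinux / CPython 3.9 combo
    platforms=["manylinux2014_x86_64-cp-39-cp39"],
)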
e
You need lock files.
Well, boto3 is a transitive dep too I assume.
f
boto3 is actually a direct, pinned dependency
e
With lockfiles we just download wheels directly, no searches, and the downloads all short circuit in the Pex cache.
I'm saying you undoubtedly have other deps that in turn dep on boto3. Is that true?
f
oh, fair enough - I see.
Thanks for explaining, that makes a lot of sense. I’ll bring it back to my team. I appreciate your time and advice!
e
Ok, great. You're welcome. The confusing thing for many about Pants / Pex is that they are hermetic / immutable by nature. Any resolve starts from a clean slate with the reqs you give it; resolves do not mutate an existing venv. The mutate-an-existing-venv mode that people usually use Pip in does short-circuit effectively based on the venv's contents.
With lockfiles Pants / Pex gain back the hard short circuit.