https://pantsbuild.org/ logo
h

happy-kitchen-89482

12/05/2021, 5:14 PM
I've noticed that running pex repeatedly on the same inputs, with a PEX_ROOT, still does a bunch of downloading, before then installing from cached installations.
E.g.,
Copy code
PEX_ROOT=${HOME}/.cache/pants/named_caches/pex_root time ${HOME}/.pyenv/versions/3.7.3/bin/python3.7 -m pex -vvvvvvvv --output-file repository.pex --resolver-version pip-2020-resolver --python ${HOME}/.pyenv/versions/3.7.3/bin/python3.7  --manylinux manylinux2014 -r requirements.txt --layout packed
Where requirements.txt is
Copy code
ansicolors==1.1.8
pip==21.3.1
setuptools==56.2.0
types-setuptools==57.4.2
azure-common==1.1.27
azure-core==1.20.1
azure-graphrbac==0.61.1
azure-identity==1.7.1
azure-keyvault-certificates==4.3.0
azure-keyvault-keys==4.4.0
azure-keyvault-secrets==4.3.0
azure-keyvault==4.1.0
azure-mgmt-apimanagement==2.1.0
azure-mgmt-authorization==1.0.0
azure-mgmt-automation==1.0.0
azure-mgmt-batch==16.0.0
azure-mgmt-compute==22.1.0
azure-mgmt-containerinstance==8.0.0
azure-mgmt-containerregistry==8.2.0
azure-mgmt-containerservice==16.4.0
azure-mgmt-core==1.3.0
azure-mgmt-cosmosdb==6.4.0
azure-mgmt-frontdoor==1.0.0
azure-mgmt-hybridkubernetes==1.1.0
azure-mgmt-keyvault==9.3.0
azure-mgmt-logic==9.0.0
azure-mgmt-managementgroups==1.0.0
azure-mgmt-monitor==2.0.0
azure-mgmt-msi==1.0.0
azure-mgmt-network==19.3.0
azure-mgmt-rdbms==8.1.0
azure-mgmt-resource==19.0.0
azure-mgmt-security==1.0.0
azure-mgmt-servicefabric==1.0.0
azure-mgmt-sql==3.0.1
azure-mgmt-storage==18.0.0
azure-mgmt-subscription==1.0.0
azure-mgmt-web==3.0.0
azure-storage-blob==12.8.1
azure-storage-queue==12.1.6
Takes ~20 seconds to run the first time, then about 17 seconds on every re-run, even though it eventually logs "Using cached installation of ..." for every dist
The logs pause for about 15 seconds with no indication of what's going on, but if I ctrl-c the stack trace shows me it's in the downloading code
@enough-analyst-54434 Is this expected/by design/necessary?
e

enough-analyst-54434

12/05/2021, 5:27 PM
All of the above since you've not said --intransitive which you probably mean. Even then Pex routes through Pip still. With --intransitive though Pex could add some optimizations to skip Pip. I believe that narrow case has an issue already, but I may be wrong.
I expect with -vvv or higher (triggers turning on Pip logging) you'll see plenty of resolving activity during the 15s.
h

happy-kitchen-89482

12/05/2021, 5:34 PM
I think this is all happening before Pip, in
pex/resolver.py
I have
-vvvvvv
and nothing
e

enough-analyst-54434

12/05/2021, 5:34 PM
It can't be.
Huh, very unexpected.
h

happy-kitchen-89482

12/05/2021, 5:34 PM
And unclear to me why there would be any downloading when everything has already been downloaded
e

enough-analyst-54434

12/05/2021, 5:34 PM
Pex does ~nothing prior to download.
h

happy-kitchen-89482

12/05/2021, 5:35 PM
Stack trace when I ctrl-c during the silent 15 seconds is:
Copy code
pex: Resolving interpreters: 0.3ms
pex: Building pex :: Resolving distributions (requirements.txt) :: Resolving requirements. :: Resolving for:
  DistributionTarget(interpreter=PythonInterpreter('/Users/benjyw/.pyenv/versions/3.7.3/bin/python3.7', PythonIdentity('/Users/benjyw/.pyenv/versions/3.7.3/bin/python3.7', 'cp37', 'cp37m', 'macosx_10_pex: Hashing pex                                                                                                                                                                                        pex: Hashing pex: 179.1ms                                                                                                            
pex: Isolating pex: 0.1ms
^CERROR: Operation cancelled by user
Traceback (most recent call last):
  File "/Users/benjyw/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/benjyw/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/benjyw/src/pex/pex/__main__.py", line 8, in <module>
    __name__ == "__main__" and pex.main()
  File "/Users/benjyw/src/pex/pex/bin/pex.py", line 672, in main
    env=env,
  File "/Users/benjyw/src/pex/pex/bin/pex.py", line 692, in do_main
    cache=ENV.PEX_ROOT,
  File "/Users/benjyw/src/pex/pex/bin/pex.py", line 596, in build_pex
    ignore_errors=options.ignore_errors,
  File "/Users/benjyw/src/pex/pex/resolver.py", line 920, in resolve
    lock_configuration=lock_configuration,
  File "/Users/benjyw/src/pex/pex/resolver.py", line 992, in _download_internal
    dest=dest, max_parallel_jobs=max_parallel_jobs
  File "/Users/benjyw/src/pex/pex/resolver.py", line 117, in download_distributions
    max_jobs=max_parallel_jobs,
  File "/Users/benjyw/src/pex/pex/jobs.py", line 510, in execute_parallel
    yield spawn_result.spawned_job.await_result()
  File "/Users/benjyw/src/pex/pex/jobs.py", line 214, in await_result
    job.wait()
  File "/Users/benjyw/src/pex/pex/jobs.py", line 72, in wait
    _, stderr = self._process.communicate()
  File "/Users/benjyw/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 931, in communicate
    self.wait()
  File "/Users/benjyw/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 990, in wait
    return self._wait(timeout=timeout)
  File "/Users/benjyw/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 1624, in _wait
    (pid, sts) = self._try_wait(0)
  File "/Users/benjyw/.pyenv/versions/3.7.3/lib/python3.7/subprocess.py", line 1582, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
e

enough-analyst-54434

12/05/2021, 5:35 PM
Download == resolve. You can't know everything has been downloaded.
h

happy-kitchen-89482

12/05/2021, 5:36 PM
I can't know everything has been downloaded, but I would have expected consulting a download cache before downloading a given thing
e

enough-analyst-54434

12/05/2021, 5:36 PM
Yeah, that appears to be during Pip download. I'm only confused why you're not seeing Pip logs.
h

happy-kitchen-89482

12/05/2021, 5:36 PM
So only download whatever has not previously been downloaded (in this case nothing)
Oh the jobs themselves are running pip download
e

enough-analyst-54434

12/05/2021, 5:37 PM
That should spew a bunch of logs.
h

happy-kitchen-89482

12/05/2021, 5:37 PM
Hmmm eventually I do see logs that may be pip
I'll post the entire log
e

enough-analyst-54434

12/05/2021, 5:41 PM
Bit the short of it here is this is just Pip behavior. Unless Pex wants to get back in the resolve business it can't optimize except for the --intransitive pinned case.
h

happy-kitchen-89482

12/05/2021, 5:43 PM
Pip doesn't have a download cache?
log.txt
e

enough-analyst-54434

12/05/2021, 5:44 PM
Pip does. Pex stores it in ~/.pex/pip IIRC.
But just run Pip 2x with this resolve and see for yourself.
Maybe you need 9 vs? You used 8 it looks like. My memory of Pip log triggering may be off.
But Pip and Pex aside, a second resolve for this only happens if the local LMDB does not exist. Is this a CI scenario you're debugging?
h

happy-kitchen-89482

12/05/2021, 6:40 PM
No, I'm looking more generally at why even a tiny change to constraints.txt triggers a very long re-resolve of the repository pex, when I would have expected us to get benefit from pex/pip's own caches. This is just the extreme endpoint of no changes at all...
Same thing with 9vs BTW, that doesn't change the logging behavior further
Here's the full 9 v's log
Adding
--intransitive
does speed things up, to about 12 seconds, but that is still slower than I would have expected. There is still about an 8 second logging pause right after
Copy code
pex: Resolving interpreters: 0.2ms
pex: Building pex :: Resolving distributions (requirements.txt) :: Resolving requirements. :: Resolving for:
  DistributionTarget(interpreter=PythonInterpreter('/Users/benjyw/.pyenv/versions/3.7.3/bin/python3.7', PythonIdentity('/Users/benjyw/.pyenv/versions/3.7.3/bin/python3.7', 'cp37', 'cp37m', 'macosx_10_pex: Spawning a maximum of 8 parallel jobs to process:
  DistributionTarget(interpreter=PythonInterpreter('/Users/benjyw/.pyenv/versions/3.7.3/bin/python3.7', PythonIdentity('/Users/benjyw/.pyenv/versions/3.7.3/bin/python3.7', 'cp37', 'cp37m', 'macosx_10_16_x86_64', (3, 7, 3))))                                                      
pex: Hashing pex: 135.8ms
pex: Isolating pex: 0.1ms
Maybe subsequent logs are buffered during that time or something
e

enough-analyst-54434

12/05/2021, 7:01 PM
If you want to file an issue with details I can look into this on the 8th. Although the intransitive pinned case can be optimized, I'm not sure what I can do otherwise to make things better, but I can at least get a perf regression test up that makes sure Pex is as fast as Pip for the relevant operation. That at least would avoid goose chases. This is a thing that has come up many times now.
h

happy-kitchen-89482

12/05/2021, 7:01 PM
To be clear, if I nuke all the caches then a run of this (without --intransitive) takes 60 seconds, then a re-run takes 20 seconds, so obviously caches are helping
I'm just surprised they're not helping more
Will dig further and file an issue
For extra context, my evidence that a download cache is not being used is that if I turn off wifi the re-run fails with network errors
Ugh, turns out this is an issue even if I eliminate pex and run pip directly
In a venv
pip download -r requirements.txt
takes ~25 seconds the first time, then ~12 seconds the second time, and it's definitely still sending network requests that second time, I guess either to check for new versions that match the requirement string (i.e., it doesn't optimize for
==
) or to check if a specific version has been yanked
w

witty-crayon-22786

12/06/2021, 3:27 AM
@enough-analyst-54434: How would
pex3 lock update
impact this? Would we spend less time in
pip
, such that its cache mattered less?
e

enough-analyst-54434

12/06/2021, 3:32 AM
We'll spend the same time. A lock is implemented as a download. A lock update as a download with constraints.
👍 1
h

happy-kitchen-89482

12/06/2021, 5:18 AM
I guess the big difference in terms of user experience is that if I call
pip install -r requirements,txt
in a venv and then modify
requirements.txt
and rerun
pip install -r requirements.txt
then it is incrementally additive - new versions will be installed or updated in the existing venv, but anything that hasn't changed doesn't require work (things that were removed from requirements.txt won't be uninstalled, but you can
pip uninstall
manually and it's similarly fast). In any case, this is significantly faster than rebuilding a venv from scratch, which is what we do every time. I wonder if there are optimizations to be had along these lines.
e

enough-analyst-54434

12/08/2021, 5:29 PM
Its hard to guess if mutating a venv would help. Since Pants currently cannot efficiently store a venv in LMDB, the process would need to look like this for the unhappy path: 1. Turn the packed layout repository PEX into a venv in the PEX_ROOT (named_caches/pex_root) 2. Mutate the venv with
pip install
3. Translate the mutated venv back into a packed layout repository PEX for storage in LMDB Alternatively, if Pants weren't to use Pex, it would need to do something similar to steps 1 and 3 on its own, coming up with some sort of cache-efficient packed layout that straddled being cache friendly and being sandbox creation latency friendly.
Actually, step 1 could not be done in named_caches since its a mutation and could be observed by concurrent processes. The venv creatiuon would need to occur in the sandbox where its isolated from all other processes. So, quite a bit worse. The unhappy path is the always-path IIUC.
w

witty-crayon-22786

12/08/2021, 5:35 PM
while the
named_caches/pex_root
is immutable, it’s only immutable after an entry is created… until then file locking is used, right?
so (not encouraging it, just a thought) it might be possible for PEX to clone/copy the venv into a new dir under a file lock, add things, then rename to the final computed name
but… yea. huge pain in the butt.
https://github.com/pantsbuild/pants/issues/12716 as it stands would help with immutable use of captured venvs, so could have a venv as an immutable input to a process execution, and then copy it, mutate it, recapture it. but still.
e

enough-analyst-54434

12/08/2021, 5:39 PM
I guess the point is these venvs are apparently huge (the ones spurring all this discussion), and not being able to re-use a venv and having to re-create it or copy it is apparently considered cost prohibitive? There were lots of 1st hand testimony that was narrowing in on perf issues, but, IIUC, little reproduceable data we could all poke at in the open.
w

witty-crayon-22786

12/08/2021, 5:45 PM
in this case, @happy-kitchen-89482 actually has an interesting repro repo he can share!
e

enough-analyst-54434

12/08/2021, 5:47 PM
@happy-kitchen-89482 that would be helpful. A repro repo + what actrions in the repo are deemed unacceptably slow (vs Pip). I'd love to stop using the caveat vs Pip, but tackling that would be a big investment. Basically writing our own resolver. It would be a fun problem of course, but big.
h

happy-kitchen-89482

12/08/2021, 10:37 PM
I don't know that we want to go down that path, the issue is more that we do a full re-resolve, instead of incrementally against a previous version. And there are good reasons for that (for one thing, which previous version would we use? and how would we know how to find one?) But from a user perspective, it can be a dramatic performance downgrade. I'll push a repo, so we can have a concrete thing to discuss.
So we are no worse than "pip reresolves the whole thing" but we are worse than "pip incrementally installs just one small change"
w

witty-crayon-22786

12/08/2021, 10:39 PM
for one thing, which previous version would we use? and how would we know how to find one?
the inputs to the
pex3 lock update
feature answer this question to a degree: the assumption is that you would feed PEX an old lockfile, and the added/removed requirements since that lockfile.
h

happy-kitchen-89482

12/08/2021, 10:48 PM
You need the venv for an old lockfile, not just the lockfile
And pex needs to support incremental venv building
w

witty-crayon-22786

12/08/2021, 10:49 PM
You need the venv for an old lockfile, not just the lockfile
there is a point at which we have that during
./pants update-lockfile
, because we have the old file, and new requirements that aren’t included yet.
but regardless, it will be a while before the dust settles
h

happy-kitchen-89482

12/08/2021, 10:54 PM
That is a good point, so there might be a way to get this to work with new-style lockfiles (and that's fine, we want to get rid of constraints files anyway)
See README.md for details
e

enough-analyst-54434

12/08/2021, 11:56 PM
Besides the vague notion that pip does an "incremental" resolve when in a venv, can you define what an incremental resolve is and how you do one? Afaict there is no such thing.
h

happy-kitchen-89482

12/08/2021, 11:56 PM
And note that this repo, at least on my macbook, also exposes issues with the dynamic UI
e

enough-analyst-54434

12/08/2021, 11:58 PM
IOW you have to do a full resolve, you may be able to skip downloads of some dists to peek at their metadata if they are either installed in the current venv or they are in the pip cache.
Bit whether they are installed in the current venv or in the pip download cache should not matter. Apparently that does matter though which either means Pip bug or Pip does resolves differently in the two cases - which sounds buggy.
h

happy-kitchen-89482

12/09/2021, 12:16 AM
Incremental resolve:
Copy code
$ python -m venv venv/ && source venv/bin/activate && pip install --upgrade pip setuptools wheel
...
$ time pip install -r pants-requirements.txt
<snip>
real	3m47.303s
user	2m20.837s
sys	0m33.380s
$ echo "# whitespace change" >> pants-requirements.txt 
$ time pip install -r pants-requirements.txt
Requirement already satisfied: adal==1.2.7 in ./venv/lib/python3.8/site-packages (from -r pants-requirements.txt (line 1)) (1.2.7)
Requirement already satisfied: aenum==2.2.6 in ./venv/lib/python3.8/site-packages (from -r pants-requirements.txt (line 2)) (2.2.6)
<snip>
Requirement already satisfied: zope.interface==5.4.0 in ./venv/lib/python3.8/site-packages (from -r pants-requirements.txt (line 412)) (5.4.0)
Requirement already satisfied: setuptools==58.4.0 in ./venv/lib/python3.8/site-packages (from -r pants-requirements.txt (line 413)) (58.4.0)
Requirement already satisfied: wheel>=0.29.0 in ./venv/lib/python3.8/site-packages (from scikit-build==0.12.0->-r pants-requirements.txt (line 346)) (0.37.0)

real	0m14.104s
user	0m12.643s
sys	0m0.546s
e

enough-analyst-54434

12/09/2021, 12:30 AM
Sorry - I understand what you observe - I was asking for thought behind what that means, how would you implement as a resolver writer, etc. I'll take it from here though with the repro repo. That was a very big missing piece. Thanks for getting that together.
FWIW I see incredibly strange behavior, A whitespace edit of pants-requirements.txt consistently outputs lots of: 162915.54 [INFO] Completed: Extracting 106 requirements to build requirements.pex And then a long running
Resolving pants-requirements.txt
that corresponds with a spike in network activity.
That is crazy-talk.
h

happy-kitchen-89482

12/09/2021, 12:33 AM
OK, try this for crazy - run that, then ctrl-C it, then run it again
I was seeing stuff like this and ctrl-C and re-run would then not try to redo all that work but just the work i would expect
e

enough-analyst-54434

12/09/2021, 12:34 AM
Agreed. ctrl-C then re-run after whitespace edit has network activity and resolve happen 1st out of the gate, which is what you expect.
So we have some broken in-mem caching somehow with pantsd here.
Ok. This will take a good bit of careful testing and notes so we don't thrash too badly. Lots of mixed together stuff going wrong here it seems.
Thanks again for this repo - its gold.
I don't know if it was intentional, but there was one issue with BUILDs causing the ctrl-C funkiness and slowing things down considerably: https://github.com/pantsbuild/requirements-perf/issues/1
h

happy-kitchen-89482

12/09/2021, 4:16 AM
Definitely a few mingled issues here: 1) The perf issues, 2) The dynamic UI issues, 3) Some weird caching issue
Re 3), I noticed the following, but haven't reproduced yet: I ran without
--python-run-against-entire-lockfile
and saw a bunch of "Extracting X requirements..." work, as you'd expect. Then ran with
--python-run-against-entire-lockfile
, and still saw those (which you wouldn't expect! - that flag means no subsetting should be necessary). Ctrl-C'd out of that in the middle of all that work, then re-ran with
--python-run-against-entire-lockfile
and this time those superfluous `Extracting X requirements..." work units didn't happen (as expected).
That was weird, I wonder if it has to do with speculation
cc @witty-crayon-22786 re my speculation about speculation
w

witty-crayon-22786

12/09/2021, 5:09 AM
potentially speculation, yea.
e

enough-analyst-54434

12/09/2021, 6:08 PM
@happy-kitchen-89482 the example repo uses the old psuedo-constraints lockfile hack instead of an actual lockfile. Is there a reason for that? I'm about to check the resolve time for that repo using a lockfile vs a constraints file, but assuming that's faster, the repro repo is a bit misleading using the old ~deprecated lock file vs the new actual ones.
h

happy-kitchen-89482

12/09/2021, 6:16 PM
That's what the users who provided that demo are using in their real repo, but there is no strong reason for that, and if we can improve things using a proper lockfile then we should, so feel free to update that repo.
And @witty-crayon-22786 any thoughts about the dynamic UI issues?
w

witty-crayon-22786

12/09/2021, 6:17 PM
which?
lots going on in this thread: if you can link to the comment describing it, that would help.
h

happy-kitchen-89482

12/09/2021, 6:22 PM
Including how to see the problem happening
w

witty-crayon-22786

12/09/2021, 6:23 PM
no thoughts currently: do you want me to switch from the JVM work to looking at this?
once https://github.com/pantsbuild/pants/pull/13848 is out, i can do some triage, but currently focused on the deadline next week
e

enough-analyst-54434

12/09/2021, 7:03 PM
@happy-kitchen-89482 I think final question on background here. Is it correct to say that the perf of a resolve (i.e. going down the "incremental" resolve rabbit hole) is a minor challenge in that requirements only change maybe one time a day for a really aggressive org; so these 4 minutes resolves are ~once a day things? Or is this a big deal because the 4-8 minutes are happening on every CI run? Or because the resolves are happening more often than ~once a day for local runs for some unknown reason? Even higher level - I'm trying to gauge the urgency of all this. Basically, are the long resolves an annoyance or a real problem blocking minute to minute working throughout the day?
w

witty-crayon-22786

12/09/2021, 7:09 PM
(and is “everyone needs to re-run the resolve” something that we should tackle via https://github.com/pantsbuild/pants/issues/12203?)
h

happy-kitchen-89482

12/09/2021, 8:02 PM
No, stay on JVM for now I think, but this is something to look at soon
Their concern on resolve performance is more about user laptop runs than CI runs, AFAIK, but I will try and pull them into this conversation to confirm. A big part of the problem does appear to be that resolves seem to be happening a lot more frequently than they should.
e

enough-analyst-54434

12/09/2021, 8:05 PM
Ok. That there is no mention of in the README. I'll try to repro that then and add repro steps to the README if I can find them.
h

happy-kitchen-89482

12/09/2021, 8:05 PM
I don't know how often they expect a legitimate re-resolve to happen.
I wonder if the hypothesized speculation issue is to blame. When pairing with them I have observed cases where there were no changes at all to the constraints file, yet a re-resolve happened.
e

enough-analyst-54434

12/09/2021, 8:06 PM
Yeah, I was being generous with 1x per day. We have an Asher and don't reach that.
So the unexpected re-resolve appears to be the #1 issue to solve.
h

happy-kitchen-89482

12/09/2021, 8:06 PM
In some of those cases, Ctrl-C in the middle of a seemingly spurious re-resolve, and then re-running, avoided the re-resolve!
So yes, I think you're right about that
e

enough-analyst-54434

12/09/2021, 8:07 PM
Ok, I'm probably going to peel off then for pex3 lock work. The incremental resolve stuff would be very cool, but its also hitting a smallish corner fwict.
w

witty-crayon-22786

12/09/2021, 8:08 PM
speculation only re-runs things that have run in the past… it should never make up work from nothing. so in particular, it will not cause this:
When pairing with them I have observed cases where there were no changes at all to the constraints file, yet a re-resolve happened.
when you observe this, you should look at the recorded traces to see what changed between runs 1, 2, and 3
e

enough-analyst-54434

12/09/2021, 8:09 PM
I think partly what Stu is saying, is he'd like a repro. I had one for slow resolves, we need one for unexpected ones.
w

witty-crayon-22786

12/09/2021, 8:17 PM
when you observe this, you should look at the recorded traces to see what changed between runs 1, 2, and 3
even just looking at the log output in
-ldebug
will tell you, because it will render the digest of the inputs to the process.
m

melodic-thailand-99227

12/10/2021, 7:19 AM
It’s really hard to reproduce those issues, they just keep coming and going on desktop users. And since each resolve is like 4-8 minutes (depending on the machine) it’s very frustrating. Most of the time if we restart the daemon it goes away and passing
-ldebug
causes the daemon to restart
h

happy-kitchen-89482

12/10/2021, 3:00 PM
So restarting the daemon will make the next command not rerun the resolve unnecessarily?
m

melodic-thailand-99227

12/11/2021, 7:00 AM
It still made it run, but only once
h

happy-kitchen-89482

12/11/2021, 4:53 PM
Was that run expected (because the requirements file changed) or unexpected?
How often do you expect a resolve to happen? That is, how many times per day/week does someone edit the constraints file?
h

hundreds-father-404

12/13/2021, 11:10 PM
@happy-kitchen-89482 this repo is private, I can't access it. Is that intentional?
h

happy-kitchen-89482

12/14/2021, 1:00 AM
Yes, I am waiting for the users who created that repo to allow it to be public
👍 1
This repo is public now
🙌 2