# welcome
b
Hi, I have been using Pants at work for a number of years on a smallish repo and was recently evaluating it for a multi-million-line code base with a lot of messy dependencies. One major pain point was that even when I didn't make any modifications, rerunning tests would take a while because dependency inference was rerun every time.
f
What version(s) of Pants were you using?
On Pants v2, dependency inference is cached, and on 2.17 or the forthcoming 2.18 (I forget which) Python dependency inference has been re-implemented in Rust for speed (and is still cached, of course)
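If memory serves, the Rust parser is opt-in behind a flag in the 2.17 series; a minimal sketch of what to put in `pants.toml`, assuming the option exists in your exact version (worth double-checking against the docs):

```toml
# pants.toml — opt in to the Rust-based Python dependency parser
# (option name from memory; verify against your Pants version's docs)
[python-infer]
use_rust_parser = true
```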
h
Another large codebase's clean dep inference time went from ~45 minutes to I think like 3 minutes (?) with the Rust reimplementation (@rapid-bird-79300 can weigh in)
But dep inference rerunning every time sounds like a bug, as it should be cached. Unless you mean this across separate CI runs, on CI machines with no preserved state between runs?
That is where remote caching is a huge win
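For reference, remote caching is just a few global options in `pants.toml`; a sketch, assuming you point it at a REAPI-compatible cache server (the address below is a placeholder):

```toml
# pants.toml — hypothetical remote cache settings; swap in your own server address
[GLOBAL]
remote_store_address = "grpc://build-cache.example.internal:443"  # placeholder endpoint
remote_cache_read = true
remote_cache_write = true
```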
b
2.16.0
I can test it out on 2.17
(just realized I don't know how to select the beta version)
(this testing was all done on a single machine back to back)
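(For what it's worth, an rc can be pinned the same way as a stable release, via `pants_version` in `pants.toml`; a minimal sketch, assuming the standard launcher setup:)

```toml
# pants.toml — pin a prerelease explicitly
[GLOBAL]
pants_version = "2.17.0rc1"
```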
r
Can confirm dependency inference is no longer a bottleneck for a large-scale monorepo. The Rust parser makes it lightning fast for large monorepos; highly recommend trying that out (`2.17.0rc1`). To note, we do experience performance issues in CI with recomputing of `Find targets from input specs` and `Map all targets to their dependents`; it's extremely slow without a proper pantsd setup. This is also true in our local environments: if you run `./pants dependents` without making any changes, you will still wait 4-5 minutes for results. I think it is related to this issue: https://github.com/pantsbuild/pants/issues/18911 A way to improve this is to increase pantsd max memory and get more caching benefits. I recommend you try this as well if you have the resources. In CI, we haven't figured out how to get these benefits; I'd imagine a remote caching server would help here, but we have never tried it. This is how we use caching in CI today:
• We have a dedicated agent queue for CI Pants runs (Buildkite)
• Every commit on main generates a cache artifact
• A PR's CI pulls the nearest cache from the base commit SHA and runs Pants
In this setup, dependency inference is extremely fast, although we do face the other bottlenecks mentioned above. Another idea is just to have a long-running agent queue in CI for Pants executions and nuke the cache if it gets too big (as mentioned in the docs).
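To make the pantsd memory suggestion concrete, it is a single global option; a sketch, where the 4GiB figure is just an example to tune for your machines:

```toml
# pants.toml — give pantsd more headroom before it hits its memory cap and restarts
[GLOBAL]
pantsd_max_memory_usage = "4GiB"  # example value; older versions may require a plain byte count
```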
b
is there any Pants caching on disk? (I have seen cases where pantsd was killed due to memory limits)
on Pants 2.16, `pants test SOME_FILE` just ran successfully, and now rerunning it took 2 minutes, with at least
⠁ 74.78s Resolve transitive targets
r
yeah that is very similar to what we experience. Just mapping dependents takes such a long time.
b
rerunning with 2.17.0rc1
h
Pants caches in memory and on disk
More specifically: processes are cached on disk, and rules are cached in RAM
so a restart of pantsd causes all rules to reexecute, but typically rules are thin wrappers around processes.
E.g., a rule might set up a pytest run, and execute it in a process
parsing deps from a source file is run in a process, so you should be getting its results from disk.
Your observation that "pants test SOME_FILE just ran successfully and now rerunning it took 2 minutes" sounds like something has gone seriously awry, unless you made substantial code/config changes between the two runs
That said, "Resolve transitive targets" is unrelated to dep inference. It is, however, one of the more CPU-intensive rules whose main logic does not run in a process, so the pantsd restart is hurting you there.
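(Side note on where the on-disk cache lives: by default it is under `~/.cache/pants`, and both cache directories are configurable; a sketch with illustrative paths, in case you want them on a faster or bigger disk:)

```toml
# pants.toml — optionally relocate the on-disk caches (paths are illustrative)
[GLOBAL]
local_store_dir = "/fast-disk/pants/lmdb_store"      # cached process results
named_caches_dir = "/fast-disk/pants/named_caches"   # e.g. pip/PEX caches
```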
b
`Scheduling: Determine Python dependencies for` is where most of the time is going
I can see that sometimes if I rerun the command (`pants test SOME_FILE`) it is super quick (7 seconds), but other times it's busy with `Determine Python dependencies`
thanks @happy-kitchen-89482 (and everyone else), what you just described matches what I am seeing exactly (it’s slow when I see the message waiting for pantsd to start)
unfortunately this is a key issue that has meant we can’t use pants in this case (yet)
h
Sounds like the first order of business is to figure out why pantsd restarts, and stop that from happening so frequently
b
it seems like it’s memory related? not sure what to check
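One thing worth checking (a suggestion, not a definitive diagnosis): pantsd usually records why it exited in its log, typically `.pants.d/pants.log` on this version range, so a more verbose log level plus a look at that file after a slow run should show whether the memory cap from `pantsd_max_memory_usage` is being hit:

```toml
# pants.toml — more verbose logging to help see why pantsd restarts
[GLOBAL]
level = "debug"
```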