Hi Pants experts, I have seen that invoking pants ...
# general
p
Hi Pants experts, I have seen that invoking pants commands via Python subprocess is very slow in the current version (2.17). Previously we did this with Pants 2.14 and it was very fast. Could you shed some light on what the problem could be? I have a script to repro this: https://gist.github.com/mingshi-wang/58c68d99298496b1ad0eeae3654fd996. Thanks!
e
Are you saying the delta between running `./pants roots` by hand in each version does not track the delta when run via that script?
p
Correct.
e
@powerful-florist-1807 I cannot reproduce this. Pants does, of course, get slower, but the delta is ~the same for each mode of running it. Here cmd.py is exactly your script and cmd.venv is a Python 3.10 venv with a `pip install click`.
Straight Pants 16+/-1% slower:
$ hyperfine -w2 'cd /tmp/mingshi/2.14.0 && ./pants roots' 'cd /tmp/mingshi/2.17.0 && ./pants roots'
Benchmark 1: cd /tmp/mingshi/2.14.0 && ./pants roots
  Time (mean ± σ):     499.4 ms ±   2.4 ms    [User: 435.0 ms, System: 33.3 ms]
  Range (min … max):   495.0 ms … 503.3 ms    10 runs

Benchmark 2: cd /tmp/mingshi/2.17.0 && ./pants roots
  Time (mean ± σ):     580.8 ms ±   5.5 ms    [User: 511.6 ms, System: 26.1 ms]
  Range (min … max):   573.8 ms … 592.3 ms    10 runs

Summary
  'cd /tmp/mingshi/2.14.0 && ./pants roots' ran
    1.16 ± 0.01 times faster than 'cd /tmp/mingshi/2.17.0 && ./pants roots'
Scripted Pants 14+/-2% slower:
$ hyperfine -w2 'cd /tmp/mingshi/2.14.0 && /tmp/mingshi/cmd.venv/bin/python /tmp/mingshi/cmd.py -- run-pants-cmd' 'cd /tmp/mingshi/2.17.0 && /tmp/mingshi/cmd.venv/bin/python /tmp/mingshi/cmd.py -- run-pants-cmd'
Benchmark 1: cd /tmp/mingshi/2.14.0 && /tmp/mingshi/cmd.venv/bin/python /tmp/mingshi/cmd.py -- run-pants-cmd
  Time (mean ± σ):     535.9 ms ±   7.7 ms    [User: 467.6 ms, System: 36.2 ms]
  Range (min … max):   527.1 ms … 549.2 ms    10 runs

Benchmark 2: cd /tmp/mingshi/2.17.0 && /tmp/mingshi/cmd.venv/bin/python /tmp/mingshi/cmd.py -- run-pants-cmd
  Time (mean ± σ):     611.7 ms ±   5.3 ms    [User: 540.5 ms, System: 28.0 ms]
  Range (min … max):   606.4 ms … 624.0 ms    10 runs

Summary
  'cd /tmp/mingshi/2.14.0 && /tmp/mingshi/cmd.venv/bin/python /tmp/mingshi/cmd.py -- run-pants-cmd' ran
    1.14 ± 0.02 times faster than 'cd /tmp/mingshi/2.17.0 && /tmp/mingshi/cmd.venv/bin/python /tmp/mingshi/cmd.py -- run-pants-cmd'
My two test repos are empty though - no roots. Maybe `./pants roots` got slower and you have many roots? But even then, my experiment seems to indicate that running Pants directly versus running it from a script has little effect on speed in either version.
@powerful-florist-1807 perhaps if you can present equivalently detailed analysis that shows a marked difference in your setup, we can make progress.
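For anyone without hyperfine installed, a rough equivalent of the measurement above can be sketched in Python. This is a minimal sketch, not what was actually run in this thread: `time_command` is a hypothetical helper, and the warmup and run counts simply mirror the `-w2` and 10-run figures shown in the hyperfine output.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=10, warmup=2, cwd=None):
    """Run cmd via subprocess and return (mean, stdev) wall-clock seconds.

    Hypothetical helper: warmup runs are executed first and discarded,
    loosely imitating hyperfine's -w option.
    """
    for _ in range(warmup):
        subprocess.run(cmd, cwd=cwd, check=True, capture_output=True)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)

    # statistics.stdev needs at least two samples, so keep runs >= 2.
    return statistics.mean(samples), statistics.stdev(samples)


# Example usage (paths are the ones from the benchmark above):
#   mean, stdev = time_command(["./pants", "roots"], cwd="/tmp/mingshi/2.17.0")
#   print(f"{mean * 1000:.1f} ms +/- {stdev * 1000:.1f} ms")
```

Running it once per version, and once directly versus once through the script, gives the four numbers needed to compare the deltas.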
p
Thanks John for helping with this! I will see if I can get detailed instrumentation on this.
e
The thing to look out for if it is in fact slower is pantsd restarts. Pants is abysmally slow in all versions and getting worse steadily when run without the Pants daemon (or when the daemon needs to restart). The daemon only needs to restart when you change certain files or args (not done here), or it runs out of memory. Perhaps 2.17.0 triggers the latter for you and pantsd restarts ~constantly?
For example (the outlier is in the fast case, so not relevant to the main point here):
$ hyperfine -w2 'cd /tmp/mingshi/2.14.0 && ./pants roots' 'cd /tmp/mingshi/2.17.0 && ./pants roots' 'cd /tmp/mingshi/2.14.0 && ./pants --no-pantsd roots' 'cd /tmp/mingshi/2.17.0 && ./pants --no-pantsd roots'
Benchmark 1: cd /tmp/mingshi/2.14.0 && ./pants roots
  Time (mean ± σ):     522.0 ms ±  62.0 ms    [User: 435.9 ms, System: 34.4 ms]
  Range (min … max):   490.3 ms … 697.8 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: cd /tmp/mingshi/2.17.0 && ./pants roots
  Time (mean ± σ):     577.4 ms ±   6.4 ms    [User: 506.0 ms, System: 28.5 ms]
  Range (min … max):   567.4 ms … 588.2 ms    10 runs

Benchmark 3: cd /tmp/mingshi/2.14.0 && ./pants --no-pantsd roots
  Time (mean ± σ):      1.037 s ±  0.017 s    [User: 0.952 s, System: 0.038 s]
  Range (min … max):    1.016 s …  1.067 s    10 runs

Benchmark 4: cd /tmp/mingshi/2.17.0 && ./pants --no-pantsd roots
  Time (mean ± σ):      4.606 s ±  0.064 s    [User: 4.479 s, System: 0.095 s]
  Range (min … max):    4.551 s …  4.758 s    10 runs

Summary
  'cd /tmp/mingshi/2.14.0 && ./pants roots' ran
    1.11 ± 0.13 times faster than 'cd /tmp/mingshi/2.17.0 && ./pants roots'
    1.99 ± 0.24 times faster than 'cd /tmp/mingshi/2.14.0 && ./pants --no-pantsd roots'
    8.82 ± 1.06 times faster than 'cd /tmp/mingshi/2.17.0 && ./pants --no-pantsd roots'
As you can see, Pants 2.14 perf without the daemon is bad (2x slower than with), but Pants 2.17 perf without the daemon is horrible (8x slower than with).
I haven't been hacking on Pants in that time period so I'm not sure exactly why it's gotten so much worse. Someone else may be able to speak to that.
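The ratios quoted above can be re-derived from the hyperfine means. The snippet below is only a worked-arithmetic sketch using the mean times from that benchmark output; the variable names are made up for illustration.

```python
# Mean wall-clock times in seconds, copied from the hyperfine output above.
pantsd_2_14 = 0.5220
pantsd_2_17 = 0.5774
no_pantsd_2_14 = 1.037
no_pantsd_2_17 = 4.606

# Cost of disabling the daemon within each version.
penalty_2_14 = no_pantsd_2_14 / pantsd_2_14  # roughly 2x
penalty_2_17 = no_pantsd_2_17 / pantsd_2_17  # roughly 8x

# Slowdown from 2.14 to 2.17 when the daemon is disabled in both.
no_pantsd_regression = no_pantsd_2_17 / no_pantsd_2_14  # a bit over 4x

print(f"2.14 without pantsd: {penalty_2_14:.2f}x slower than with")
print(f"2.17 without pantsd: {penalty_2_17:.2f}x slower than with")
print(f"2.14 -> 2.17 without pantsd: {no_pantsd_regression:.2f}x slower")
```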
p
Thanks John for the insights on the Pants daemon! I suspect that when a Pants command is invoked in a separate process spawned by Python's subprocess, the daemon is disabled by default.
I compared the speed of the following commands:
1. time ./pants roots
2. time python cmd.py -- run-pants-cmd
3. time PANTS_CONCURRENT=True pants run cmd.py -- run-pants-cmd
The results are:
1. ./pants roots 0.94s user 0.13s system 54% cpu 1.951 total
2. python ci/src/python/pynest_ci/cmd.py -- run-pants-cmd 1.00s user 0.16s system 69% cpu 1.682 total
3. PANTS_CONCURRENT=True pants run ci/src/python/pynest_ci/cmd.py -- 50.54s user 9.23s system 114% cpu 52.087 total
Apparently, 1 and 2 are comparable, whereas 3 is significantly slower. At my company, we have a Python script ci_runner.py that invokes Pants commands to calculate the test targets to execute. We invoke ci_runner.py via `pants run ci_runner.py ...`, which is why CI is very slow.
I will try running ci_runner.py directly without `pants run` to see if that improves things.
e
So `PANTS_CONCURRENT=True` disables pantsd by design. Maybe that wasn't true in 2.14.0? I think it was though.
Whenever you run Pants in a way that disables pantsd use, you'll be in a dire situation. Pants is for all practical purposes totally unusable without pantsd. 8 seconds to do a ~noop thing is clearly insane. That's ~24 billion clock cycles on my machine!!! The universe is ~13 billion years old. Pants w/o pantsd executes on ~geologic timescales.
I think this issue tracks allowing `PANTS_CONCURRENT=True` to still use pantsd: https://github.com/pantsbuild/pants/issues/7654
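Since a script started with `PANTS_CONCURRENT=True` in its environment passes that variable on to any Pants process it spawns, one way to check whether the inherited variable is what slows the nested call is to run the child with it removed. The following is only a measurement sketch: `env_without` and `run_without_pants_concurrent` are hypothetical helpers, and whether it is actually safe to drop the variable while an outer Pants run is in progress is an assumption that has not been verified here.

```python
import os
import subprocess


def env_without(*names):
    """Return a copy of the current environment with the given variables removed."""
    env = dict(os.environ)
    for name in names:
        env.pop(name, None)
    return env


def run_without_pants_concurrent(cmd, cwd=None):
    """Run cmd in a child process that does not inherit PANTS_CONCURRENT.

    Assumption: the variable reaches the nested Pants call only through
    normal environment inheritance.
    """
    return subprocess.run(
        cmd,
        cwd=cwd,
        env=env_without("PANTS_CONCURRENT"),
        capture_output=True,
        text=True,
    )


# Example usage inside a script such as cmd.py:
#   result = run_without_pants_concurrent(["./pants", "roots"])
#   print(result.stdout)
```

Timing the nested call with and without the variable would show how much of the slowdown comes from pantsd being disabled.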
p
That explains it!
e
@powerful-florist-1807 but it does not explain a 2.14 -> 2.17 difference from my experiments above. That transition only accounts for a factor of 4 slowdown. Of course my test was in a noop empty repo. Perhaps the scaling is non-linear?
Is a factor of 4 what you see between the script under Pants 2.14 vs the script under Pants 2.17?