# general
a
Hi! Sorry for the general / amorphous question. I'm looking to improve performance for testing in Pants with a cold cache. I have some ideas, but I was wondering if folks here have more tips. I'm going to write some stuff in the 🧵 so as not to muddy the main thread.
A couple of things to build context:
• We're using Poetry and `poetry export` to make a lockfile, which means we're using `run_against_entire_lockfile`, so I know we're getting bad cache performance on requirements changes.
• We are a medium/large-sized Python monorepo.
• We're running 2.17 until 2.20 comes out because of an issue with `--changed-since`, but we're using the Rust Python parser.
Stats from a sampled job:
- 13:42:21 -- started scheduler / inferring deps
- 13:43:52 -- started installing dependencies
- 13:47:24 -- finished installing dependencies
- 13:48:39 -- finished building requirements pex
- 13:50:58 -- finished running pytest

dep inference   = 1:31min
installing deps = 3:32min
building pex    = 1:15min
running tests   = 2:19min
Output from `scc`:
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python                   34722   7680515   496115    455043  6729357     240871
JSON                      2182   2538838      262         0  2538576          0
Plain Text                 589     53129     2158         0    50971          0
YAML                       377     13368      419       384    12565          0
License                    278     20437     3578         0    16859          0
CSV                        268     50417       12         0    50405          0
XML                        204     14629      619        62    13948          0
Cython                     108     35698     4163      1510    30025       4007
Protocol Buffers           108     16731     1756      9885     5090          0
C Header                    65     28529     2928      6770    18831       1170
Markdown                    64      3407      876         0     2531          0
Shell                       62      3040      486       242     2312        242
SQL                         56      3803      317       316     3170         25
ReStructuredText            55      3317      825         0     2492          0
Rust                        47      7760      516       324     6920        139
TOML                        39       986       81        83      822          6
Jinja                       32       691       14         5      672        220
C                           25    131528     5748     19207   106573      23789
HTML                        23      5186       80      2769     2337          0
XML Schema                  22      4892      310         0     4582          0
JavaScript                  18      1789      219       271     1299        205
CSS                         17      3546      604        92     2850          0
CMake                       16      2203      297       391     1515        203
Mako                        13       717      173         4      540          5
C++ Header                  11      3846      487       854     2505        166
Dockerfile                  10       299       64        30      205         40
... truncated things which do not have 10 or more files
───────────────────────────────────────────────────────────────────────────────
Total                    39472  10661743   525317    499569  9636857     271523
───────────────────────────────────────────────────────────────────────────────
A couple of questions:
• Is there anything I can do to make Python dependency inference faster on a cold boot? Or is there any work coming down the pipe to improve performance? I saw some other threads about this where someone talked about pulling some optimizations from the Python parser into Rust.
• A lot of that dependency-installation time comes from building and installing wheels that don't have prebuilt wheels available on PyPI. These also aren't being cached by our remote caching provider. Do we have the option to remotely cache wheels that are built as part of a build?
• Does anyone know why it takes so long to make the `.pex` file for our requirements set? I would expect it to take less time to essentially copy files around and pack a `venv` into a file.
I have a couple of things planned to try to improve speed:
• Move to a pex lockfile, so we can turn off `run_against_entire_lockfile`.
• Boot our CI workers with a recently-updated pip cache on disk for when requirements change.
But I was hoping folks would also have more ideas on how to help!
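Roughly what I have in mind for the first item, as a `pants.toml` sketch (the resolve name and lockfile path are placeholders, not our real layout):

```toml
[python]
# Switch from the poetry-exported requirements file to a Pants-native
# (pex) lockfile; `pants generate-lockfiles` then produces the lock.
enable_resolves = true
resolves = { python-default = "3rdparty/python/default.lock" }

# With a real lockfile we shouldn't need to install the whole lock for
# every test, so requirement changes invalidate far fewer caches.
run_against_entire_lockfile = false
```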
Sorry for the huge info dump, and thank you to anyone who feels compelled to help out 😄. And please feel free to tell me "you'll need to implement X"--I'm happy to learn about Pants internals and contribute if needed 🙂
h
Have you considered remote caching? That might help a lot.
Of course that depends on how much of the problem is cold cache vs too-frequent invalidation
sounds like you might have both problems
a
Oh! Yes sorry I forgot to mention that, we are using remote caching. In this case it was missed by our remote cache, so it also does sound like too-frequent invalidation:
13:50:59.28 [INFO] Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
  docker_execution_requests: 0
  docker_execution_successes: 0
  local_cache_read_errors: 0
  local_cache_requests: 11
  local_cache_requests_cached: 0
  local_cache_requests_uncached: 11
  local_cache_total_time_saved_ms: 0
  local_cache_write_errors: 0
  local_execution_requests: 12
  local_process_total_time_run_ms: 427923
  remote_cache_read_errors: 0
  remote_cache_request_timeouts: 0
  remote_cache_requests: 6
  remote_cache_requests_cached: 2
  remote_cache_requests_uncached: 4
  remote_cache_speculation_local_completed_first: 0
  remote_cache_speculation_remote_completed_first: 2
  remote_cache_total_time_saved_ms: 6487
  remote_cache_write_attempts: 4
  remote_cache_write_errors: 0
  remote_cache_write_successes: 4
  remote_execution_errors: 0
  remote_execution_requests: 0
  remote_execution_rpc_errors: 0
  remote_execution_rpc_execute: 0
  remote_execution_rpc_retries: 0
  remote_execution_rpc_wait_execution: 0
  remote_execution_success: 0
  remote_execution_timeouts: 0
  remote_process_total_time_run_ms: 0
  remote_store_missing_digest: 0
  remote_store_request_timeouts: 0
But it sounds like we should focus on preventing cache invalidation, so we can use our remote cache more effectively?
Obviously +/- actual data capture and investigation, but as a vibe-check response, does that sound reasonable?
h
Are you changing requirements frequently in your repo?
I guess the first thing I would check is, are you not getting cache hits even when you would expect to, such as running the same CI twice with no requirements changes (or even no changes at all)
a
Super delayed response, lots of meetings today 😅, sorry about that. I think I found a couple of issues w/ environment variables changing run-to-run that poison test caching. That will help once we split apart targets, I think. But that shouldn't affect anything before test time, right?
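Toy model of why those env vars poison caching: the cache key for a sandboxed process covers the captured environment, so any value that is unique per run guarantees a miss. This is just illustrative Python, not Pants's actual key computation:

```python
import hashlib
import json


def cache_key(argv, env):
    """Toy cache key: hash of the command line plus the captured env.

    Any difference in the captured environment changes the key,
    forcing a fresh (uncached) run of the process.
    """
    blob = json.dumps({"argv": argv, "env": env}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()


# A per-run value like a CI run ID makes every key unique:
k1 = cache_key(["pytest"], {"CI_RUN_ID": "1001"})
k2 = cache_key(["pytest"], {"CI_RUN_ID": "1002"})
assert k1 != k2  # guaranteed cache miss on every run
```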
c
Is there anything I can do to make Python dependency inference faster on a cold boot?
The cold performance is a CI-specific concern, correct? Or are you seeing the local `pantsd` cache not helping on your workstation?
A lot of that dependency installation time comes building and installing wheels which do not have wheels available on PyPI. These also aren't being cached on our remote caching provider. Do we have the option to remotely cache wheels that are built as part of a build?
To solve the unrelated problem of people trying to develop on an OS (Mac!) that doesn't match the deployment target (Linux), we pre-build wheels for all 3rd-party dependencies. This incidentally sidesteps the problem you described.
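If you go that route, pointing Pants at the prebuilt wheels is just a find-links repo in `pants.toml` (the path here is hypothetical; an internal index URL works too):

```toml
[python-repos]
# Hypothetical local wheelhouse of pre-built wheels. Resolution will
# pick these up instead of building from sdists fetched off PyPI.
find_links = ["file:///opt/wheelhouse"]
```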
Does anyone know why it takes so long to make the .pex file for our requirements set? I would expect it to take less time to essentially copy files around and pack a venv into a file.
I'm in ML-land with lots of crazy huge wheels so my intuition is for that operation to be slow!
h
Env vars can perturb many things, depending on which ones
a
I mean env vars that we're using inside of tests, not env vars to configure Pants. I'm not sure if that changes anything 😄
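For reference, the kind of fix I'm applying is pinning the values tests see instead of letting ambient CI values leak into the sandbox. A BUILD-file sketch (target and var names are made up):

```python
# BUILD file sketch: only pinned or explicitly named env vars reach the
# test sandbox, so run-to-run differences can't poison the cache key.
python_tests(
    name="tests",
    extra_env_vars=[
        "SERVICE_URL=http://localhost:8080",  # pinned to a fixed value
        "HOME",  # passed through from the ambient environment
    ],
)
```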
The cold performance is a CI specific concern, correct?
Yep, just cold performance. We keep pantsd alive between runs and it is typically pretty quick afterwards. Maybe we could warm a machine before marking it as "ready," but then we just make people wait for available compute instead of waiting for tests.
we pre-build wheels for all 3rdparty dependencies
Yeah I've been thinking about this already. We don't have a particularly mature package management ecosystem, so it would take a non-zero amount of work to set this up. Still something we should do eventually.
Also thank you again y'all for pointers, I'm going to keep looking at making us hit our cache more often, and then also try to get to a place where we can split out test targets to do more partial remote caching.
Oh, one last question: would having large cycles in the codebase force pantsd to re-process many files during dependency inference? I noticed that even when warm, Pants often takes ~1:30min to infer dependencies.
h
I don't think cycles, specifically, should affect dep inference performance
You are using the rust dependency parser?
Ah yes, you mentioned that you are
How many source files are we talking about?
a
34722 Python files, at least according to `scc`
I was also thinking about this thread, which is similar: https://pantsbuild.slack.com/archives/C046T6T9U/p1710878960649239