# general
e
OK, got excited about trying remote caching (our `./pants test ::` step takes ages on CircleCI creating all the wee mini-venvs), and tried a too-simple test: set up a `bazel-remote` instance on an EC2 instance and tried running multiple builds with `remote_cache_X` items set in pants.toml (or not, for the original clean local build). I then simulated a "clean build on a fresh CI node" by `rm -rf`ing `~/.cache/pants` and in the repository both `.pids` and `.pants.d` and doing `pants version` to at least populate the base tool before timing. ...and it seemed to make no difference; ie:
Clean build local

________________________________________________________
Executed in  386.20 secs   fish           external
   usr time  461.72 millis  876.00 micros  460.84 millis
   sys time   65.67 millis  240.00 micros   65.43 millis

Clean build remote cache empty but on (for initial population)

________________________________________________________
Executed in  387.25 secs   fish           external
   usr time  450.78 millis  863.00 micros  449.92 millis
   sys time   58.88 millis  239.00 micros   58.64 millis

Clean build remote cache populated and on

________________________________________________________
Executed in  371.61 secs   fish           external
   usr time  449.75 millis    0.00 micros  449.75 millis
   sys time   67.19 millis  1258.00 micros   65.94 millis
... and when I was watching, the bulk of the time (up to 340 seconds) was spent `Resolving constraints.txt`. I'm using 2.10.0.dev0 with
[python]
interpreter_constraints = ["CPython==3.9.*"]
requirement_constraints = "constraints.txt"
...the constraints.txt file generated by `pip freeze` and then hand-edited for a few things. Of course, successive runs of `pants test` are nearly instant, but when I can't count on a CI instance having any existing cache (and am thus relying on the remote) it doesn't help to the degree I'd hoped. The `--stats-log` output is
local_cache_read_errors: 0
  local_cache_requests: 65
  local_cache_requests_cached: 0
  local_cache_requests_uncached: 65
  local_cache_total_time_saved_ms: 0
  local_cache_write_errors: 0
  local_execution_requests: 61
  local_process_total_time_run_ms: 398394
  remote_cache_read_errors: 0
  remote_cache_requests: 29
  remote_cache_requests_cached: 4
  remote_cache_requests_uncached: 25
  remote_cache_speculation_local_completed_first: 36
  remote_cache_speculation_remote_completed_first: 4
  remote_cache_total_time_saved_ms: 5162
  remote_cache_write_attempts: 59
  remote_cache_write_errors: 0
  remote_cache_write_successes: 57
  remote_execution_errors: 0
  remote_execution_requests: 0
  remote_execution_rpc_errors: 0
  remote_execution_rpc_execute: 0
  remote_execution_rpc_retries: 0
  remote_execution_rpc_wait_execution: 0
  remote_execution_success: 0
  remote_execution_timeouts: 0
  remote_process_total_time_run_ms: 0
  remote_store_blob_bytes_downloaded: 1283423
  remote_store_blob_bytes_uploaded: 115816367
  remote_store_missing_digest: 0
...which implies a lot of misses. So I'm not sure what I'm missing myself (probably a lot; new territory!); why is "Resolving constraints.txt" taking so long, and how could I use remote caching to help, or have I just bolloxed up the setup entirely? (searching slack, it sounds like "Resolving constraints.txt" basically installs every package in constraints.txt; not sure how to speed that or if it's cacheable in any useful sense, so that could be most of the story right there)
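For reference, a rough sketch of how the remote hit rate could be read off counters like the ones pasted above. It assumes the stats are available as plain `name: value` lines on stdin, exactly as shown in this message; `remote_hit_rate` is just an illustrative helper name, not anything Pants provides.

```shell
# Sketch: compute the remote cache hit rate (percent) from "name: value" lines.
# Assumes the counter names shown in the pasted stats output; reads stdin.
remote_hit_rate() {
  awk -F': *' '
    { gsub(/^[ \t]+/, "", $1) }                            # strip leading indentation
    $1 == "remote_cache_requests"        { total = $2 + 0 }
    $1 == "remote_cache_requests_cached" { hits  = $2 + 0 }
    END {
      if (total > 0) printf "%.1f\n", 100 * hits / total
      else print "n/a"                                     # no remote requests recorded
    }'
}
```

Fed the counters from this run (29 remote cache requests, 4 of them cached), it prints `13.8`, i.e. roughly one remote lookup in seven was a hit.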
e
Re resolve time: How does it compare to `python3.9 -mvenv test && test/bin/pip install -r constraints.txt -c constraints.txt`? We can't do much better than that since we indirectly use Pip; so if it's in the ballpark we can't do much except resolve less often 😕 If it's way off though in Pip's favor, that should be actionable / indicate some sort of bug in Pex.
e
Almost certainly that's the case; I'll check (thanks). Hmm, though, perhaps the caching still could be useful and I need to test it with the full suite maybe on Circle; the base test suite takes about 25 minutes in CI and it would sure be nice to speed that up. I was a little dismayed that it looks like a fresh build using remote caching still actually runs all the tests--in other words it looks like caching the test results remotely doesn't help much for fresh CI nodes, and that is a bit of a surprise. Presumably because the resolution might have resulted in a different environment? But that's what a lockfile/constraints should have helped with, I thought. Would there be a way to cache the whole whopping "resolved constraints pex" through remote caching? I guess I'm mildly surprised that one of the rather large steps can't be sped up somehow by remote caching. Mind you, I'm well impressed with how well it works if I get to keep my same machine (but if I get to keep my same machine, local caching works pretty well...)
Of course, caching the whole thing would be a whopping cache miss for even a tiny missed lib, and you'd have to cache each package if you did it separately, so I get some of that. Hmm. Also hmm, the `python3.9 -mvenv...` command got me 119 seconds. But that was leveraging my existing pip cache. If I `rm -rf ~/.cache/pip` and try again, 184 seconds. So I'm confused now; not sure why it would be different (pip 21.3.1)
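To put the numbers quoted so far side by side, here is a small sketch. Note the assumptions: the 340 s figure is the "up to" upper bound observed for the resolve step rather than a measured average, and `duration_ratio` is only an illustrative helper name.

```shell
# Sketch: ratio of two durations given in seconds, printed to two decimals.
duration_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (b > 0) printf "%.2f\n", a / b
    else print "n/a"                      # avoid dividing by zero
  }'
}
```

`duration_ratio 340 184` prints `1.85` (Pants resolve vs. pip with an empty pip cache) and `duration_ratio 340 119` prints `2.86` (vs. pip with a warm cache), so on these rough figures the resolve step is somewhere around two to three times slower than the plain pip install.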
e
What's the equivalent Pants test? Was that the `rm -rf ~/.cache/pants && pkill pantsd && ./pants -V && time ./pants XYZ`?
e
rm -rf ~/.cache/pants
rm -rf ./.pants.d
rm -rf ./.pids
./pants version  # to repopulate pants
time ./pants test (subdir)::
So as mentioned, may have done something strange there; wasn't sure how best to simulate a new machine. I started with the full suite (ie `time ./pants test ::`) but the iterations took too long so I switched to a subset. Getting latish here (GMT) so I will try to pursue more Monday; thanks for your patience and help on this. Would really love to speed this up for us.
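The reset steps discussed in this thread could be wrapped up roughly as follows so each timing run starts from the same state. This is a sketch under assumptions: the helper names `run` and `reset_pants_state` and the `DRY_RUN` switch are made up for illustration, the paths are the ones mentioned above, and the `pkill pantsd` step is taken from the variant suggested earlier in the thread. It only approximates a fresh CI node (for example, it leaves the pip cache alone).

```shell
# Sketch: reset local Pants state before timing a run.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

reset_pants_state() {
  run rm -rf "$HOME/.cache/pants"   # Pants caches in the home directory
  run rm -rf ./.pants.d ./.pids     # per-repository working state
  run pkill pantsd || true          # stop the daemon; ignore "no such process"
  run ./pants version               # re-bootstrap Pants so that isn't timed
}
```

Typical use from the repository root would be `reset_pants_state && time ./pants test ::`, and `DRY_RUN=1` can be exported first to check what would be deleted.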