What does pants cache in the remote cache for pyth...
# general
h
What does pants cache in the remote cache for python repos? Is it just input hash -> test results and any `pex_binary`s we might have created? Is it caching wheels built from requirements? Some kind of intermediate output?
e
Uniformly it remotely caches the results of all subprocesses it runs regardless of backend (Python, Java, Scala, Go, etc.). So for Python, for example, but not exhaustively:
+ test: These are run by default as 1 process per test file; so they are remote cached at that granularity
+ package: This operation only has the final PEX binary cached remotely and 0 intermediate results. Those (resolved wheels, etc.) are only cached locally in Pants' Pex cache.
When I say remotely caches - that more broadly means caches to disk. You should have a local lmdb cache entry as well. All the rest - intermediate rule results - are only cached in memory in pantsd.
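To make the "input hash -> result" idea concrete, here is a rough Python sketch of a process-result cache. It is only a mental model, not how Pants implements it: `ProcessRequest`, `ResultCache`, and `fake_pytest` are hypothetical names made up for illustration, and the real cache key covers more than this.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessRequest:
    """Hypothetical stand-in for one sandboxed subprocess invocation."""

    argv: tuple
    env: tuple  # sorted (name, value) pairs
    input_files: tuple  # sorted (path, content bytes) pairs

    def cache_key(self) -> str:
        # The key depends on the command line, the environment, and a digest
        # of every input file, so any change to an input yields a new key.
        h = hashlib.sha256()
        h.update(json.dumps([list(self.argv), [list(p) for p in self.env]]).encode())
        for path, content in self.input_files:
            h.update(path.encode())
            h.update(hashlib.sha256(content).digest())
        return h.hexdigest()


@dataclass
class ResultCache:
    """Hypothetical key -> result store (a dict standing in for a remote cache)."""

    store: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def run(self, request, execute):
        key = request.cache_key()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        result = execute(request)
        self.store[key] = result
        return result


def fake_pytest(request):
    # Pretend to run one test file; the cached result is the exit code plus
    # any output files the process produced.
    return {"exit_code": 0, "output_files": {"junit.xml": b"<testsuite/>"}}
```

Running the same request twice gives one miss and then one hit; editing the test file's contents changes the key and forces a re-run.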
h
What about a pants run call with a python program, any intermediates pulled from the remote cache?
Based on your description then the storage needs for the remote cache with just pants test are pretty small, right? Since it’s not caching the test files just the result and input metadata?
Or is it caching some built artifact of a test target
e
What about a pants run call with a python program, any intermediates pulled from the remote cache?
I'm not sure. There are several `run` goal semantics IIRC.
Based on your description then the storage needs for the remote cache with just pants test are pretty small, right? Since it’s not caching the test files just the result and input metadata?
I think our Python test process outputs include junit.xml by default, so those files IIRC - but still not too big. If you enable coverage, then those files too.
h
Okay thank you. Would be nice to know what caching run might use but not urgent
e
Yeah - there is no tracking issue for that that I'm aware of. You really need to have a test rig you can use to answer this for yourself for any given Pants version. Folks have been using a bazel remote cache docker image thing and presumably spinning that up on your machine and running a quick check is the way to go.
I have never used this, but you might find searching slack for these new keywords nets you answers / people in the know.
h
Yeah I have some caches I can use to test, thanks
h
The `pants run` process isn't cached
(It uses InteractiveProcess instead of Process)
This is because you may expect side-effects or non-idempotent behavior when you `pants run` something
All the processes leading up to the actual run may be cached (dependency resolution, for example), but the actual binary you run is not
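A toy Python model of that split, in case it helps. This is a sketch under assumptions, not Pants code: `Step`, `Runner`, and `pants_run_model` are hypothetical, the step names are illustrative, and keying the cache by step name is a simplification of hashing the real inputs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """Hypothetical unit of work; cacheable=False models an interactive process."""

    name: str
    cacheable: bool


@dataclass
class Runner:
    cache: dict = field(default_factory=dict)
    executed: list = field(default_factory=list)

    def execute(self, step):
        # Cacheable steps are served from the cache when possible.
        if step.cacheable and step.name in self.cache:
            return self.cache[step.name]
        # Everything else actually executes, which may have side effects.
        self.executed.append(step.name)
        result = f"ran {step.name}"
        if step.cacheable:
            self.cache[step.name] = result
        return result


def pants_run_model(runner):
    # The preparation steps may be cached; the final run never is.
    runner.execute(Step("resolve dependencies", cacheable=True))
    runner.execute(Step("build runnable artifact", cacheable=True))
    return runner.execute(Step("run binary", cacheable=False))
```

Calling `pants_run_model` twice on the same `Runner` executes the preparation steps once and the final run step twice.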
h
Would that dependency resolution step upload artifacts to the cache, like wheels?
h
Generally yes
Based on your description then the storage needs for the remote cache with just pants test are pretty small, right? Since it’s not caching the test files just the result and input metadata?
This is inaccurate, since many intermediate process inputs and outputs may be cached.