# general
k
Hello! I am wondering if there are ways to control the cache key when using a remote cache? For example in my case I want to test against two different databases, but when I configure that via an env variable, both end up sharing the same test cache. I'd like to be able to do something like
PANTS_REMOTE_CACHE_PREFIX="hello"
My other idea is to make everything depend on a file, and modify that file...
f
An idea: You could configure two different
remote_environment
targets, each with a different platform property. Then you could try putting the relevant test in both environments.
(Platform properties should be part of the cache key.)
k
From reading the docs, I was under the impression
PANTS_REMOTE_INSTANCE_NAME
could do this, but it's not clear.
We don't use remote execution; does that exclude remote_environment?
f
From reading the docs, I was under the impression
PANTS_REMOTE_INSTANCE_NAME
could do this, but it's not clear.
The issue with two different instance names is that any build actions which did not depend on a database would not be shared between the two different runs.
Ah so just remote cache. You could try two different
local_environment
targets then with some differentiating dummy environment variable.
The environment variables are part of the cache key.
k
ok -- I'll read about local_environment, thanks for the pointer.
f
Or simpler: Could you just have a separate (minimal) test file for each test of a database type?
db_common_test.py
,
db1_test.py
,
db2_test.py
k
So you are saying REMOTE_INSTANCE_NAME would mean we have two completely separate caches? I think it would be good enough for us...
f
Separate test files are cached distinctly.
So you are saying REMOTE_INSTANCE_NAME would mean we have two completely separate caches? I think it would be good enough for us...
Yes each instance name is a separate cache.
Disregard my
local_environment
idea. Separate test files with common helper is probably the better option. Then the type of the database basically becomes a parameter of the test.
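A minimal sketch of that layout, collapsed into one block so it runs on its own. The module name `db_common.py`, the helper `run_roundtrip_check`, and the `connect_db1`/`connect_db2` stand-ins are all hypothetical; real tests would open actual database connections instead of returning dicts.

```python
# --- db_common.py (hypothetical shared, non-test module) ---
def run_roundtrip_check(connect):
    """Shared test body: store a value and read it back through `connect()`."""
    store = connect()
    store["greeting"] = "hello"
    return store["greeting"] == "hello"


# Stand-ins for real database clients, so the sketch runs without a server.
def connect_db1():
    return {}


def connect_db2():
    return dict()


# --- db1_test.py ---
def test_db1_roundtrip():
    assert run_roundtrip_check(connect_db1)


# --- db2_test.py ---
def test_db2_roundtrip():
    assert run_roundtrip_check(connect_db2)
```

Each test file only picks the backend and delegates to the shared helper, so the database type is effectively an argument to the test and each file is cached on its own.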
b
Revisiting the initial question, I'm slightly surprised that configuring the DB via an env var isn't resulting in separate caches, because env vars are definitely used as part of the process cache key, i.e. different env vars => different process executions. If you want to dig into that, my first question is: how are the env vars being set and propagated?
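To illustrate the point about env vars and cache keys, here is a toy sketch. It is not Pants' actual hashing scheme, just a digest over a command line plus its environment. The function `process_cache_key`, the variables `DB_URL` and `DB_KIND`, and the URL are made up for illustration.

```python
import hashlib
import json


def process_cache_key(argv, env):
    """Toy digest over a command line and its environment (not Pants' real algorithm)."""
    payload = json.dumps(
        {"argv": list(argv), "env": dict(sorted(env.items()))},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


argv = ["pytest", "db_test.py"]  # hypothetical test command

# Identical environments hash to the same key, so a cached result would be reused.
key_a = process_cache_key(argv, {"DB_URL": "postgres://localhost/test"})
key_b = process_cache_key(argv, {"DB_URL": "postgres://localhost/test"})

# Any difference in the environment changes the key.
key_c = process_cache_key(
    argv, {"DB_URL": "postgres://localhost/test", "DB_KIND": "db2"}
)

print(key_a == key_b)
print(key_a == key_c)
```

Under this model, two runs only share a cache entry if the env vars that reach the process are byte-for-byte identical, which is why it matters how the variables are set and propagated.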
e
I do something similar where I have
__defaults__({
    python_test: dict(
        extra_env_vars=parametrize(...),
    ),
})
(probably butchered that syntax, but you get the idea) and it works to test with multiple different configurations of test code, and each parametrized test file gets its own cache entry, e.g.
/path/to/test.py#paramset1, /path/to/test.py#paramset2
It even works to run all the variants of all the tests in one
pants test
invocation. The one (minor) drawback, though, is that you can't import anything from the test files, or else Pants can't identify which variant of the file it should use for dependency inference. It's pretty minor though (you just need to put all utility functions in a separate non-test file and call it a
python_source
instead of a
python_test
)
k
Thanks for the responses, everyone. Following up on the env var comment, I realize the problem is probably that switching databases still exposes the same URL via the env var. I'll look into it. If that's the case, adding CACHE_NAME to extra_env_vars and setting CACHE_NAME=db1 would work. But that would be equivalent to just setting REMOTE_INSTANCE_NAME.
I can confirm that PANTS_REMOTE_INSTANCE_NAME seems to have no effect for me. Does it perhaps only affect the GHA cache?
b
Remote instance name is part of the Bazel REAPI protocol, and Pants (I believe) passes that through to the server in any caching requests... but it'll be up to the server to do something with that.