Hello I am wondering if there are ways to control the cache Pants #general

kind-traffic-20936

01/23/2025, 7:49 PM

Hello! I am wondering if there are ways to control the cache key when using a remote cache? For example in my case I want to test against two different databases, but when I configure that via an env variable, both end up sharing the same test cache. I'd like to be able to do something like

PANTS_REMOTE_CACHE_PREFIX="hello"

My other idea is to make everything depend on a file, and modify that file...

fast-nail-55400

01/23/2025, 7:54 PM

An idea: You could configure two different remote_environment targets, each with a different platform property. Then you could try putting the relevant test in both environments.

fast-nail-55400

01/23/2025, 7:54 PM

(Platform properties should be part of the cache key.)

kind-traffic-20936

01/23/2025, 7:55 PM

From reading the docs, I was under the impression that PANTS_REMOTE_INSTANCE_NAME could do this, but it's not clear.

kind-traffic-20936

01/23/2025, 7:55 PM

We don't use remote execution; does that exclude remote_environment?

fast-nail-55400

01/23/2025, 7:56 PM

From reading the docs, I was under the impression
PANTS_REMOTE_INSTANCE_NAME
could do this, but it's not clear.

The issue with two different instance names is that any build actions which did not depend on a database would not be shared between the two different runs.

fast-nail-55400

01/23/2025, 7:57 PM

Ah, so just remote cache. You could then try two different local_environment targets with some differentiating dummy environment variable.

fast-nail-55400

01/23/2025, 7:57 PM

The environment variables are part of the cache key.

kind-traffic-20936

01/23/2025, 7:57 PM

ok -- I'll read about local_environment, thanks for the pointer.

fast-nail-55400

01/23/2025, 7:58 PM

Or simpler: Could you just have a separate (minimal) test file for each test of a database type?

fast-nail-55400

01/23/2025, 7:58 PM

db_common_test.py

db1_test.py

db2_test.py

kind-traffic-20936

01/23/2025, 7:58 PM

So you are saying REMOTE_INSTANCE_NAME would mean we have two completely separate caches? I think it would be good enough for us...

fast-nail-55400

01/23/2025, 7:59 PM

Separate test files are cached distinctly.

fast-nail-55400

01/23/2025, 8:00 PM

So you are saying REMOTE_INSTANCE_NAME would mean we have two completely separate caches? I think it would be good enough for us...

Yes, each instance name is a separate cache.

fast-nail-55400

01/23/2025, 8:01 PM

Disregard my local_environment idea. Separate test files with a common helper are probably the better option. Then the type of the database basically becomes a parameter of the test.
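
A minimal sketch of that layout, shown in one block for brevity. The function names and the in-memory dict standing in for a database connection are hypothetical; in a real repo each commented section would live in its own file, so each thin test file gets its own cache entry.

```python
# Shared helper (hypothetical): holds the checks common to every database.
def check_roundtrip(connect):
    """Run the same write/read check against whatever `connect` returns."""
    store = connect()
    store["answer"] = 42
    return store["answer"] == 42


# db1_test.py (sketch): a thin test that only fixes the backend.
def connect_db1():
    return {}  # stand-in for a real db1 connection


def test_db1_roundtrip():
    assert check_roundtrip(connect_db1)


# db2_test.py (sketch): same checks, different backend.
def connect_db2():
    return {}  # stand-in for a real db2 connection


def test_db2_roundtrip():
    assert check_roundtrip(connect_db2)
```

The database type is selected by which test file runs, not by an environment variable set outside the build.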

broad-processor-92400

01/23/2025, 10:26 PM

Revisiting the initial question, I'm slightly surprised that configuring the DB via an env var isn't resulting in separate caches, because env vars are definitely used as part of the process cache key, i.e. different env vars => different process executions. If you want to dig into that, my first question is: how are the env vars being set and propagated?
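
To illustrate the point about env vars and the cache key, here is a toy sketch. It is not how Pants actually computes its fingerprints (a real key also covers input files, platform properties, and more), and the variable names DATABASE_URL and DB_KIND are hypothetical. It only shows that a key derived from the command plus its environment changes when a value changes, and does not change when the values are identical.

```python
import hashlib
import json


def process_cache_key(argv, env):
    """Hash the command line and environment into a hex digest (toy model)."""
    payload = json.dumps({"argv": list(argv), "env": dict(env)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


argv = ["pytest", "tests/test_db.py"]
url = "postgres://localhost:5432/app"

# Identical env var values: identical keys, so the cached result is shared.
key_a = process_cache_key(argv, {"DATABASE_URL": url})
key_b = process_cache_key(argv, {"DATABASE_URL": url})

# One extra differentiating variable: a different key.
key_c = process_cache_key(argv, {"DATABASE_URL": url, "DB_KIND": "db2"})

print(key_a == key_b, key_a == key_c)
```

If two runs share a cache entry despite targeting different databases, checking whether the env var values seen by the test process actually differ is a reasonable first step.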

elegant-florist-94385

01/24/2025, 1:00 AM

I do something similar where I have

# __defaults__ is called as a function in BUILD files, not assigned to.
__defaults__({
    python_test: dict(
        extra_env_vars=parametrize(...),
    ),
})

(probably butchered that syntax, but you get the idea) and it works to test with multiple different configurations of test code, and each parametrized test file gets its own cache entry, e.g.

/path/to/test.py#paramset1, /path/to/test.py#paramset2,

elegant-florist-94385

01/24/2025, 1:01 AM

It even works to run all the variants of all the tests in one pants test invocation. The one (minor) drawback, though, is that you can't import anything from the test files, because Pants then can't identify which variant of the file to use for dependency inference. It's pretty minor, though: you just need to put all utility functions in a separate (non-test) file and declare it as a python_source instead of a python_test.
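
For reference, a sketch of what a parametrized BUILD file along these lines might look like. The variable name DB_KIND and the labels db1 and db2 are hypothetical, and the keyword-argument form of parametrize is assumed to work for extra_env_vars the way it does for other list-valued fields, so check it against your Pants version.

```python
# BUILD (sketch; DB_KIND, db1, and db2 are hypothetical names)
python_tests(
    name="tests",
    # One variant of every test file per keyword; each variant runs with
    # its own environment and is cached separately.
    extra_env_vars=parametrize(
        db1=["DB_KIND=db1"],
        db2=["DB_KIND=db2"],
    ),
)
```

The test code would then read DB_KIND to decide which database to connect to.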

kind-traffic-20936

01/24/2025, 1:39 AM

Thanks for the responses, everyone. Following up on the env var comment, I realize the problem is probably that changing my database results in the same URL being exposed via the env var. I’ll look into it. If that’s the case, it would mean that adding CACHE_NAME to extra_env_vars and exposing CACHE_NAME=db1 would work. But that would be equivalent to just setting REMOTE_INSTANCE_NAME.

kind-traffic-20936

01/24/2025, 6:32 PM

I confirm that PANTS_REMOTE_INSTANCE_NAME seems to have no effect for me. It probably affects only the gha cache?

broad-processor-92400

02/02/2025, 11:31 PM

Remote instance name is part of the Bazel REAPI protocol, and Pants (I believe) passes that through to the server in any caching requests... but it'll be up to the server to do something with that.
