#general

cold-branch-54016

04/06/2023, 11:20 AM
Our integration tests require a Kubernetes cluster. To ensure we can run multiple CI pipelines in parallel (e.g. for different PRs), we have a pool of clusters available. When a CI pipeline starts, it selects one cluster from the pool and executes the tests on it. Currently we do this by setting an env var that is picked up by the setup code for the tests. When using Pants to run our integration tests, we can pipe this env var through without problems. But since the value of the env var can change between runs, we cannot fully benefit from Pants's test caching. Is there a better way to provide the cluster information to the test suite without invalidating the cache?

refined-addition-53644

04/06/2023, 11:33 AM
I wonder if you can use batching of tests to achieve this. Maybe map the cluster to a certain batch_compatibility_tag? These batched tests would then be cached based on these tags. I am not sure whether you can set these tags dynamically, though. It is worth trying env() to read them, i.e.
batch_compatibility_tag=env('CLUSTER_ID', "default_val")
My assumption here is that CLUSTER_ID takes only a limited, fixed number of values. You can read more about batching, if you haven’t used it yet, at https://www.pantsbuild.org/docs/python-test-goal#batching-and-parallelism

cold-branch-54016

04/06/2023, 12:30 PM
I don’t quite understand how that might help in this case. Let’s assume I have a PR and the first CI run uses cluster_id=1 for the integration tests. I now push some changes that only affect a subset of the test suite. For this second CI run, cluster_id=2 gets selected. Since the value of the env var we currently use (or, in your case, the batch_compatibility_tag) changes, the cache gets invalidated for the whole test suite and all tests are run again, instead of only the subset of tests affected by my changes. Did I miss something?
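To make the concern concrete, here is a toy model of a cache key that hashes the test inputs together with the environment. It is only an illustration of the invalidation described above, not Pants's actual cache key computation; the function name and the digest value are made up.

```python
import hashlib
import json


def cache_key(source_digest: str, env: dict) -> str:
    """Toy cache key: hash of the source digest plus every env var passed to the test."""
    payload = json.dumps({"sources": source_digest, "env": env}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


# Same sources, different cluster: the keys differ, so nothing is reused.
run1 = cache_key("abc123", {"CLUSTER_ID": "1"})
run2 = cache_key("abc123", {"CLUSTER_ID": "2"})
print(run1 == run2)  # False

# Same sources, same cluster: the key is stable and the cached result can be reused.
run3 = cache_key("abc123", {"CLUSTER_ID": "1"})
print(run1 == run3)  # True
```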

refined-addition-53644

04/06/2023, 12:47 PM
You didn’t miss anything. I was thinking more along the lines that, once you have run the tests on every possible cluster a certain number of times, it might help, but I am not entirely sure. So I was talking about having a cluster-specific cache, not a single cache shared across all clusters. The more I write, the more I realize that might already be happening. So ignore my ramblings.

cold-branch-54016

04/06/2023, 12:50 PM
Ok, no worries, thanks for trying to help 👍

refined-addition-53644

04/06/2023, 12:51 PM
I also just realized that these pods would be ephemeral in nature. So it’s not always the same machine being used, even inside k8s.

cold-branch-54016

04/06/2023, 12:52 PM
I guess my question ultimately boils down to this: what is the best approach for passing dynamic information that does not influence the test outcome into the hermetic environments that Pants creates for the tests?

refined-addition-53644

04/06/2023, 12:56 PM
My k8s knowledge is limited, but can you mount a volume that stores the Pants cache and then use it across clusters? Something like the caching strategy suggested in the Pants CI docs: https://www.pantsbuild.org/docs/using-pants-in-ci#configuring-pants-for-ci-pantscitoml-optional

cold-branch-54016

04/06/2023, 1:00 PM
Ahh, the test suite is not run on Kubernetes directly. We run our tests on GitHub. But the individual tests need to connect to a Kubernetes cluster to do their work. That cluster is selected dynamically when the pipeline starts, and we need a way to tell the tests which cluster they should connect to.
👍 1

ambitious-actor-36781

04/10/2023, 10:52 PM
I have a similar problem. Except in my case, it’s a rotated/personal credential.
I think there’s a ticket somewhere about “ignoring” things when creating hashes for cache keys.
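One possible workaround, sketched here purely as an assumption and not as a documented Pants feature, is to keep the dynamic value out of the env vars that Pants tracks and have the test setup read it from a side-channel file instead. This only works if the test process can read an absolute path outside its sandbox, which may not hold with remote execution or containerized environments. The file path, the function name, and the fallback order below are all hypothetical.

```python
import os
from pathlib import Path

# Hypothetical location: the CI pipeline would write the selected cluster ID
# here before invoking Pants. Nothing in Pants knows about this file.
DEFAULT_CLUSTER_FILE = Path("/tmp/ci-cluster-id")


def resolve_cluster_id(
    cluster_file: Path = DEFAULT_CLUSTER_FILE,
    env_var: str = "CLUSTER_ID",
    default: str = "local",
) -> str:
    """Return the cluster ID from the side-channel file, else the env var, else a default."""
    try:
        value = cluster_file.read_text().strip()
        if value:
            return value
    except OSError:
        # File missing or unreadable: fall through to the env var.
        pass
    return os.environ.get(env_var, default)
```

The test setup code would call resolve_cluster_id() instead of reading the env var directly. The trade-off is that the file is invisible to the cache, so this is only safe when the value truly does not influence the test outcome.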