
broad-processor-92400

04/26/2023, 3:15 AM
I notice the pants repo has remote caching set up in pants.ci.toml, and it seems to cut test times noticeably on main, but doesn't help much in a PR. Specifically, retrying a test shard that failed spuriously still ends up rerunning all of the passing tests. On the surface, this circumstance seems very amenable to caching: the code is the same between the runs (I think?). Some of the steps set PANTS_REMOTE_CACHE_READ=false etc., but not the test ones. Am I missing something about how remote caching works?

enough-analyst-54434

04/26/2023, 3:35 AM
Did those shards actually use remote caching? Here's an example of one that did not: https://github.com/pantsbuild/pants/actions/runs/4794253791/jobs/8528270309#step:11:215
Auth failure is a thing now and again FWICT. Maybe more now than again?
So, still a problem but maybe you've pin-pointed the wrong one.

broad-processor-92400

04/26/2023, 3:37 AM
This one on main seemed to: https://github.com/pantsbuild/pants/actions/runs/4801515869/jobs/8543869725 Qualitatively, it seems like it works quite often on main (most of the ones I click into have some sort of remote caching), but never on PRs.

enough-analyst-54434

04/26/2023, 3:38 AM
It definitely works on some PRs; I've certainly hit retry and had shards complete almost instantaneously.

broad-processor-92400

04/26/2023, 3:38 AM
Ah, just above that line there's:
06:15:29.74 [WARN] [rule-construct-auth-store] Failed to load Toolchain token from env var 'TOOLCHAIN_AUTH_TOKEN'. Please make sure the env var is set in your environment.
Which might suggest that's a secret not available to (some) PRs?

enough-analyst-54434

04/26/2023, 3:38 AM
So I think you've identified flaky remote cache, not lack of it.

broad-processor-92400

04/26/2023, 3:40 AM
Ah, GHA secrets say: https://docs.github.com/en/actions/security-guides/encrypted-secrets#using-encrypted-secrets-in-a-workflow "With the exception of GITHUB_TOKEN, secrets are not passed to the runner when a workflow is triggered from a forked repository."

enough-analyst-54434

04/26/2023, 3:40 AM
Yeah - classic security hole there.

broad-processor-92400

04/26/2023, 3:40 AM
(ah, and that behaviour is presumably why that step behaves like that.)

enough-analyst-54434

04/26/2023, 3:41 AM
Well, you could imagine compromises, etc. But you're at the correct meat now. This would require some engineering + vulnerability thought, etc. Clearly everyone wants caching all the time.
👍 1
We already compromise on our s3 bucket IIRC so that PRs can push / pull to / from it.

broad-processor-92400

04/26/2023, 3:42 AM
Ok, question resolved: it requires a secret, even to read, and thus there's no remote-cache usage in PRs (e.g. to avoid cache poisoning attacks). I guess theoretically one could have a read-only token that's specified as a variable, rather than a secret, and at least have cache reads.
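The read-only-token idea could be sketched roughly as below. This is only an illustration of the trade-off discussed here, not how the pants repo's CI is wired: the REMOTE_CACHE_READ_ONLY_TOKEN name and the remote_cache_env helper are hypothetical, and only TOOLCHAIN_AUTH_TOKEN and the PANTS_REMOTE_CACHE_READ / PANTS_REMOTE_CACHE_WRITE settings come from the thread or from Pants itself.

```python
from typing import Dict, Mapping

# Hypothetical name for a read-only token exposed as a plain variable
# rather than a secret; nothing in the pants repo defines this.
READ_ONLY_TOKEN_VAR = "REMOTE_CACHE_READ_ONLY_TOKEN"


def remote_cache_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Choose remote cache settings from whichever credential the runner can see."""
    if env.get("TOOLCHAIN_AUTH_TOKEN"):
        # The full secret is present (e.g. a run on main): read and write.
        return {"PANTS_REMOTE_CACHE_READ": "true", "PANTS_REMOTE_CACHE_WRITE": "true"}
    if env.get(READ_ONLY_TOKEN_VAR):
        # Only the assumed read-only token is present (e.g. a fork PR):
        # allow reads, never writes, so the cache cannot be poisoned.
        return {"PANTS_REMOTE_CACHE_READ": "true", "PANTS_REMOTE_CACHE_WRITE": "false"}
    # No credential at all: run without the remote cache.
    return {"PANTS_REMOTE_CACHE_READ": "false", "PANTS_REMOTE_CACHE_WRITE": "false"}
```

A CI step could merge the returned mapping into the environment before invoking pants; whether a read-only token is acceptable at all is exactly the security question raised above.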

enough-analyst-54434

04/26/2023, 3:42 AM
Yeah - you could go 18 routes to maybe sort of secure.
👍 1

broad-processor-92400

04/26/2023, 3:43 AM
yeah; lots of options and trade-offs. Thanks for walking through it with me

fast-nail-55400

04/26/2023, 7:08 AM
Two points: 1. For security reasons, auth for the remote cache is restricted on forked repos. 2. I recall Toolchain had a ~45 minute expiration on the restricted access token issued for a run, so if the CI build takes longer than that, the token is denied access from then on. (It could be longer, but it was a fixed value.)
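The expiry point can be pictured with a small sketch. The 45-minute lifetime is the recollection from this message (it may have been longer), and token_still_valid is a hypothetical helper, not part of Pants or Toolchain.

```python
from datetime import datetime, timedelta, timezone

# Assumed lifetime, taken from the ~45 minute recollection above.
ASSUMED_TOKEN_TTL = timedelta(minutes=45)


def token_still_valid(
    issued_at: datetime, now: datetime, ttl: timedelta = ASSUMED_TOKEN_TTL
) -> bool:
    """Return True while a token issued at `issued_at` is inside its fixed lifetime."""
    return now - issued_at < ttl


issued = datetime(2023, 4, 26, 3, 0, tzinfo=timezone.utc)
# A shard still running an hour after the token was issued would lose cache access.
print(token_still_valid(issued, issued + timedelta(minutes=60)))
```

Under this assumption, a long-running job starts with remote cache access and loses it partway through, which would look like a flaky cache rather than a missing one.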

polite-garden-50641

04/26/2023, 2:33 PM
There is a special mechanism used to obtain an access token for PRs. The Toolchain backend will check who submitted the PR and a few other things. It will issue an access token only to users that are known to it, which in the pants repo's case means that the user is a member of the pantsbuild GH org and has logged into Toolchain. Otherwise an access token won't be issued and the remote cache will be disabled for that PR.
👍 1
So for example, if Josh or Andreas (both are members of the pantsbuild GH org and have a Toolchain user) submit a PR, the CI jobs will be able to use the remote cache, but if some other user does it, they won't.
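As a rough sketch of that allowlisting rule: the function and the example user sets below are hypothetical, and the sketch models only the two conditions named here (org membership and a Toolchain login), not the "few other things" the backend also checks.

```python
from typing import AbstractSet


def should_issue_token(
    pr_author: str,
    org_members: AbstractSet[str],
    toolchain_users: AbstractSet[str],
) -> bool:
    """Issue a PR access token only if the author is both an org member and a known Toolchain user."""
    return pr_author in org_members and pr_author in toolchain_users


# Made-up example data, purely for illustration.
org_members = {"member-a", "member-b"}
toolchain_users = {"member-a"}

print(should_issue_token("member-a", org_members, toolchain_users))      # known on both sides
print(should_issue_token("outside-user", org_members, toolchain_users))  # unknown: no token
```

When the check fails, no token is issued and the PR's CI jobs simply run without the remote cache.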

enough-analyst-54434

04/26/2023, 3:28 PM
Ah, thanks @polite-garden-50641 - I knew my PR re-runs were often fast. I forgot about that allowlisting setup.