# general
r
I’m seeing consistent deterministic failures on CI: https://gist.github.com/blorente/d2d37b5432ffa8c355f1d6577cd2510f They seem unrelated to my change (https://github.com/pantsbuild/pants/pull/7924), because they also appear in other PRs: https://travis-ci.org/pantsbuild/pants/jobs/549971191 Are these okay to skip? (currently trying to repro with docker)
👍 1
a
That doesn’t sound fun :S I’d be curious to know if the problem here is that we have a bad download of coursier and haven’t verified a checksum of the download or something…
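Something along these lines in the bootstrap script would catch that early (just a sketch; the jar path and the expected digest below are placeholders, not whatever we actually pin):

    # Sketch: verify the coursier jar before using it; path and digest are placeholders.
    COURSIER_JAR="${HOME}/.cache/pants/coursier/coursier.jar"
    EXPECTED_SHA256="<known-good-digest>"
    ACTUAL_SHA256="$(sha256sum "${COURSIER_JAR}" | awk '{ print $1 }')"
    if [ "${ACTUAL_SHA256}" != "${EXPECTED_SHA256}" ]; then
      echo "coursier jar checksum mismatch: got ${ACTUAL_SHA256}" >&2
      exit 1
    fi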
r
I’d also be curious, but it’s deterministic enough that I don’t think it’s a network flake, and I don’t see anything in recent history that modified coursier. Will look again while docker is running
No luck with docker 😞
a
I’m still betting on this being network flakiness we’re not detecting…
r
It’s deterministic in master 😞 I can believe it’s network wrongness (downloading the wrong thing), but we haven’t ever seen it succeed after the commit mentioned in the issue
a
Did you try blowing away ~/.cache and re-running a shard?
If we cached the bad download, it would explain why it’s now deterministic…
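i.e., on the shard, something like this before re-running (the pants subdirectory is a guess at where the bad artifact would live):

    # Rough idea: drop the local cache so the next run has to re-download coursier.
    rm -rf "${HOME}/.cache/pants"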
r
Where is ~? I thought these were all fresh docker images on travis
a
I’m pretty sure we cache a bunch of things… Have a look at how we run the docker image, and any volumes we mount in?
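If you have a container handy, docker itself will list what’s mounted (the container name here is made up):

    # Show the bind mounts / volumes for a container; "pants_ci" is a placeholder name.
    docker inspect --format '{{ json .Mounts }}' pants_ci | python -m json.tool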
r
Cool 🙂
Interesting, according to travis the newest cache is from 12 days ago, but this issue started happening 2 days ago
We definitely do cache coursier, I’m going to try to get the file that we have cached
h
@red-balloon-89377 do you know how to wipe caches? You can either use the web UI or the CLI. The CLI works much much better (UI doesn't surface everything)
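Roughly (the repo slug and branch below are just examples):

    # The CLI is the `travis` Ruby gem.
    gem install travis
    travis login
    travis cache -r pantsbuild/pants                      # list caches, grouped by branch
    travis cache -r pantsbuild/pants -b master --delete   # wipe the caches for one branch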
r
I was messing around with the CLI to see if I could figure out which one is the faulty one, but it doesn't look trivial to say "which cache belongs to this PR"
once logged in, I think it’s travis cache --delete
Nvm, got it
👍 1
Well… it wasn’t that.
Will blacklist the test and reference the issue
❤️ 1
I’m not sure this is the right thing to do, so would love as many comments as possible: https://github.com/pantsbuild/pants/pull/7953
w
cc @wide-energy-11069: ^
w
yep ran into that too. can’t repro locally either
h
Hitting the same issue.
Looks like your change exposed a whole can of worms with similar, but not identical, breakages.
I'll restart those shards, just in case that helps, but we have a real problem here.
w
pushed a commit in the PR to print sha of coursier jar
👍 1
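(the debug line is basically just the following; the cached path is from memory and may not be exact)

    # Dump the digest of whatever coursier jar the shard actually has on disk.
    sha256sum "${HOME}/.cache/pants/coursier/"*.jar || true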
for context, is pantsd enabled for these tests?
h
In general, no. The parent process itself only uses pantsd during the cron job. Some integration tests do use pantsd, though, either via @ensure_pantsd or by invoking it directly.
👌 1
w
so, master continues to be sort of apocalyptically broken. it looks like there has been some action on https://travis-ci.community/t/continuous-maven-repo-403/3908/10 (linked from https://github.com/pantsbuild/pants/pull/7953). i'm going to see what happens with a cache purge and a retry of a recent master.
👍 1
commented on 7953.
h
so far all green! just one shard left
w
one known flake (the rsc basic binary) but other than that looks green. pheew.
Will bump that one.
h
Yay! Thanks to @wide-energy-11069 and @witty-crayon-22786 for working with Travis to fix this. Stu, should we wipe all caches?
w
I don't actually know whether the cache wiping was relevant. But sure. Can whack a few more moles.
👍 1
h
Okay purging now