# general
r
I’m seeing consistent deterministic failures on CI: https://gist.github.com/blorente/d2d37b5432ffa8c355f1d6577cd2510f They seem unrelated to my change (https://github.com/pantsbuild/pants/pull/7924), because they also appear in other PRs: https://travis-ci.org/pantsbuild/pants/jobs/549971191 Are these okay to skip? (currently trying to repro with docker)
👍 1
a
That doesn’t sound fun :S I’d be curious to know if the problem here is that we have a bad download of coursier and haven’t verified a checksum of the download or something…
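Something along these lines in the bootstrap script would catch that early (just a sketch; the jar path and the expected digest below are placeholders, not whatever we actually pin):

    # Sketch: verify the coursier jar before using it; path and digest are placeholders.
    COURSIER_JAR="${HOME}/.cache/pants/coursier/coursier.jar"
    EXPECTED_SHA256="<known-good-digest>"
    ACTUAL_SHA256="$(sha256sum "${COURSIER_JAR}" | awk '{ print $1 }')"
    if [ "${ACTUAL_SHA256}" != "${EXPECTED_SHA256}" ]; then
      echo "coursier jar checksum mismatch: got ${ACTUAL_SHA256}" >&2
      exit 1
    fi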
r
I’d also be curious, but it’s deterministic enough that I don’t think it’s a network flake, and I don’t see anything in recent history that modified coursier. Will look again while docker is running
No luck with docker 😞
a
I’m still betting on this being network flakiness we’re not detecting…
r
It’s deterministic in master 😞 I can believe it’s network wrongness (downloading the wrong thing), but we haven’t ever seen it succeed after the commit mentioned in the issue
a
Did you try blowing away ~/.cache and re-running a shard?
If we cached the bad download, it would explain why it’s now deterministic…
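i.e., on the shard, something like this before re-running (the pants subdirectory is a guess at where the bad artifact would live):

    # Rough idea: drop the local cache so the next run has to re-download coursier.
    rm -rf "${HOME}/.cache/pants"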
r
Where is ~? I thought these were all fresh docker images on travis
a
I’m pretty sure we cache a bunch of things… Have a look at how we run the docker image, and any volumes we mount in?
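If you have a container handy, docker itself will list what’s mounted (the container name here is made up):

    # Show the bind mounts / volumes for a container; "pants_ci" is a placeholder name.
    docker inspect --format '{{ json .Mounts }}' pants_ci | python -m json.tool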
r
Cool 🙂
Interesting, according to travis the newest cache is from 12 days ago, but this issue started happening 2 days ago
We definitely do cache coursier, I’m going to try to get the file that we have cached
h
@red-balloon-89377 do you know how to wipe caches? You can either use the web UI or the CLI. The CLI works much much better (UI doesn't surface everything)
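Roughly (the repo slug and branch below are just examples):

    # The CLI is the `travis` Ruby gem.
    gem install travis
    travis login
    travis cache -r pantsbuild/pants                      # list caches, grouped by branch
    travis cache -r pantsbuild/pants -b master --delete   # wipe the caches for one branch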
r
I was messing around with the CLI to see if I could figure out which one is the faulty one, but it doesn't look trivial to say "which cache belongs to this PR"
once logged in, I think it’s travis cache --delete
Nvm, got it
👍 1
Well… it wasn’t that.
Will blacklist the test and reference the issue
❤️ 1
I’m not sure this is the right thing to do, so would love as many comments as possible: https://github.com/pantsbuild/pants/pull/7953
w
cc @wide-energy-11069: ^
w
yep ran into that too. can’t repro locally either
h
Hitting the same issue.
Looks like your change exposed a whole can of worms with similar, but not identical, breakages.
I'll restart those shards, just in case that helps, but we have a real problem here.
w
pushed a commit in the PR to print sha of coursier jar
👍 1
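(the debug line is basically just the following; the cached path is from memory and may not be exact)

    # Dump the digest of whatever coursier jar the shard actually has on disk.
    sha256sum "${HOME}/.cache/pants/coursier/"*.jar || true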
for context, is pantsd enabled for these tests?
h
In general, no. The parent process itself only uses pantsd during the cron job. Some integration tests do use pantsd, though, either via @ensure_pantsd or by invoking it directly.
👌 1
w
so, master continues to be sort of apocalyptically broken. it looks like there has been some action on https://travis-ci.community/t/continuous-maven-repo-403/3908/10 (linked from https://github.com/pantsbuild/pants/pull/7953). i'm going to see what happens with a cache purge and a retry of a recent master.
👍 1
commented on 7953.
h
so far all green! just one shard left
w
one known flake (the rsc basic binary) but other than that looks green. pheew.
Will bump that one.
h
Yay! Thanks to @wide-energy-11069 and @witty-crayon-22786 for working with Travis to fix this. Stu, should we wipe all caches?
w
I don't actually know whether the cache wiping was relevant. But sure. Can whack a few more moles.
👍 1
h
Okay purging now