there is a `pantsd` crash on the Linux-ARM64 shard...
# development
w
there is a `pantsd` crash on the Linux-ARM64 shards on `main`, but i am completely out of steam for investigating it today. maybe Monday.
… as one last diagnostic effort, i am going to restart a bunch of Linux-ARM64 shards on `main` that were killed by other issues that Benjy and i investigated, in hopes that that will allow for identifying when this issue began.
h
goddamn computers
I poked at this a bit - running `./pants --pantsd-max-memory-usage=100MiB` passes in an SSH session, while `./pants` fails with the same error as the CI shard.
So possibly the OOM killer is overreacting
I haven't been able to track down its logs yet
But according to the pants logs, pantsd is receiving SIGTERM, whereas I believe the OOM killer sends SIGKILL
So maybe it's something else
Hmm, and sometimes it does pass without lowering `--pantsd-max-memory-usage`
So I don't know what is going on
Going to bed now
will poke some more tomorrow
Hmm no, the SIGTERM was unrelated (from killing pantsd manually I guess), there is nothing useful in the pants log.
w
The SIGBUS hypothesis was from an exception file in `.pants.d` on the worker.
h
Now in a terminal on the box things seem to run OK (last night they were more consistently failing), but this gets logged:
[WARN] Executor shutdown took unexpectedly long: tasks were likely leaked!
FWIW
Things still consistently fail in CI though
Yeah, seeing some "Bus error" in the exceptions from last night
So that is a strong hypothesis
The machine obviously has plenty of available RAM, so it's not that
And we've rebooted, right?
Going to try nuking ~/.cache/pants for the gha user and reboot again 🤷‍♂️
heh, that seems to have worked
At least for now
w
thanks a lot for looking at that.