Does anyone recognise this ?flaky test? <https://g...
# development
Does anyone recognise this ?flaky test?
03:07:13.25 [ERROR] Completed: Run Pytest - src/python/pants/backend/python/lint/flake8/ - failed (exit code -6).
============================= test session starts ==============================
collecting ... collected 2 items

src/python/pants/backend/python/lint/flake8/ PASSED [ 50%]
src/python/pants/backend/python/lint/flake8/ PASSED [100%]

- generated xml file: /tmp/pants-sandbox-jpvlai/ -
============================== 2 passed in 14.23s ==============================

FATAL: exception not rethrown
Also seen on another test
Some googling suggests it's an issue around C++ exceptions and threading which insta-terminates a process
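For reference, a minimal sketch (not from the thread) of how to read the `exit code -6` in the log above. It assumes the runner reports a signal death the way Python's `subprocess` does, as a negative signal number, and that this is Linux, where signal 6 is SIGABRT, the signal raised by `abort()`. The helper name `describe_exit` is made up for illustration.

```python
import signal
import subprocess
import sys


def describe_exit(code: int) -> str:
    """Turn a subprocess-style return code into a readable description.

    Assumption: a negative code means "killed by signal -code", which is
    the convention Python's subprocess module uses on POSIX.
    """
    if code < 0:
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # Reproduce an abort in a child process: os.abort() raises SIGABRT,
    # which is also what a C/C++ runtime does when it calls abort().
    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    print(describe_exit(child.returncode))
```

So the tests themselves pass, and the process is aborted afterwards, which fits the "2 passed" line appearing before the FATAL message.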
Not sure why it would start happening on though
But I've done ~5 reruns of CI and consistently hit the issue
Just tried reverting the recent rust update on the off chance - nothing else recent seems like it would obviously impact native code, but also, who knows what random python dependencies do
(Waiting on CI for that revert)
This is my biggest suspicion from the rust stuff:
(But it's a stretch, I don't imagine anything is sending signals to our pytest processes...)
Wait, when I pushed a revert commit, CI decided to stop trying to run the python test shards?
squashes commits together so they trigger the superset of CI
Oh, no, I think it's just not triggering yet because the rust bootstrapping was slow 🙂
Also occurred after the revert, guess that wasn't it...
I've been hitting it too 😔
I’ve seen it once or twice as well, but not consistently.
and only on CI
Cc @happy-kitchen-89482 who commented on the PR
Any pointers on how to grab a core dump out of GitHub Actions these days?
None here other than upload files maybe 🤔
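One piece that would be needed before any upload, as a hedged sketch: the process has to be allowed to write a core file in the first place. This assumes a Linux runner, and the helper name `allow_core_dumps` is hypothetical. Where the core file lands depends on the kernel's `core_pattern` setting, and whether the limit survives into a sandboxed pytest process is not something this sketch verifies.

```python
import resource


def allow_core_dumps() -> tuple[int, int]:
    """Raise the soft core-size limit up to the hard limit.

    The limit is inherited by child processes started afterwards, so this
    would need to run in (or before) whatever launches the failing test.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = allow_core_dumps()
    print(f"core size limit: soft={soft} hard={hard}")
```

If the hard limit is 0 on the runner, this changes nothing and the limit would have to be raised at a higher level.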
I've only ever seen it on flake8 and pylint tests 🤔
That did resolve after a couple of retries
I can happily remove my rust revert and keep retrying til we're green (🤞), but... Seems worth an investigation too...
In recent history this has been from my changes 😥
I'd pretty strongly assume that's an observer bias 🙂
So one theory I have is this is coming from turning off coverage. It might not be related to coverage at all, but stems from time savings from coverage being turned off that creates a perfect storm. One of these issues was similar, where timing was important
It consistently being flake8/pylint tests is like our biggest clue though.
I also recently turned off the daemon when using run_pants. I feel colder about that one, but still not ruled out
Well the release PR is merged at least :)
my best guess is that this is related to … made a bit of progress toward resolving that one today with