# development
w
@curved-television-6568: continuing debugging of the hung test:
it looks like the test is hung about where we would expect: inside the scheduler.
c
OK, we can keep the convo here, I’ll paste lengthy non-redacted stuff in DM
w
to get more information: adding the `@logging(level=LogLevel.DEBUG)` decorator to the test, and adjusting how you are running `pytest` so it does not capture stdio, might get you more output in the foreground
(thanks again by the way!)
👌 1
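A minimal sketch of the suggestion above. The decorator name and log level come from the message; the import paths and the test signature are assumptions and may differ in the Pants source tree:

```python
# Sketch only: @logging and LogLevel are the names mentioned above; the import
# locations below are assumptions, as is the test signature.
from pants.testutil.rule_runner import logging  # assumed home of the @logging decorator
from pants.util.logging import LogLevel


@logging(level=LogLevel.DEBUG)
def test_lockfiles(rule_runner) -> None:
    ...  # existing test body
```

Then run pytest with capture disabled so the DEBUG lines reach the terminal, e.g. `pytest src/python/pants/backend/python/util_rules/pex_test.py -s` (`-s` is shorthand for `--capture=no`).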
c
np. is it interesting to know the open files (there were a ton) from `lsof -p <pid>`? 🙂 (I’ll run with the above)
w
um, unclear so far… if it looks like we’re waiting on the network, then it might be?
which reminds me: can you look for child processes of the test process? particularly any python processes?
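For reference, one hypothetical way to answer the child-process question (not how it was actually checked in this thread): the third-party psutil package can walk a process's descendants, or a shell one-liner like `pgrep -P <pid>` does much the same.

```python
# Hypothetical illustration, not from the conversation: list descendants of the
# hung test process and flag any Python children. Assumes the third-party
# `psutil` package is installed.
import sys

import psutil


def show_children(pid: int) -> None:
    for child in psutil.Process(pid).children(recursive=True):
        note = "  <-- python" if "python" in child.name().lower() else ""
        print(f"{child.pid}\t{child.name()}{note}")


if __name__ == "__main__":
    show_children(int(sys.argv[1]))
```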
c
yeah, there were none
w
iiinteresting.
c
however, now it failed rather quickly, no hang…
I’ll see if I can revert back
w
at least in CI, this is flaky, so it might only hang in some cases?
c
it was stdio related: when I tried with the pytest option `--capture=tee-sys`, now it hangs, but I’m not sure how to get more output during the pytest run…
w
yikes.
um, can you try `lldb -p $test_process_pid` and then `bt all` when it attaches?
👍 1
c
tried `-s` but it doesn’t give me any… err, that was yesterday; with debug logging on the test it does…
this is the output before it hangs:
```
src/python/pants/backend/python/util_rules/pex_test.py::test_lockfiles 15:55:02.07 [INFO] external invalidation: cleared 0 and dirtied 0 nodes for: {"pex_lock.json", ""}
15:55:02.08 [INFO] external invalidation: cleared 0 and dirtied 0 nodes for: {"", "reqs_lock.txt"}
15:55:02.08 [DEBUG] Launching 1 roots (poll=false).
15:55:02.08 [DEBUG] Starting: pants.backend.python.util_rules.pex.build_pex
```
ok, wow, yeah `bt all` gave a lot. you want it all?
w
yes please
c
got it
w
i believe that the backtrace points to the issue, although i still don’t fully understand it.
basically: one of the threads i see is stuck tearing down the docker CommandRunner… which @fast-nail-55400 just merged a patch to adjust: https://github.com/pantsbuild/pants/pull/16930
c
cool. I’ve read a fair share of backtraces, but without a link map and good knowledge of the underlying code they rarely give me much 😛
w
and staring at the backtrace a bit more gave me another chance to better understand it, so i’ll post on that ticket for posterity.
👍 1
c
> which @fast-nail-55400 just merged a patch to adjust
sounds interesting. too close to be coincidence?
w
sorry, to be clear: his patch is intended to fix this
c
oh, ok
I’m running off of 86295c1015ae1530b39f7f0af61ec87ebaf139d8 btw…
could try again with tom’s fix in..
w
yea, that would be good.
c
but why was the docker CommandRunner used at all now?
w
@curved-television-6568: but you could also wait and see whether this test continues flaking… i’m optimistic it won’t
c
@fast-nail-55400 🙌 profit! (i.e. it works now)
w
> but why was the docker CommandRunner used at all now?
it will be in the stack, but it should not actually be used in this case.
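A purely conceptual sketch of that point, assuming nothing about Pants internals beyond "the Docker runner is in the stack but not actually used": runners are layered, so the Docker layer still shows up in backtraces and still has to be torn down, even when every request is delegated straight through to the inner runner. All names below are invented for illustration.

```python
# Conceptual illustration only -- none of these names are Pants source.
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeProcess:
    argv: tuple
    docker_image: Optional[str] = None


class LocalRunner:
    def run(self, proc: FakeProcess) -> str:
        return f"ran {proc.argv} locally"

    def shutdown(self) -> None:
        pass


class DockerRunner:
    """Wraps an inner runner; delegates unless the request names an image."""

    def __init__(self, inner: LocalRunner) -> None:
        self.inner = inner

    def run(self, proc: FakeProcess) -> str:
        if proc.docker_image is None:
            # Docker is never actually used for this request...
            return self.inner.run(proc)
        return f"ran {proc.argv} in container {proc.docker_image}"

    def shutdown(self) -> None:
        # ...but this layer still sits in the stack, so it appears in
        # backtraces and still has to be torn down on shutdown -- roughly
        # where the hang was observed.
        self.inner.shutdown()
```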
c
ah ’right.
w
but yea, that raises the question of what exactly it was hung doing. i commented on https://github.com/pantsbuild/pants/pull/16930
👍 1
thanks a lot for investigating!
❤️ 1
👍 1
c
my pleasure
f
https://github.com/pantsbuild/pants/pull/16951 should fix Pants looking for Docker even though Docker was never used.
💯 1
w
thank you!