# development
I think I've found a deadlock related to https://github.com/pantsbuild/pants/issues/16969. I'm looking at a completely hung pantsd which, after running `pants test <a path>`, just fails to exit. Specifically, it seems to happen after running a fairly large number of processes, I guess because higher lock contention increases the risk of it happening. I've done an inventory of the threads, and they break down as follows:
T1: python code
T2..T5: complete_workunit -> stderr_use_color
T6: python code
T7: complete_workunit -> stderr_use_color
T8: log_from_python -> stderr_use_color
T9..T10: complete_workunit -> stderr_use_color
T11..T14: notify-rs/fs-watcher
T15: parked tokio
T16: python code
T17: maybe_set_panic_handler -> stderr_use_color
T18: maybe_display_render -> ProgressBar::state
The only Python code of note is T6, which originates in `src/nodes.rs`, so it's likely Rust calling Python. The other two only have Python code on them. All three end up in `PyThread_acquire_lock_timed` either way, while most native threads are trying to take the destination lock to check if we can use color. I can't actually find what's holding the lock to the destination, which is confusing.
The only logical thing which could hold these two locks (state + destination) would be if we were also running teardown in parallel with the render call, which doesn't make sense.
Actually, it would even have to be a render call after teardown, which I don't think could happen. So that theory seems bust from the start.
❯ MODE=debug pants test src/python/pants/backend/python/::
repros it quite heavily on my branch, but that doesn't even activate any of my new code.
Ah, our panic handler uses logging to handle the panic... That seems inadvisable! It means that if a panic occurs in a stack that already holds a lock the logger needs, we deadlock. That's a funky one.
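To make the failure mode concrete, here is a minimal sketch, not the actual Pants code. `DESTINATION`, `log_from_panic_handler`, and `panic_while_holding_lock` are hypothetical names, and a plain `std::sync::Mutex` stands in for whatever lock guards the logging destination. It simulates a handler running on a stack that already holds the lock, and uses `try_lock` so the example reports the conflict instead of hanging.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the lock guarding the logging destination.
static DESTINATION: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// A handler that logs without blocking: it tries the destination lock and
/// falls back to plain stderr if the lock is unavailable.
/// Returns true if the message went through the destination.
fn log_from_panic_handler(msg: &str) -> bool {
    match DESTINATION.try_lock() {
        Ok(mut dest) => {
            dest.push(msg.to_string());
            true
        }
        Err(_) => {
            // Lock is held (possibly by this very thread) or poisoned.
            eprintln!("{msg}");
            false
        }
    }
}

/// Simulates a panic handler running while the current stack still holds
/// the destination lock.
fn panic_while_holding_lock() -> bool {
    let _guard = DESTINATION.lock().unwrap();
    // A handler calling DESTINATION.lock() at this point would never return:
    // std documents re-locking on the same thread as a deadlock or a panic.
    log_from_panic_handler("panicked while holding the destination lock")
}
```

Calling `panic_while_holding_lock()` returns `false` because the handler could not take the lock, whereas `log_from_panic_handler("ok")` on a stack that holds nothing returns `true`. Whether falling back to raw stderr is the right behavior is a separate design question.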
It's another bug in indicatif, so I can fix that, but the panic handler is still unsound... Will have a think about how to handle it. What is our desired behavior here, really...