# development
h
so, on the subject of pants logging - there's an ExceptionSink method called `reset_interactive_output_stream` that's called as part of the log setup process (in both the pantsd and non-pantsd cases - my high-level goal in looking at this code is to try to simplify pants' general logging initialization)
it looks like the purpose of this code path is to make it possible to send SIGUSR2 to pants to get an immediate stack trace. I'm not 100% sure where this stack trace will actually show up - one comment suggests that it will wind up in `.pants.d/pantsd/pantsd.log`, presumably iff pantsd is running, but I haven't confirmed this
I'm not sure if this is still useful functionality in a mixed rust-python world
and it's also not something we document
although I guess using one of the SIGUSR signals as a way to print a live stack trace is a common-enough unix convention
anyway, if we decide that we don't care about being able to get a stack trace with SIGUSR2, we should delete that code path. and I think that if we do decide we want to be able to do this, we should delete that code path anyway, reimplement it in an engine-aware way, and document it
h
Unless we’ve used that mechanism in the past ~3 months, I’d bias towards deleting. We can re-implement it whenever we want and use the original as inspiration. It sounds like an internal mechanism we’d care about; I don’t imagine end users make use of this
f
Does it dump all thread stacks? I could make use of it for debugging deadlocks in CI where we may not be able to connect gdb due to kernel permissions.
h
which is using this Python stdlib module: https://docs.python.org/3/library/faulthandler.html
I have no idea what this library does with rust ffi
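For reference, a minimal sketch of how `faulthandler` can be wired to SIGUSR2 on its own, assuming a Unix platform (`faulthandler.register` is not available on Windows). The log path is a hypothetical stand-in; this is not necessarily how ExceptionSink configures it. With `all_threads=True` it dumps every Python thread's stack, but it only sees Python frames, so native (e.g. Rust) frames will not show up.

```python
import faulthandler
import os
import signal
import tempfile

# Hypothetical destination; pants may write somewhere else (e.g. pantsd.log).
log_path = os.path.join(tempfile.gettempdir(), "sigusr2-stack-demo.log")
log_file = open(log_path, "w")

# On SIGUSR2, write the Python stack of every thread straight to log_file's fd.
# Only Python frames are visible here, not native/Rust frames.
faulthandler.register(signal.SIGUSR2, file=log_file, all_threads=True)

# Send the signal to ourselves; from a shell this would be `kill -USR2 <pid>`.
signal.raise_signal(signal.SIGUSR2)

# Restore the previous SIGUSR2 disposition and close the log.
faulthandler.unregister(signal.SIGUSR2)
log_file.close()

with open(log_path) as f:
    dump = f.read()
print(dump)
```

Because the handler is installed at the C level and writes with raw `write()` calls, it can still produce output when the interpreter is stuck holding the GIL, which is presumably why it was attractive for debugging lockups.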
a
the SIGUSR2 signal was used to track some errors we saw with nailgun and the daemon when the daemon was locked up: https://github.com/pantsbuild/pants/issues/6530
i share greg's concern about having too much of this stuff get handled by python at all even if there is a convenient library -- the functionality itself is definitely good, but we may be able to replicate it (just spitballing here) by invoking the cpython api to get a `TracebackException()`
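As a rough illustration of building the same kind of dump without a signal handler, here is a sketch using plain Python APIs. `TracebackException` describes an exception rather than a live stack, so this swaps in `sys._current_frames()` (CPython-specific) plus `traceback.format_stack`. `dump_all_thread_stacks` is a hypothetical helper name; the assumption is that something on the Rust side could call a function like it through the CPython API. Like `faulthandler`, it only sees Python frames.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks() -> str:
    """Hypothetical helper: format the current Python stack of every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps thread id -> that thread's topmost frame.
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "<unknown>")
        chunks.append(f"Thread {name} ({thread_id:#x}), most recent call last:\n")
        # format_stack walks from the given frame back to the thread's entry point.
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


# Demo: park a worker thread so there is more than one stack to report.
done = threading.Event()
worker = threading.Thread(target=done.wait, name="worker")
worker.start()
report = dump_all_thread_stacks()
done.set()
worker.join()
print(report)
```

Unlike a SIGUSR2 handler, this needs the GIL, so it would not help if the interpreter itself is wedged; it trades that for not touching signal handling at all.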
the tests for most of that functionality are skipped and greg has now made me interested in wanting to see whether we can further move off of that -- it was so difficult and frustrating to try to make those tests work and not a good use of my time (particularly signal handling with faulthandler)
i'm gonna make an issue for reproducing the SIGUSR2 stack tracing in rust because why not
btw, i'm not sure whether SIGUSR2 is a unix convention for stack traces, but it happened to be an open signal that we were able to hijack i think (and signals are obviously ridiculously touchy) so will consider like not using signals lol
i don't know whether the priority of a signal handler makes it more likely than something else to be able to respond to a lockup (specifically your example with gdb). it feels like playing with fire
thanks a lot for the thoughtful replies to my comments greg, was super focusing