even with the above PR, if you run `./pants repl`,...
# development
h
even with the above PR, if you run
./pants repl
, then hit ctrl-c, some kind of python process will remain around and screwing up the terminal
specifically it's
/home/gregs/code/pants/build-support/virtualenvs/Linux/pants_dev_deps.py38.venv/bin/python3 /home/gregs/code/pants/.pants.d/tmp0rx6pokk/requirements.pex
, which is I think the actual process that runs when pants creates a python repl. I'm not sure why this is getting killed when the pytest process is getting killed (with the above commit)
anyway like I said earlier I am getting rather frustrated with dealing with the signal handling code between the no-pantsd and pantsd cases, and I would like to do what I can to get rid of that code. I don't think https://github.com/pantsbuild/pants/issues/10061 is necessarily related to this
that is to say, if I understand correctly, porting the nailgun client to rust is orthogonal to the problem of replacing signal communication with the nailgun protocol heartbeat system that https://github.com/pantsbuild/pants/pull/10320 mentions
👍 1
w
correct
sorry, i think i said i was going to sketch out the heartbeat stuff a bit better on the ticket.
will follow up to do that.
h
awesome
❤️ 1
yeah I think implementing that heartbeat system would do a lot to make reasoning about how signaling happens easier
❤️ 1
h
That sounds like a great investment. Thanks for all your improvements on this, Greg. And to @aloof-angle-91616 for the original implementation taking us so far! This is tricky stuff
w
maybe. the way i would suggest thinking about it is: a signal arriving at the server currently sets a signal to say that any ongoing work should exit… in the future, a heartbeat arriving at the server would set a per-session signal that everything in that session should exit
signals are definitely fidgety and limited though
a
i definitely think that sending a SIGKILL to the PGRP of the pants process (so, negative of its own PID) right before it exits would solve some of these issues with leftover processes without a lot of engineering effort
👀 1
w
h
@aloof-angle-91616 at some point today I tried sending signals to the PGRP and it had unexpected bad behavior and I forget exactly what that behavior was
w
… iff the right behavior is always to kill the whole tree, which i think it might not be in the case of
run
and
repl
h
so I abandoned that plan
a
oh cool i’ll try it out i haven’t yet
h
I think what might've happened is that sending a signal to the PGRP kills everything including the pantsd process itself, and that meant that there was cleanup that was supposed to happen but didn't
👍 1
a
oh yes it would have to be at the very very end
h
but I didn't investigate it much, I just moved on and used the
children
thing
psutil
gives you
a
the pgrp seems less prone to race conditions is all
w
it sounds like the challenge is that you don’t want to kill pantsd in the repl case. that’s going to be challenging even/especially with the heartbeat strategy.
👍 1
a
yes that would be much more difficult
h
I think we're killing pantsd more than we need to. ideally you should be able to kill the client and have pantsd continue to run without a problem, right?
w
that’s going to be challenging even/especially with the heartbeat strategy.
to clarify: it’s easy to not kill pantsd with the heartbeat strategy. but it’s challenging for the server to tear down processes cleanly
@hundreds-breakfast-49010: no
that is one of the key reasons for switching to heartbeats.
a
i will review the heartbeat strategy
w
@hundreds-breakfast-49010: to be clear, with heartbeats
pantsd
should stay up, but the run associated with the client should definitely die
let me dump the explanation right now.
👍 2
a
it was perhaps easier to conceptualize when the daemon forked to form the pantsd-runner
w
yea.
i mean, it’s still easy to conceptualize (i think) if you’re ok with killing everything, which was the strategy until recently
a
yes which would be the only justification for the PGRP-upon-death
w
ok, posted the sketch of the heartbeat stuff on https://github.com/pantsbuild/pants/issues/10057, but typing that up actually made me think of something else, so maybe hold off on even reading it:
@hundreds-breakfast-49010: posting that reminded me of how we handled cancelation of nodes, which was to add a drop guard. it feels like having a drop guard for the InteractiveProcess that kills the child might help… or perhaps only be necessary in the future to implement the heartbeat strategy
👍 2
h
I'll keep that strategy in mind
w
i guess that IP is not currently async though… would have to make it async to allow for interrupting via drop
a
that would be pretty neat
left a comment on possible extensions to the heartbeat strategy by reading the raw tty and/or opening up a pty if the simpler version still has issues with
InteractiveProcess
and
^C
. i hope we don't have to use either of them https://github.com/pantsbuild/pants/issues/10057#issuecomment-698845410