even with the above PR if you run ` pants repl` then hit ctr Pants #development

even with the above PR, if you run `./pants repl`,...

hundreds-breakfast-49010

09/23/2020, 12:17 AM

even with the above PR, if you run

./pants repl

, then hit ctrl-c, some kind of python process will remain around and screwing up the terminal

hundreds-breakfast-49010

09/23/2020, 12:18 AM

specifically it's

/home/gregs/code/pants/build-support/virtualenvs/Linux/pants_dev_deps.py38.venv/bin/python3 /home/gregs/code/pants/.pants.d/tmp0rx6pokk/requirements.pex

, which is I think the actual process that runs when pants creates a python repl. I'm not sure why this is getting killed when the pytest process is getting killed (with the above commit)

hundreds-breakfast-49010

09/23/2020, 12:22 AM

anyway like I said earlier I am getting rather frustrated with dealing with the signal handling code between the no-pantsd and pantsd cases, and I would like to do what I can to get rid of that code. I don't think https://github.com/pantsbuild/pants/issues/10061 is necessarily related to this

hundreds-breakfast-49010

09/23/2020, 12:25 AM

that is to say, if I understand correctly, porting the nailgun client to rust is orthogonal to the problem of replacing signal communication with the nailgun protocol heartbeat system that https://github.com/pantsbuild/pants/pull/10320 mentions

👍 1

witty-crayon-22786

09/23/2020, 12:31 AM

correct

witty-crayon-22786

09/23/2020, 12:32 AM

sorry, i think i said i was going to sketch out the heartbeat stuff a bit better on the ticket.

witty-crayon-22786

09/23/2020, 12:32 AM

will follow up to do that.

hundreds-breakfast-49010

09/23/2020, 12:33 AM

awesome

❤️ 1

hundreds-breakfast-49010

09/23/2020, 12:35 AM

yeah I think implementing that heartbeat system would do a lot to make reasoning about how signaling happens easier

❤️ 1

hundreds-father-404

09/23/2020, 12:45 AM

That sounds like a great investment. Thanks for all your improvements on this, Greg. And to @aloof-angle-91616 for the original implementation taking us so far! This is tricky stuff

witty-crayon-22786

09/23/2020, 12:46 AM

maybe. the way i would suggest thinking about it is: a signal arriving at the server currently sets a signal to say that any ongoing work should exit… in the future, a heartbeat arriving at the server would set a per-session signal that everything in that session should exit

witty-crayon-22786

09/23/2020, 12:46 AM

signals are definitely fidgety and limited though

aloof-angle-91616

09/23/2020, 12:47 AM

i definitely think that sending a SIGKILL to the PGRP of the pants process (so, negative of its own PID) right before it exits would solve some of these issues with leftover processes without a lot of engineering effort

👀 1

witty-crayon-22786

09/23/2020, 12:52 AM

yea. that’s worth looking at @hundreds-breakfast-49010. https://stackoverflow.com/questions/392022/whats-the-best-way-to-send-a-signal-to-all-members-of-a-process-group

hundreds-breakfast-49010

09/23/2020, 12:53 AM

@aloof-angle-91616 at some point today I tried sending signals to the PGRP and it had unexpected bad behavior and I forget exactly what that behavior was

witty-crayon-22786

09/23/2020, 12:53 AM

… iff the right behavior is always to kill the whole tree, which i think it might not be in the case of

run

and

repl

…

hundreds-breakfast-49010

09/23/2020, 12:53 AM

so I abandoned that plan

aloof-angle-91616

09/23/2020, 12:53 AM

oh cool i’ll try it out i haven’t yet

hundreds-breakfast-49010

09/23/2020, 12:54 AM

I think what might've happened is that sending a signal to the PGRP kills everything including the pantsd process itself, and that meant that there was cleanup that was supposed to happen but didn't

👍 1

aloof-angle-91616

09/23/2020, 12:55 AM

oh yes it would have to be at the very very end

hundreds-breakfast-49010

09/23/2020, 12:55 AM

but I didn't investigate it much, I just moved on and used the

children

thing

psutil

gives you

aloof-angle-91616

09/23/2020, 12:56 AM

the pgrp seems less prone to race conditions is all

witty-crayon-22786

09/23/2020, 12:58 AM

it sounds like the challenge is that you don’t want to kill pantsd in the repl case. that’s going to be challenging even/especially with the heartbeat strategy.

👍 1

aloof-angle-91616

09/23/2020, 12:59 AM

yes that would be much more difficult

hundreds-breakfast-49010

09/23/2020, 12:59 AM

I think we're killing pantsd more than we need to. ideally you should be able to kill the client and have pantsd continue to run without a problem, right?

witty-crayon-22786

09/23/2020, 1:00 AM

that’s going to be challenging even/especially with the heartbeat strategy.

to clarify: it’s easy to not kill pantsd with the heartbeat strategy. but it’s challenging for the server to tear down processes cleanly

witty-crayon-22786

09/23/2020, 1:00 AM

@hundreds-breakfast-49010: no

witty-crayon-22786

09/23/2020, 1:00 AM

that is one of the key reasons for switching to heartbeats.

aloof-angle-91616

09/23/2020, 1:00 AM

i will review the heartbeat strategy

witty-crayon-22786

09/23/2020, 1:01 AM

@hundreds-breakfast-49010: to be clear, with heartbeats

pantsd

should stay up, but the run associated with the client should definitely die

witty-crayon-22786

09/23/2020, 1:01 AM

let me dump the explanation right now.

👍 2

aloof-angle-91616

09/23/2020, 1:04 AM

it was perhaps easier to conceptualize when the daemon forked to form the pantsd-runner

witty-crayon-22786

09/23/2020, 1:06 AM

yea.

witty-crayon-22786

09/23/2020, 1:06 AM

i mean, it’s still easy to conceptualize (i think) if you’re ok with killing everything, which was the strategy until recently

aloof-angle-91616

09/23/2020, 1:08 AM

yes which would be the only justification for the PGRP-upon-death

witty-crayon-22786

09/23/2020, 1:12 AM

ok, posted the sketch of the heartbeat stuff on https://github.com/pantsbuild/pants/issues/10057, but typing that up actually made me think of something else, so maybe hold off on even reading it:

witty-crayon-22786

09/23/2020, 1:14 AM

@hundreds-breakfast-49010: posting that reminded me of how we handled cancelation of nodes, which was to add a drop guard. it feels like having a drop guard for the InteractiveProcess that kills the child might help… or perhaps only be necessary in the future to implement the heartbeat strategy

👍 2

hundreds-breakfast-49010

09/23/2020, 1:15 AM

I'll keep that strategy in mind

witty-crayon-22786

09/23/2020, 1:15 AM

i guess that IP is not currently async though… would have to make it async to allow for interrupting via drop

aloof-angle-91616

09/23/2020, 1:16 AM

that would be pretty neat

aloof-angle-91616

09/25/2020, 10:13 AM

left a comment on possible extensions to the heartbeat strategy by reading the raw tty and/or opening up a pty if the simpler version still has issues with

InteractiveProcess

and

^C

. i hope we don't have to use either of them https://github.com/pantsbuild/pants/issues/10057#issuecomment-698845410

Open in Slack

Previous Next