bitter-ability-32190
07/06/2022, 6:20 PMSIGINT
).
Fatal Python error: This thread state must be current when releasing
Python runtime state: finalizing (tstate=0x2a10660)
The gist is looks like from the thread traces that the main Python thread has finalized while the thread still executing our rule code is running. Looks like a race condition, where normally our rule threads finish before Python finalizes.
đ§”bitter-ability-32190
07/06/2022, 6:22 PMhundreds-father-404
07/06/2022, 6:24 PMbitter-ability-32190
07/06/2022, 6:25 PMbitter-ability-32190
07/06/2022, 6:25 PMsrc/rust/engine/src/session.rs
for exception handlingbitter-ability-32190
07/06/2022, 6:37 PMwitty-crayon-22786
07/06/2022, 6:41 PMscheduler_shutdown
somewhere (in LocalPantsRunner?) and then seeing whether it hangswitty-crayon-22786
07/06/2022, 6:42 PMScheduler
is `Drop`d when all the python references to it have been dropped. which should happen as a natural consequence of the process exiting, afaik⊠(i.e. while tearing down, the python refcount of the PyScheduler
type should go to zero)witty-crayon-22786
07/06/2022, 6:43 PMbitter-ability-32190
07/06/2022, 6:44 PMwitty-crayon-22786
07/06/2022, 6:44 PMscheduler_shutdown
has a slightly different implementation https://github.com/pantsbuild/pants/blob/104f3e2e38aae724e8c7e570b3bf8a0874ecad7a/src/rust/engine/src/externs/interface.rs#L1028-L1038witty-crayon-22786
07/06/2022, 6:45 PMDrop
is implicit in rust: it happens when something goes out of scope. so if the PyScheduler
(which is refcounted by Python) is gcâd, its contents are Drop
dwitty-crayon-22786
07/06/2022, 6:46 PMbitter-ability-32190
07/06/2022, 6:47 PMstatic ref GLOBAL_EXECUTOR: ArcSwapOption<Runtime> = ArcSwapOption::from_pointee(None);
Is the executor not global?witty-crayon-22786
07/06/2022, 6:47 PMExecutor
to shut down, because if it doesnât interact with Python, it should be harmlessbitter-ability-32190
07/06/2022, 6:47 PMwitty-crayon-22786
07/06/2022, 6:47 PMwitty-crayon-22786
07/06/2022, 6:48 PMwitty-crayon-22786
07/06/2022, 6:49 PMwitty-crayon-22786
07/06/2022, 6:50 PMGraph::get
, which uses tokio::spawn
to put async work on the Executorbitter-ability-32190
07/06/2022, 6:50 PMbitter-ability-32190
07/06/2022, 6:57 PMbitter-ability-32190
07/06/2022, 7:03 PM--no-pantsd
witty-crayon-22786
07/06/2022, 7:04 PMwitty-crayon-22786
07/06/2022, 7:20 PMwitty-crayon-22786
07/06/2022, 7:20 PMbitter-ability-32190
07/06/2022, 7:44 PMSchedular.shutdown
is ever called anywhere? Therefore we aren't calling scheduler_shutdown
witty-crayon-22786
07/06/2022, 7:46 PMwitty-crayon-22786
07/06/2022, 7:46 PMPyScheduler
isnât actually getting gcâd (and thus Drop
d).witty-crayon-22786
07/06/2022, 7:47 PMbitter-ability-32190
07/06/2022, 7:49 PMwitty-crayon-22786
07/06/2022, 7:52 PMwitty-crayon-22786
07/06/2022, 7:52 PMwitty-crayon-22786
07/06/2022, 7:53 PMpantsd
(which passes in a live scheduler which is used across runs)bitter-ability-32190
07/06/2022, 7:53 PMPyScheduler
object, which owns the Scheduler
which should then be `drop`ped?witty-crayon-22786
07/06/2022, 7:53 PMbitter-ability-32190
07/06/2022, 7:54 PMgdb
we could likely expect the reference count for the PyScheduler
in this scenariobitter-ability-32190
07/06/2022, 7:55 PMbitter-ability-32190
07/06/2022, 7:55 PMlocal_pants_runner
strawman might "just be it"witty-crayon-22786
07/06/2022, 7:57 PMSo theyea. see above though⊠semi-misleadingly, local pants runner is still used withstrawman might âjust be itâlocal_pants_runner
pantsd
: just with different constructor arguments. so itâs borrowing the scheduler in that case and shouldnât kill itbitter-ability-32190
07/06/2022, 7:58 PMbitter-ability-32190
07/06/2022, 8:01 PMbitter-ability-32190
07/06/2022, 8:02 PMself.graph_session.scheduler_session.scheduler.shutdown()
in the finally
blockwitty-crayon-22786
07/06/2022, 8:03 PMwitty-crayon-22786
07/06/2022, 8:03 PMbitter-ability-32190
07/06/2022, 8:03 PM15:00:23.01 [INFO] Waiting for shutdown of: ["pants_run_2022_07_06_15_00_14_671_07ddc7afbc9548ed81f58f10a10a88a2", "streaming_workunit_handler_session"]
15:00:23.01 [INFO] Shutdown completed: "pants_run_2022_07_06_15_00_14_671_07ddc7afbc9548ed81f58f10a10a88a2"
witty-crayon-22786
07/06/2022, 8:03 PMfinally
is inside of the with
block for streaming_workunit_handler_session
witty-crayon-22786
07/06/2022, 8:03 PMwith
blockbitter-ability-32190
07/06/2022, 8:04 PMbitter-ability-32190
07/06/2022, 8:08 PM__exit__
witty-crayon-22786
07/06/2022, 8:11 PMwitty-crayon-22786
07/06/2022, 8:12 PMwitty-crayon-22786
07/06/2022, 8:12 PMbitter-ability-32190
07/06/2022, 8:17 PMbitter-ability-32190
07/06/2022, 8:21 PMwitty-crayon-22786
07/06/2022, 8:24 PMself.graph_session
field)bitter-ability-32190
07/06/2022, 8:25 PMsrc/python/pants/base/exception_sink.py
)bitter-ability-32190
07/06/2022, 8:28 PMSIGINT
and raising the KeyboardInterrupt
. But I think this info doesn't help us herewitty-crayon-22786
07/06/2022, 8:29 PMbitter-ability-32190
07/06/2022, 8:42 PMprint(sys.getrefcount(self))
print(sys.getrefcount(self.graph_session))
print(sys.getrefcount(self.graph_session.scheduler_session))
print(sys.getrefcount(self.graph_session.scheduler_session.scheduler))
print(sys.getrefcount(self.graph_session.scheduler_session.scheduler.py_scheduler))
gives me:
5
2
4
3
2
bitter-ability-32190
07/06/2022, 8:42 PMbitter-ability-32190
07/06/2022, 8:45 PMbitter-ability-32190
07/06/2022, 8:46 PMScheduler
(in Python) del self._py_scheduler
does then drop
the Scheduler
(in Rust).
But also reproduces the errorbitter-ability-32190
07/06/2022, 8:47 PMbitter-ability-32190
07/06/2022, 8:48 PMbitter-ability-32190
07/06/2022, 9:09 PMbitter-ability-32190
07/06/2022, 9:09 PMwitty-crayon-22786
07/06/2022, 9:21 PMwitty-crayon-22786
07/06/2022, 9:21 PMwitty-crayon-22786
07/06/2022, 9:22 PMmain
has exitedwitty-crayon-22786
07/06/2022, 9:22 PMbitter-ability-32190
07/06/2022, 9:45 PMbitter-ability-32190
07/06/2022, 9:50 PMwitty-crayon-22786
07/06/2022, 9:51 PMDrop
when a process exits ⊠it almost certainly depends âhowâ the process exitswitty-crayon-22786
07/06/2022, 9:52 PMsys.exit
vs returning from the entrypointbitter-ability-32190
07/06/2022, 10:01 PMbitter-ability-32190
07/06/2022, 10:09 PMbitter-ability-32190
07/06/2022, 10:12 PMbitter-ability-32190
07/07/2022, 12:29 AMsys.exit
is it. Oddly, the code kinda looks like it's calling PyFinalize
int hat codepath and not the otherbitter-ability-32190
07/07/2022, 12:35 AMPyErr_Print
if the return value is NULL
I suspect that is in the case of an exception (sys.exit
just raises SystemExit
)
... eventually in handle_system_exit
we call Py_Exit
which calls `Py_FinalizeEx`:https://github.com/python/cpython/blob/bec802dbb87717a23acb9c600c15f40bc98340a3/Python/pylifecycle.c#L2922bitter-ability-32190
07/07/2022, 12:39 AMos._exit(...)
bitter-ability-32190
07/07/2022, 12:46 AMfinalize
in the happy path too đ
So not sure why we don't see this otherwise, other than "race condition"bitter-ability-32190
07/07/2022, 12:52 AMbitter-ability-32190
07/07/2022, 12:53 AMbitter-ability-32190
07/07/2022, 1:01 AMSIGINT
is the key here.bitter-ability-32190
07/07/2022, 11:15 AMwitty-crayon-22786
07/07/2022, 4:32 PMbitter-ability-32190
07/07/2022, 4:32 PMawait
ting the cancellation wasn't the secret sauce either.bitter-ability-32190
07/07/2022, 4:33 PMwitty-crayon-22786
07/07/2022, 4:33 PMPySessionCancellationLatch
which is used to cancel Sessions
in general may not be what gets triggered when a SIGINT comes inwitty-crayon-22786
07/07/2022, 4:35 PMwitty-crayon-22786
07/07/2022, 4:36 PMI set up a meetingyea, sounds good. see you there
bitter-ability-32190
07/07/2022, 4:37 PMasync
rule running which was running process is attempting to cleanup and taking a very long timewitty-crayon-22786
07/07/2022, 4:49 PMwitty-crayon-22786
07/07/2022, 4:50 PMGraph
is `spawn`d onto the Executor⊠i.e., itâs running async in the background. when clients go away, it is cancelled as soon as possible, but not synchronously. so it might take some time for things to wrap up.witty-crayon-22786
07/07/2022, 4:51 PMawait
before it can be canceled)witty-crayon-22786
07/07/2022, 4:53 PMbitter-ability-32190
07/07/2022, 4:54 PMbitter-ability-32190
07/07/2022, 4:54 PMwitty-crayon-22786
07/07/2022, 4:54 PMbitter-ability-32190
07/07/2022, 5:00 PMwitty-crayon-22786
07/07/2022, 5:05 PMwitty-crayon-22786
07/07/2022, 5:07 PMwitty-crayon-22786
07/07/2022, 5:07 PMbitter-ability-32190
07/07/2022, 5:07 PMbitter-ability-32190
07/07/2022, 5:08 PMwitty-crayon-22786
07/07/2022, 5:08 PMwitty-crayon-22786
07/07/2022, 5:08 PMbitter-ability-32190
07/07/2022, 5:09 PMPyO3
or PyOxy
? đwitty-crayon-22786
07/07/2022, 5:10 PMwitty-crayon-22786
07/07/2022, 5:10 PMbitter-ability-32190
07/07/2022, 5:10 PMpyoxy
providing Python, but still using pyo3
for the FFIingwitty-crayon-22786
07/07/2022, 5:11 PMwitty-crayon-22786
07/07/2022, 5:11 PMbitter-ability-32190
07/07/2022, 5:11 PMwitty-crayon-22786
07/07/2022, 5:12 PMpantsd
. that could be shipped first, since the pants
script could get a hold of the native client binary and invoke that rather than python
witty-crayon-22786
07/07/2022, 5:13 PMpants
script to get it increases complexity that we want to be removing insteadbitter-ability-32190
07/07/2022, 5:14 PMdebugpy
shenanigans will need to be shifted. HMU when that needs to be done, I dont wanna bog John down with that