bitter-ability-32190
07/06/2022, 6:20 PMSIGINT).
Fatal Python error: This thread state must be current when releasing
Python runtime state: finalizing (tstate=0x2a10660)
The gist is looks like from the thread traces that the main Python thread has finalized while the thread still executing our rule code is running. Looks like a race condition, where normally our rule threads finish before Python finalizes.
đ§”bitter-ability-32190
07/06/2022, 6:22 PMhundreds-father-404
07/06/2022, 6:24 PMbitter-ability-32190
07/06/2022, 6:25 PMbitter-ability-32190
07/06/2022, 6:25 PMsrc/rust/engine/src/session.rs for exception handlingbitter-ability-32190
07/06/2022, 6:37 PMwitty-crayon-22786
07/06/2022, 6:41 PMscheduler_shutdown somewhere (in LocalPantsRunner?) and then seeing whether it hangswitty-crayon-22786
07/06/2022, 6:42 PMScheduler is `Drop`d when all the python references to it have been dropped. which should happen as a natural consequence of the process exiting, afaik⊠(i.e. while tearing down, the python refcount of the PyScheduler type should go to zero)witty-crayon-22786
07/06/2022, 6:43 PMbitter-ability-32190
07/06/2022, 6:44 PMwitty-crayon-22786
07/06/2022, 6:44 PMscheduler_shutdown has a slightly different implementation https://github.com/pantsbuild/pants/blob/104f3e2e38aae724e8c7e570b3bf8a0874ecad7a/src/rust/engine/src/externs/interface.rs#L1028-L1038witty-crayon-22786
07/06/2022, 6:45 PMDrop is implicit in rust: it happens when something goes out of scope. so if the PyScheduler (which is refcounted by Python) is gcâd, its contents are Drop dwitty-crayon-22786
07/06/2022, 6:46 PMbitter-ability-32190
07/06/2022, 6:47 PMstatic ref GLOBAL_EXECUTOR: ArcSwapOption<Runtime> = ArcSwapOption::from_pointee(None);
Is the executor not global?witty-crayon-22786
07/06/2022, 6:47 PMExecutor to shut down, because if it doesnât interact with Python, it should be harmlessbitter-ability-32190
07/06/2022, 6:47 PMwitty-crayon-22786
07/06/2022, 6:47 PMwitty-crayon-22786
07/06/2022, 6:48 PMwitty-crayon-22786
07/06/2022, 6:49 PMwitty-crayon-22786
07/06/2022, 6:50 PMGraph::get, which uses tokio::spawn to put async work on the Executorbitter-ability-32190
07/06/2022, 6:50 PMbitter-ability-32190
07/06/2022, 6:57 PMbitter-ability-32190
07/06/2022, 7:03 PM--no-pantsdwitty-crayon-22786
07/06/2022, 7:04 PMwitty-crayon-22786
07/06/2022, 7:20 PMwitty-crayon-22786
07/06/2022, 7:20 PMbitter-ability-32190
07/06/2022, 7:44 PMSchedular.shutdown is ever called anywhere? Therefore we aren't calling scheduler_shutdownwitty-crayon-22786
07/06/2022, 7:46 PMwitty-crayon-22786
07/06/2022, 7:46 PMPyScheduler isnât actually getting gcâd (and thus Drop d).witty-crayon-22786
07/06/2022, 7:47 PMbitter-ability-32190
07/06/2022, 7:49 PMwitty-crayon-22786
07/06/2022, 7:52 PMwitty-crayon-22786
07/06/2022, 7:52 PMwitty-crayon-22786
07/06/2022, 7:53 PMpantsd (which passes in a live scheduler which is used across runs)bitter-ability-32190
07/06/2022, 7:53 PMPyScheduler object, which owns the Scheduler which should then be `drop`ped?witty-crayon-22786
07/06/2022, 7:53 PMbitter-ability-32190
07/06/2022, 7:54 PMgdb we could likely expect the reference count for the PyScheduler in this scenariobitter-ability-32190
07/06/2022, 7:55 PMbitter-ability-32190
07/06/2022, 7:55 PMlocal_pants_runner strawman might "just be it"witty-crayon-22786
07/06/2022, 7:57 PMSo theyea. see above though⊠semi-misleadingly, local pants runner is still used withstrawman might âjust be itâlocal_pants_runner
pantsd : just with different constructor arguments. so itâs borrowing the scheduler in that case and shouldnât kill itbitter-ability-32190
07/06/2022, 7:58 PMbitter-ability-32190
07/06/2022, 8:01 PMbitter-ability-32190
07/06/2022, 8:02 PMself.graph_session.scheduler_session.scheduler.shutdown() in the finally blockwitty-crayon-22786
07/06/2022, 8:03 PMwitty-crayon-22786
07/06/2022, 8:03 PMbitter-ability-32190
07/06/2022, 8:03 PM15:00:23.01 [INFO] Waiting for shutdown of: ["pants_run_2022_07_06_15_00_14_671_07ddc7afbc9548ed81f58f10a10a88a2", "streaming_workunit_handler_session"]
15:00:23.01 [INFO] Shutdown completed: "pants_run_2022_07_06_15_00_14_671_07ddc7afbc9548ed81f58f10a10a88a2"witty-crayon-22786
07/06/2022, 8:03 PMfinally is inside of the with block for streaming_workunit_handler_sessionwitty-crayon-22786
07/06/2022, 8:03 PMwith blockbitter-ability-32190
07/06/2022, 8:04 PMbitter-ability-32190
07/06/2022, 8:08 PM__exit__witty-crayon-22786
07/06/2022, 8:11 PMwitty-crayon-22786
07/06/2022, 8:12 PMwitty-crayon-22786
07/06/2022, 8:12 PMbitter-ability-32190
07/06/2022, 8:17 PMbitter-ability-32190
07/06/2022, 8:21 PMwitty-crayon-22786
07/06/2022, 8:24 PMself.graph_session field)bitter-ability-32190
07/06/2022, 8:25 PMsrc/python/pants/base/exception_sink.py)bitter-ability-32190
07/06/2022, 8:28 PMSIGINT and raising the KeyboardInterrupt. But I think this info doesn't help us herewitty-crayon-22786
07/06/2022, 8:29 PMbitter-ability-32190
07/06/2022, 8:42 PMprint(sys.getrefcount(self))
print(sys.getrefcount(self.graph_session))
print(sys.getrefcount(self.graph_session.scheduler_session))
print(sys.getrefcount(self.graph_session.scheduler_session.scheduler))
print(sys.getrefcount(self.graph_session.scheduler_session.scheduler.py_scheduler))
gives me:
5
2
4
3
2bitter-ability-32190
07/06/2022, 8:42 PMbitter-ability-32190
07/06/2022, 8:45 PMbitter-ability-32190
07/06/2022, 8:46 PMScheduler (in Python) del self._py_scheduler does then drop the Scheduler (in Rust).
But also reproduces the errorbitter-ability-32190
07/06/2022, 8:47 PMbitter-ability-32190
07/06/2022, 8:48 PMbitter-ability-32190
07/06/2022, 9:09 PMbitter-ability-32190
07/06/2022, 9:09 PMwitty-crayon-22786
07/06/2022, 9:21 PMwitty-crayon-22786
07/06/2022, 9:21 PMwitty-crayon-22786
07/06/2022, 9:22 PMmain has exitedwitty-crayon-22786
07/06/2022, 9:22 PMbitter-ability-32190
07/06/2022, 9:45 PMbitter-ability-32190
07/06/2022, 9:50 PMwitty-crayon-22786
07/06/2022, 9:51 PMDrop when a process exits ⊠it almost certainly depends âhowâ the process exitswitty-crayon-22786
07/06/2022, 9:52 PMsys.exit vs returning from the entrypointbitter-ability-32190
07/06/2022, 10:01 PMbitter-ability-32190
07/06/2022, 10:09 PMbitter-ability-32190
07/06/2022, 10:12 PMbitter-ability-32190
07/07/2022, 12:29 AMsys.exit is it. Oddly, the code kinda looks like it's calling PyFinalize int hat codepath and not the otherbitter-ability-32190
07/07/2022, 12:35 AMPyErr_Print if the return value is NULL I suspect that is in the case of an exception (sys.exit just raises SystemExit)
... eventually in handle_system_exit we call Py_Exit which calls `Py_FinalizeEx`:https://github.com/python/cpython/blob/bec802dbb87717a23acb9c600c15f40bc98340a3/Python/pylifecycle.c#L2922bitter-ability-32190
07/07/2022, 12:39 AMos._exit(...)bitter-ability-32190
07/07/2022, 12:46 AMfinalize in the happy path too đ
So not sure why we don't see this otherwise, other than "race condition"bitter-ability-32190
07/07/2022, 12:52 AMbitter-ability-32190
07/07/2022, 12:53 AMbitter-ability-32190
07/07/2022, 1:01 AMSIGINT is the key here.bitter-ability-32190
07/07/2022, 11:15 AMwitty-crayon-22786
07/07/2022, 4:32 PMbitter-ability-32190
07/07/2022, 4:32 PMawait ting the cancellation wasn't the secret sauce either.bitter-ability-32190
07/07/2022, 4:33 PMwitty-crayon-22786
07/07/2022, 4:33 PMPySessionCancellationLatch which is used to cancel Sessions in general may not be what gets triggered when a SIGINT comes inwitty-crayon-22786
07/07/2022, 4:35 PMwitty-crayon-22786
07/07/2022, 4:36 PMI set up a meetingyea, sounds good. see you there
bitter-ability-32190
07/07/2022, 4:37 PMasync rule running which was running process is attempting to cleanup and taking a very long timewitty-crayon-22786
07/07/2022, 4:49 PMwitty-crayon-22786
07/07/2022, 4:50 PMGraph is `spawn`d onto the Executor⊠i.e., itâs running async in the background. when clients go away, it is cancelled as soon as possible, but not synchronously. so it might take some time for things to wrap up.witty-crayon-22786
07/07/2022, 4:51 PMawait before it can be canceled)witty-crayon-22786
07/07/2022, 4:53 PMbitter-ability-32190
07/07/2022, 4:54 PMbitter-ability-32190
07/07/2022, 4:54 PMwitty-crayon-22786
07/07/2022, 4:54 PMbitter-ability-32190
07/07/2022, 5:00 PMwitty-crayon-22786
07/07/2022, 5:05 PMwitty-crayon-22786
07/07/2022, 5:07 PMwitty-crayon-22786
07/07/2022, 5:07 PMbitter-ability-32190
07/07/2022, 5:07 PMbitter-ability-32190
07/07/2022, 5:08 PMwitty-crayon-22786
07/07/2022, 5:08 PMwitty-crayon-22786
07/07/2022, 5:08 PMbitter-ability-32190
07/07/2022, 5:09 PMPyO3 or PyOxy? đwitty-crayon-22786
07/07/2022, 5:10 PMwitty-crayon-22786
07/07/2022, 5:10 PMbitter-ability-32190
07/07/2022, 5:10 PMpyoxy providing Python, but still using pyo3 for the FFIingwitty-crayon-22786
07/07/2022, 5:11 PMwitty-crayon-22786
07/07/2022, 5:11 PMbitter-ability-32190
07/07/2022, 5:11 PMwitty-crayon-22786
07/07/2022, 5:12 PMpantsd. that could be shipped first, since the pants script could get a hold of the native client binary and invoke that rather than pythonwitty-crayon-22786
07/07/2022, 5:13 PMpants script to get it increases complexity that we want to be removing insteadbitter-ability-32190
07/07/2022, 5:14 PMdebugpy shenanigans will need to be shifted. HMU when that needs to be done, I dont wanna bog John down with that