Hm, re PyO3 and an incremental migration:
# development
h
Hm, re PyO3 and an incremental migration:
I got a PR working to port
PyNailgunClient
. But killing Pantsd results in this:
Copy code
The pantsd process was killed during the run.

If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.

If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (<https://www.pantsbuild.org/docs/getting-help>).
Traceback (most recent call last):
  File "/Users/ericarellano/code/pants/src/python/pants/bin/pants_loader.py", line 100, in run_default_entrypoint
    exit_code = runner.run(start_time)
  File "/Users/ericarellano/code/pants/src/python/pants/bin/pants_runner.py", line 86, in run
    return remote_runner.run()
  File "/Users/ericarellano/code/pants/src/python/pants/bin/remote_pants_runner.py", line 99, in run
    return self._connect_and_execute(pantsd_handle)
  File "/Users/ericarellano/code/pants/src/python/pants/bin/remote_pants_runner.py", line 131, in _connect_and_execute
    return PyNailgunClient(port, executor).execute(command, args, modified_env)
native_engine_pyo3.PantsdClientException: The pantsd process was killed during the run.

If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.

If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (<https://www.pantsbuild.org/docs/getting-help>).
We only want that message to be printed once, not twice. This gets me wondering if that's probably because with Rust-CPython we have functions like
maybe_set_panic_handler
, and those prob don't apply to the PyO3 extension too?
This incremental migration might be less feasible than I hoped
For context, the PR https://github.com/pantsbuild/pants/pull/12181. Note the awkwardness with
PyExecutor
being defined in both cpython and PyO3. I do wonder if this migration needs to be done all at once
w
This gets me wondering if that’s probably because with Rust-CPython we have functions like 
maybe_set_panic_handler
, and those prob don’t apply to the PyO3 extension too?
the panic handler is a rust concept rather than anything related to Python, so probably not.
my guess re the “unclean exit” is just that you’re no longer raising the right exception type
should be a KeyboardInterrupt
h
we want an unclean exit, I `kill -9`ed the process. But we don't want the error to dump twice
w
ah
is that just that the
str
or
repr
of the exception type changed…?
afaict, it doesn’t seem like a fundamental issue… just a small bug in the port
h
it doesn’t seem like a fundamental issue
Fair, maybe I'll figure out what's going on when I have fresh eyes re the
PyExecutor
awkwardness, that awkwardness is probably better than the risk of one huge PR
Ah red herring, my PR does nothing new with this error message.
main
behaves the same way. Yay!
👍 1
w
maybe a
--print-stacktrace
artifact…?
h
Possibly. I noticed some issues with stacktraces this morning that I'm triaging now and going to write issues for
I ported stdio to PyO3, but I'm trying to figure out why
./pants lint
now logs to the log file rather than console: https://github.com/pantsbuild/pants/pull/12184/commits/71b4220eaa2c57a85da5681ead7028e01bc0c160 I double checked my port and am pretty confident I did it one-for-one, except for using
Python.allow_threads()
in more places. Removing that doesn't make a difference Any ideas?
w
hm… nothing is jumping out at me. but yea, it would be because the
Destination
isn’t set on the appropriate thread.
h
Is it possible that the threads are being launched by rust-cpython and not linked properly to these stdio settings from PyO3? Pretty sure that's not how this works, but only idea I have
w
i know that rust-cpython does not launch any threads
👍 1
i doubt that Py03 does either
h
Hm, could it be that
Logger::init()
isn't applying properly? Iiuc the two Rust extensions are isolated from each other
I'm working more on this before we leave for our trip in a little, and I'm souring on an incremental migration. After this stdio and Nailgun client PR, there's very little we can incrementally port, it's all deeply tied to Scheduler Then this mysterious issue with logging I also realize I want to be able to benchmark this migration and easily revert it / not land if it's a problem. That's hard to revert if we've incrementally ported -- @witty-crayon-22786 would you consider one big PR? I know that would be a beast to review
w
hm… lots of “mysterious issues” would seem to suggest that incremental is a good idea. because otherwise if feels like you’d need to tackle tons at once
unless it turns out that it really is a python issue… i sortof doubt that it is
the thread-local access is not actually powered by python… it’s a rust-level thread local
h
True. How did you pull off such an epic change with CFFI -> CPython?
w
touche =P … i worked on it full time for… weeks
👍 1
h
It's plausible we could get PyO3 to understand Rust-CPython or vice versa, but this is messy.
I think a main reason I'm concerned about this is we do want to make some improvements to our FFI, e.g. no longer using the
with
context managers. Navigating that interop seems pretty complex
w
part of that was not having a framework in place to actually execute it incrementally: you’ve already done that here
the async / futures3 migration was incremental though, because we had the framework
👍 1
yea, i do think that some of the changes will be big bang (everything reference the Scheduler). logging seems like a good choice to be independent though, since it is truly independent.
h
Nailgun client, Stdio, and Testutil are the only things independent of the scheduler, outside of 2-3 trivial functions like
all_metrics()
and
match_path_globs
. Which means ~80% of the migration would be a single PR, unless we try to interoporate between the frameworks
w
80% is less than 100%, heh.
i can spend some time looking at the stdio issue if you’d like.
h
I did not know!
i can spend some time looking at the stdio issue if you’d like.
If you have the time, that'd be great. I pushed this morning a change to stop using
py.allow_threads(||)
on trivial functions b/c maintainer explained there's some overhead to it. So it's more of a 1-1 comparison with the original rust-cpython now but, of course, this port isn't our highest priority. Altho I do think it gives some good cleanup + facilaties usign py.allow_threads(||) more
w
yea.
h
Hm, so with debugging stdio, maybe I could try porting functions one-at-a-time. Iiuc, those ~7 stdio functions aren't all-or-nothing, right?
w
stdio going to the wrong destination could only really be 1 thing: the destination not being set to the console correctly
👍 1
if you get no output at all to the console, then you’d want to see whether it was appropriately set … ever
1
if you get some output to the console, and some that goes to the log, that would be the destination failing to be propagated from one thread/task to the other… but that’s unlikely, because it has nothing to do with python
if no output to console, you’d want to check that inside the call to stdio_thread_console_set you actually have one
h
Nothing to the console ever. And
./pants run build-support/bin/changelog.py
fails with
Exception(\'Cannot start Exclusive access on Destination Logging\')
w
yea. so no Console is being set then
so
stdio_thread_console_set
is not doing the right thing
👀 1
✔️ 1
it doesn’t look like any callers were updated
(unless i’m missing something)
does src/python/pants/init/logging.py need to be updated to import the new module?
h
It should be, I updated logging.py to
from pants.engine.internals import native_engine_pyo3 as native_engine
w
ah, hm. yea.
yea, would confirm that inside the call to set the console that you actually end up with one.
h
I added a panic inside
stdio_thread_console_set
and see the destination as:
Copy code
Destination(Mutex { data: Console { console: Console { stdin_handle: Some(File { fd: 0, path: "/dev/ttys000", read: true, write: true }), stdout_handle: Some(File { fd: 1, path: "/dev/ttys000", read: true, write: true }), stderr_handle: Some(File { fd: 2, path: "/dev/ttys000", read: true, write: true }) } } })
I gotta get going though. Thanks for helping with this and talking me down with the incremental migration thing!
w
ok, interesting. so if the Console is actually successfully set, then the next question is whether the
Arc<Destination>
that is being used in the spawned threads is actually the right one.
but yea, i’ll try to take a look.
h
For sure. Note that scheduler.rs is doing some things with stdio too. I suspect that it's console destination != this one
Thanks! Does it make sense to not merge Nailgun port until we solve this because it might impact whether incremental is possible or not? Maybe it's still fine. And even if console can't be figured out, we can still port most of stdio (I'm also thinking that matches_path_globs is a great incremental PR after all because it will set precedent for how to consume Python objects like PathGlobs)
w
um, i think it’s ok to merge nailgun
i’m optimistic about this being worthwhile.
❤️ 1
h
Sg, thanks! I'm very very optimistic about the overall migration being worth it - this code is so much more ergonomic. Only a question of how best to pull it off This sounds good tho, thank you
the commits are mucked up, but https://github.com/pantsbuild/pants/pull/12184 is now in its final state outside of this issue - I ported the panic handler this morning