# development
h
I'm trying to fix https://github.com/pantsbuild/pants/pull/8329#discussion_r328735829 and I'm not sure how to go about constructing a rust FFI function that avoids calling `scheduler.core.executor.block_on` but can still return a `PyResult` synchronously
a
I suspect you’ll need to turn it into something that you can `yield` on (but not something in the `Graph`, because that would be cached)…
h
that makes sense
are we doing this anywhere else in the codebase currently?
I'm not sure how python yield semantics interact with rust-side future semantics
a
Only for things in the graph, currently…
@witty-crayon-22786 ^^
w
soooo
hmmmm
lemme respond to this in a bit. will take some thinking.
hm. yea, dang.
while we could hypothetically just block this thread in some other way that won't cause the tokio executor to complain, we would still be blocking one of the tokio runtime's threads.
now, @average-vr-56795 actually explored an alternative to blocking threads on this IO at one point, and i think that if we were to switch back to that workflow we would be ok.
https://github.com/pantsbuild/pants/pull/7788 was one of the tickets... and i fully expect that at some point switching back to that mode will be feasible.
in the meantime, i think that if we add another copy of the relevant `materialize` method that doesn't use `runtime.block_on` to wait for the `Future` (here: https://github.com/pantsbuild/pants/blob/6bf36f5387dd2f969331e2f9c83a4b2d8e2d3c3a/src/rust/engine/engine_cffi/src/lib.rs#L866-L908 ), and instead uses some other mechanism that will sidestep tokio's tracking and allow us to block the thread without triggering a panic, it will be "ok"
i say "ok", because, if we ensured that this method was only called from a
@console_rule
, we could guarantee that we would only have 1 running at a time
and if we set a minimum number of threads for the tokio runtime, blocking one thread should never cause a deadlock.
so: we'd attach a caveat to that, and we'd reference switching back to non-blocking IO for this codepath (@average-vr-56795: did we end up with a ticket for trying that again?)
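roughly the shape i have in mind (just a sketch, using the futures 0.1-era `futures-cpupool` API; the name and the stand-in future here are hypothetical):
```rust
use futures::Future;
use futures_cpupool::CpuPool;

// hypothetical sync variant of materialize: block the calling (FFI) thread
// directly, rather than via tokio's runtime, so the tokio executor never
// sees a blocked worker and never panics about it.
fn materialize_directories_sync(pool: &CpuPool) -> Result<(), String> {
    // stand-in for the real materialization future
    let work = futures::future::ok::<(), String>(());
    // CpuFuture::wait() parks this OS thread until the pool-side work
    // finishes; tokio isn't involved, so nothing trips its blocking check
    pool.spawn(work).wait()
}
```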
w
i did not, sorry. lots o email. thanks
@hundreds-breakfast-49010: commented.
h
jumping back to this. I'm actually confused about why the call to `block_on` in `materialize_directories` is causing a panic to begin with
and that `block_on` method is creating a new `tokio::runtime::Runtime` and invoking its own `block_on` method
I don't know why creating a separate runtime causes this panic
(also when I actually run my test I don't get a rust panic, I get a segfault - maybe that's the expected behavior if a rust panic happens inside a CFFI function?)
(https://doc.rust-lang.org/std/panic/fn.catch_unwind.html, "It is currently undefined behavior to unwind from Rust code into foreign code, so this function is particularly useful when Rust is called from another language (normally C). This can run arbitrary Rust code, capturing a panic and allowing a graceful handling of the error." so maybe we should be wrapping all our invocations of rust functions called from python in a `catch_unwind`?)
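Something like this, maybe (just a sketch; the real entry points go through the CFFI layer and return a `PyResult`, so the name and return convention here are made up):
```rust
use std::panic;

// Sketch: stop a Rust panic from unwinding across the C boundary (which is
// UB) by catching it at the FFI entry point and turning it into an error code.
#[no_mangle]
pub extern "C" fn materialize_directories_ffi() -> i32 {
    let result = panic::catch_unwind(|| {
        // ...the real body would run here and produce a success value...
        0
    });
    match result {
        Ok(code) => code,
        // a panic landed here instead of unwinding into C; report failure
        // to the caller by convention
        Err(_) => -1,
    }
}
```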
but back to the runtime question, I'm not sure why a blocking operation on one tokio `Runtime` would affect another tokio `Runtime`
I also notice that that `block_on` method takes `&self` on `task_executor::Executor`, but it doesn't seem to be using the `&self` pointer at all
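i.e. I think the shape is roughly this (paraphrasing from memory, not the actual source):
```rust
use futures::Future;

pub struct Executor;

impl Executor {
    // takes &self, but never actually uses it: every call builds a fresh
    // Runtime and blocks on that instead
    pub fn block_on<F>(&self, future: F) -> Result<F::Item, F::Error>
    where
        F: Future + Send + 'static,
        F::Item: Send + 'static,
        F::Error: Send + 'static,
    {
        tokio::runtime::Runtime::new().unwrap().block_on(future)
    }
}
```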
w
`catch_unwind` is... also ill-defined, i thought
my expectation is that we are not using a new anonymous instance, but that that Runtime call uses a thread local instance of the runtime
if we were using a new one each time we'd be creating new pools
h
it doesn't say anything about using an already-existing thread-local instance of a runtime
but I'm not very familiar with tokio so I might be misunderstanding what's going on
w
well... that's... awkward. heh.
there are a lot of possibilities: this version of the method could launch the work on the `CpuPool`, and then use `future.wait()`
h
so that's the `io_pool` member of the `Executor` struct
w
off topic, i suppose, but: we should probably not be creating new `Runtime`s
yep
with a big fat comment explaining the above
h
so does that imply that the original implementation of `materialize_directories` is wrong, because it calls `block_on`, which creates a new `Runtime`?
w
not wrong, just possibly less efficient.
we should be using ~one runtime (maybe there is a comment somewhere nearby that explains why we don't)
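e.g. something shaped like this (a sketch, assuming `lazy_static`; the 0.1-era `Runtime::block_on` takes `&mut self`, hence the `Mutex`):
```rust
use std::sync::Mutex;

use futures::Future;
use lazy_static::lazy_static;
use tokio::runtime::Runtime;

lazy_static! {
    // one process-wide runtime, built once, instead of one per call
    static ref RUNTIME: Mutex<Runtime> =
        Mutex::new(Runtime::new().expect("failed to start tokio runtime"));
}

fn block_on_shared<F>(future: F) -> Result<F::Item, F::Error>
where
    F: Future + Send + 'static,
    F::Item: Send + 'static,
    F::Error: Send + 'static,
{
    RUNTIME.lock().unwrap().block_on(future)
}
```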
h
I'm not sure how cpu pools interact with tokio runtimes
w
they mostly don't: the tokio runtime is one threadpool, the cpupool is another.
h
but then I have a `CpuFuture` object, and I'm not sure what to do with it to force it to complete
w
when you call `future.wait()` on the thing returned by the CpuPool, you will be doing so on the tokio runtime, and blocking one of its threads
h
there's no way around forcing a future to complete and yield a `Result` that we turn into a `PyResult` in this function, right?
w
`future.wait()`
w
yes. we need to block the thread. see above.
h
oh
`Future` is a trait, not a type in and of itself
w
yea. you'll call wait on the one returned by the CpuPool
er. and spawn doesn't give you one directly... so maybe you want spawn_fn? can't remember
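something like this, i think (sketch, futures-cpupool 0.1):
```rust
use futures::Future;
use futures_cpupool::CpuPool;

fn main() {
    let pool = CpuPool::new(2);
    // spawn_fn wraps a closure into pool-side work and hands back a CpuFuture
    let cpu_future = pool.spawn_fn(|| -> Result<u64, ()> { Ok(42) });
    // wait() blocks this OS thread (not a tokio worker) until the work is done
    let value = cpu_future.wait();
    assert_eq!(value, Ok(42));
}
```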
h
`Executor` also makes `io_pool` a private member
which I could change to pub I guess, but maybe there's a reason it's not exposed right now
w
it exposes it ... somehow, yea?
h
okay, I think doing this gets my test to no longer segfault
woo
w
woot 😃