# development
h
I'm trying to fix https://github.com/pantsbuild/pants/pull/8329#discussion_r328735829 and I'm not sure how to go about constructing a rust FFI function that avoids calling `scheduler.core.executor.block_on` but can still return a `PyResult` synchronously
a
I suspect you’ll need to turn it into something that you can `yield` on (but not something in the `Graph`, because that would be cached)…
h
that makes sense
are we doing this anywhere else in the codebase currently?
I'm not sure how python yield semantics interact with rust-side future semantics
a
Only for things in the graph, currently…
@witty-crayon-22786 ^^
w
soooo
hmmmm
lemme respond to this in a bit. will take some thinking.
hm. yea, dang.
while we could hypothetically just block this thread in some other way that won't cause the tokio executor to complain, we would still be blocking one of the tokio runtime's threads.
now, @average-vr-56795 actually explored an alternative to blocking threads on this IO at one point, and i think that if we were to switch back to that workflow we would be ok.
https://github.com/pantsbuild/pants/pull/7788 was one of the tickets... and i fully expect that at some point switching back to that mode will be feasible.
in the meantime, i think that if we add another copy of the relevant `materialize` method that doesn't use `runtime.block_on` to wait for the `Future` (here: https://github.com/pantsbuild/pants/blob/6bf36f5387dd2f969331e2f9c83a4b2d8e2d3c3a/src/rust/engine/engine_cffi/src/lib.rs#L866-L908 ), and instead uses some other mechanism that will sidestep tokio's tracking and allow us to block the thread without triggering a panic, it will be "ok"
i say "ok", because, if we ensured that this method was only called from a
@console_rule
, we could guarantee that we would only have 1 running at a time
and if we set a minimum number of threads for the tokio runtime, blocking one thread should never cause a deadlock.
so: we'd attach a caveat to that, and we'd reference switching back to non-blocking IO for this codepath (@average-vr-56795: did we end up with a ticket for trying that again?)
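roughly the shape i have in mind (just a sketch, using the futures 0.1-era `futures-cpupool` API; the name and the stand-in future here are hypothetical):
```rust
use futures::Future;
use futures_cpupool::CpuPool;

// hypothetical sync variant of materialize: block the calling (FFI) thread
// directly, rather than via tokio's runtime, so the tokio executor never
// sees a blocked worker and never panics about it.
fn materialize_directories_sync(pool: &CpuPool) -> Result<(), String> {
    // stand-in for the real materialization future
    let work = futures::future::ok::<(), String>(());
    // CpuFuture::wait() parks this OS thread until the pool-side work
    // finishes; tokio isn't involved, so nothing trips its blocking check
    pool.spawn(work).wait()
}
```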
w
i did not, sorry. lots o email. thanks
@hundreds-breakfast-49010: commented.
h
jumping back to this. I'm actually confused about why the call to `block_on` in `materialize_directories` is causing a panic to begin with
and that `block_on` method is creating a new `tokio::runtime::Runtime` and invoking its own `block_on` method
I don't know why creating a separate runtime causes this panic
(also when I actually run my test I don't get a rust panic, I get a segfault - maybe that's the expected behavior if a rust panic happens inside a CFFI function?)
(https://doc.rust-lang.org/std/panic/fn.catch_unwind.html, "It is currently undefined behavior to unwind from Rust code into foreign code, so this function is particularly useful when Rust is called from another language (normally C). This can run arbitrary Rust code, capturing a panic and allowing a graceful handling of the error." so maybe we should be wrapping all our invocations of rust functions called from python in a `catch_unwind`?)
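Something like this, maybe (just a sketch; the real entry points go through the CFFI layer and return a `PyResult`, so the name and return convention here are made up):
```rust
use std::panic;

// Sketch: stop a Rust panic from unwinding across the C boundary (which is
// UB) by catching it at the FFI entry point and turning it into an error code.
#[no_mangle]
pub extern "C" fn materialize_directories_ffi() -> i32 {
    let result = panic::catch_unwind(|| {
        // ...the real body would run here and produce a success value...
        0
    });
    match result {
        Ok(code) => code,
        // a panic landed here instead of unwinding into C; report failure
        // to the caller by convention
        Err(_) => -1,
    }
}
```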
but back to the runtime question, I'm not sure why a blocking operation on one tokio `Runtime` would affect another tokio `Runtime`
I also notice that that `block_on` method takes `&self` on `task_executor::Executor`, but it doesn't seem to be using the `&self` pointer at all
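i.e. I think the shape is roughly this (paraphrasing from memory, not the actual source):
```rust
use futures::Future;

pub struct Executor;

impl Executor {
    // takes &self, but never actually uses it: every call builds a fresh
    // Runtime and blocks on that instead
    pub fn block_on<F>(&self, future: F) -> Result<F::Item, F::Error>
    where
        F: Future + Send + 'static,
        F::Item: Send + 'static,
        F::Error: Send + 'static,
    {
        tokio::runtime::Runtime::new().unwrap().block_on(future)
    }
}
```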
w
`catch_unwind` is... also ill-defined, i thought
my expectation is that we are not using a new anonymous instance, but that that Runtime call uses a thread local instance of the runtime
if we were using a new one each time we'd be creating new pools
h
it doesn't say anything about using an already-existing thread-local instance of a runtime
but I'm not very familiar with tokio so I might be misunderstanding what's going on
w
well... that's... awkward. heh.
there are a lot of possibilities: this version of the method could launch the work on the `CpuPool`, and then use `future.wait()`
h
so that's the `io_pool` member of the `Executor` struct
w
off topic, i suppose, but: we should probably not be creating new `Runtime`s
yep
with a big fat comment explaining the above
h
so does that imply that the original implementation of `materialize_directories` is wrong, because it calls `block_on`, which creates a new `Runtime`?
w
not wrong, just possibly less efficient.
we should be using ~one runtime (maybe there is a comment somewhere nearby that explains why we don't)
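e.g. something shaped like this (a sketch, assuming `lazy_static`; the 0.1-era `Runtime::block_on` takes `&mut self`, hence the `Mutex`):
```rust
use std::sync::Mutex;

use futures::Future;
use lazy_static::lazy_static;
use tokio::runtime::Runtime;

lazy_static! {
    // one process-wide runtime, built once, instead of one per call
    static ref RUNTIME: Mutex<Runtime> =
        Mutex::new(Runtime::new().expect("failed to start tokio runtime"));
}

fn block_on_shared<F>(future: F) -> Result<F::Item, F::Error>
where
    F: Future + Send + 'static,
    F::Item: Send + 'static,
    F::Error: Send + 'static,
{
    RUNTIME.lock().unwrap().block_on(future)
}
```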
h
I'm not sure how cpu pools interact with tokio runtimes
w
they mostly don't: the tokio runtime is one threadpool, the cpupool is another.
h
but then I have a `CpuFuture` object, and I'm not sure what to do with it to force it to complete
w
when you call `future.wait()` on the thing returned by the CpuPool, you will be doing so on the tokio runtime, and blocking one of its threads
h
there's no way around forcing a future to complete and yield a `Result` that we turn into a `PyResult` in this function, right?
w
`future.wait()`
w
yes. we need to block the thread. see above.
h
oh
`Future` is a trait, not a type in and of itself
w
yea. you'll call wait on the one returned by the CpuPool
er. and spawn doesn't give you one directly... so maybe you want spawn_fn? can't remember
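something like this, i think (sketch, futures-cpupool 0.1):
```rust
use futures::Future;
use futures_cpupool::CpuPool;

fn main() {
    let pool = CpuPool::new(2);
    // spawn_fn wraps a closure into pool-side work and hands back a CpuFuture
    let cpu_future = pool.spawn_fn(|| -> Result<u64, ()> { Ok(42) });
    // wait() blocks this OS thread (not a tokio worker) until the work is done
    let value = cpu_future.wait();
    assert_eq!(value, Ok(42));
}
```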
h
`Executor` also makes `io_pool` a private member
which I could change to pub I guess, but maybe there's a reason it's not exposed right now
w
it exposes it ... somehow, yea?
h
okay, I think doing this gets my test to no longer segfault
woo
w
woot 😃