# development
e
Is there any parallelism in the v1 nailgun executor when compiling java/scala code or is only one client connected to the server at a time?
a
the v1 `NailgunTask` impl will have as many worker threads as you specify with `--worker-count`, all sharing the same address space, so there is lots of parallelism there. and i believe we actually have a single `NailgunClient`, which then spins off a `NailgunClientSession` that connects to the nailgun server for each attempted process execution
pants will create literal python threads equal to the value of `--worker-count`, and each thread just has one job: to connect to the nailgun server and perform a single execution
https://github.com/pantsbuild/pants/pull/6579 this is a really really old PR that attempted to make this more explicit by using contextmanagers to represent nailgun client/session state
so basically there is parallelism, which we get by spinning up multiple python worker threads. because of the GIL, python threads can only execute bytecode one at a time, but each thread basically just blocks on i/o and streams i/o chunks back to pants, and the GIL is released during blocking i/o, so i/o-bound work like this parallelizes well
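A minimal sketch of the threading model described above, under stated assumptions: `connect_and_execute`, `worker`, and `run_all` are illustrative names, not the real pants API.

```python
import queue
import threading

def connect_and_execute(port: int, request) -> None:
    """Stand-in for NailgunClient/NailgunClientSession: connect to the
    server on `port`, send the request, and stream i/o chunks back until
    the exit chunk arrives. (Hypothetical; not the real pants API.)"""

def worker(requests: queue.Queue, port: int) -> None:
    """Each worker thread has exactly one job per request: run it to
    completion against the shared nailgun server."""
    while True:
        request = requests.get()
        if request is None:  # sentinel: no more work for this worker
            return
        connect_and_execute(port, request)

def run_all(compile_requests, worker_count: int, port: int) -> None:
    """Spin up --worker-count threads, all sharing the same address space."""
    requests: queue.Queue = queue.Queue()
    for r in compile_requests:
        requests.put(r)
    for _ in range(worker_count):
        requests.put(None)  # one sentinel per worker
    threads = [threading.Thread(target=worker, args=(requests, port))
               for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```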
let me know if that makes sense or not, we can pair too
e
thank you! makes sense.
a
lit
note that one of the reasons why this implementation is used right now is that it allows `NailgunTask` to seamlessly support `--execution-strategy=subprocess` without many changes, because with `--execution-strategy=subprocess`, each python thread just executes a literal process instead of connecting to a nailgun server
so like, that's a useful facet of the current architecture, because it allowed for the generic `--execution-strategy` mechanism
but that's just to explain why it is the way it is -- we have different constraints now, so no need to copy that directly
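For illustration, a hedged sketch of what that dispatch might look like, reusing the hypothetical `connect_and_execute` stub from the sketch above; `request.argv` is also an assumed shape, not the real `NailgunTask` code.

```python
import subprocess

def execute(request, strategy: str, nailgun_port: int) -> None:
    """Run one request to completion. The thread model is identical for
    both strategies; only this branch changes."""
    if strategy == "nailgun":
        # reuse the warm jvm behind the shared nailgun server
        connect_and_execute(nailgun_port, request)
    elif strategy == "subprocess":
        # spawn a fresh jvm (a literal process) for this one request;
        # `request.argv` is an assumed attribute for illustration
        subprocess.run(request.argv, check=True)
    else:
        raise ValueError(f"unknown execution strategy: {strategy}")
```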
e
Building on this: I believe v1 starts multiple nailgun servers. Is that correct?
a
for a single jvm tool, v1 starts a single nailgun server process with the zinc classpath. it will then create N threads in python, each of which will:
1. take a compile request (for a single target) off the queue
2. create a nailgun request
3. connect to the single nailgun server with that request
4. write to the cache, etc when the request is complete

in this case, the multithreading is done in python. we could simulate this in rust (see the sketch below) if we:
1. create a single nailgun server process in the background
2. create a future for each target which creates a nailgun request (blocking until a target is available to compile)
3. in each future, connect to the single nailgun server, and execute the request, “blocking” on its output

let me know if that makes sense!
just edited the above message to add step 1 in rust (create the single server process)
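Since the real proposal targets tokio in rust, here is the same shape sketched with python's asyncio purely for illustration: one background server process, one future per target, each future awaiting its dependencies before connecting to the single server. `connect_and_execute` is the hypothetical stub from the first sketch (redefined here for completeness), and `topological_order` is a naive helper that assumes an acyclic dependency map.

```python
import asyncio

def connect_and_execute(port: int, request) -> None:
    """Hypothetical stand-in client from the first sketch."""

def topological_order(deps: dict):
    """Yield targets with dependencies before dependents. Naive: assumes
    `deps` maps each target to its direct dependencies and has no cycles."""
    emitted = set()
    while len(emitted) < len(deps):
        for target, ds in deps.items():
            if target not in emitted and all(d in emitted for d in ds):
                emitted.add(target)
                yield target

async def compile_target(target, dep_tasks, port: int) -> None:
    # 2. block until every dependency has finished compiling
    await asyncio.gather(*dep_tasks)
    # 3. connect to the single nailgun server and execute the request,
    #    "blocking" (awaiting) on its output; run_in_executor because the
    #    stub client call is a blocking call
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, connect_and_execute, port, target)

async def compile_graph(deps: dict, port: int) -> None:
    # 1. a single nailgun server process is assumed to already be running
    #    in the background on `port`
    tasks = {}
    for target in topological_order(deps):
        tasks[target] = asyncio.ensure_future(
            compile_target(target, [tasks[d] for d in deps[target]], port))
    await asyncio.gather(*tasks.values())
```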
e
yeah it does. I’m doing multithreading with tokio now, but running into some race condition which I am having a hard time sorting out.
a
oh that’s not fun
e
I talked to @witty-crayon-22786 a few times about increasing parallelism in hermetic nailgun, and we landed on 1 server per client, and not multithreading. I can’t remember why that was now tho…should have written a 1 pager.
a
it’s a heck of a lot easier to implement i think
e
Yeah, not fun, but I was mostly trying to remember if the reason it wasn’t working was because our nailgun had some issues that made threading hard.
a
do you want to pair on this? i can probably screenshare for a bit. can’t guarantee we’ll magically fix the race though
e
Not right now, I’m going to get something reviewable without parallelism before I pick it back up.
a
i think one thing that’s relevant is that the compiles have to be done in dependency order. in python you can just pop things off a queue and i think the GIL may make it easier than in rust where you can actually have threading
that sounds like a great idea
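To illustrate the "pop things off a queue in dependency order" point above: under the GIL one coarse lock around the shared bookkeeping is enough, whereas in rust with real threading the synchronization has to be designed explicitly. A tiny sketch with illustrative names:

```python
import threading

done: set = set()        # targets that have finished compiling
lock = threading.Lock()  # one coarse lock suffices for this bookkeeping

def next_ready(pending: list, deps: dict):
    """Pop the first pending target whose dependencies have all compiled,
    or return None if nothing is ready yet."""
    with lock:
        for i, target in enumerate(pending):
            if all(d in done for d in deps[target]):
                return pending.pop(i)
        return None

def mark_done(target) -> None:
    with lock:
        done.add(target)
```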
e
Yeah, but I assumed that the node runner in v2 was dealing with dependency order. Like it wouldn’t schedule a node until all the futures for its dependencies had finished.
a
feel free to add me as reviewer, i’m trying to get back into that (no sweat though)
yeah that sounds right actually
but that being said, that would be true if we had compile 100% in v2, which we don’t quite yet
e
Ah interesting point
a
but i think that controlling it from python might get the same result
e
that could be the crux
I will investigate more later. Thanks!
a
you’re welcome!
e
because now we are only scheduling 1 compile request at a time with product_request or some such thing?
a
yes that’s it