# development
b
Moving this to a more public place where we can collaborate. I'm trying out an upcoming change where we'll be making N (#cores) VenvPEXs with overlapping sets of deps (using the PEX lockfile, so we know versions and metadata beforehand). At some point it runs into:
File "/home/joshuacannon/.cache/pants/named_caches/pex_root/installed_wheels/0457d0c3fb526f3f246a04ebe5fa67dfa4f0877fe617e106f23f3b083f7b1ab1/pex-2.1.73-py2.py3-none-any.whl/pex/common.py", line 470, in atomic_directory
    fcntl.lockf(lock_fd, fcntl.LOCK_EX)  # A blocking write lock.
OSError: [Errno 35] Resource deadlock avoided
I suspect the fact that we have overlapping sets of deps being requested in N pex processes is what's causing this.
CC @witty-crayon-22786 @enough-analyst-54434 @hundreds-father-404
If anyone wants to try my PoC: branch. NOTE: I don't think the Pants repo has enough instances of overlapping-but-unique dep sets to create the deadlock on my machine. Our work repo however does.
e
That's a straight up bug. Do you mind filing a Pex issue?
b
Will do!
e
Responded. I'm out for the next ~2 weeks, but I left a likely workaround, though it leaves us none the wiser about the cause: find the thread pool and switch it to a multiprocessing pool.
b
Unfortunately process pooling involves pickling data back and forth, and some of the data being sent to the pool isn't pickleable (namely SSLContext, but there could be more, as we only see the first offender).
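To see every offender rather than just the first, one option is to try pickling each argument up front. This is only a minimal sketch: find_unpicklable is a hypothetical helper, not something in PEX, and the argument list is made up for illustration.

```python
import pickle
import ssl


def find_unpicklable(args):
    """Return (index, type name) for every argument that fails to pickle."""
    offenders = []
    for index, arg in enumerate(args):
        try:
            pickle.dumps(arg)
        except (pickle.PicklingError, TypeError, AttributeError):
            # SSLContext lands here: pickle refuses it with a TypeError.
            offenders.append((index, type(arg).__name__))
    return offenders


if __name__ == "__main__":
    # Made-up stand-ins for the values that would be sent to the pool.
    job_args = ["https://example.invalid/pkg.whl", 3, ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)]
    print(find_unpicklable(job_args))
```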
e
Yeah, you might need to get creative or, better, we need to actually understand the fcntl behavior here.
One example of creative is switching to execute_parallel (an internal API used extensively, like atomic_directory). That's process based, but doesn't pickle. You need to actually write a shim script that can be used to fork a process for the thing you need to do (run a download). Currently this is mainly (fully?) used to fork Pip subprocesses to do parallel wheel building and parallel wheel installing.
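A rough sketch of the shim idea, for illustration only: this is not PEX's execute_parallel, just the general shape of running one subprocess per job and passing inputs as command-line arguments, so nothing gets pickled. The SHIM script and run_in_subprocesses are hypothetical names, and the shim only echoes its argument instead of downloading anything.

```python
import subprocess
import sys

# Hypothetical shim: stands in for a script that performs one download.
SHIM = "import sys; print('fetched ' + sys.argv[1])"


def run_in_subprocesses(urls):
    """Run the shim once per URL, concurrently, and return each child's output."""
    # Start every child first so they run in parallel, then collect results in order.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", SHIM, url],
            stdout=subprocess.PIPE,
            text=True,
        )
        for url in urls
    ]
    results = []
    for url, proc in zip(urls, procs):
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"shim failed for {url} with exit code {proc.returncode}")
        results.append(out.strip())
    return results


if __name__ == "__main__":
    print(run_in_subprocesses(["a.whl", "b.whl"]))
```

Inputs travel as plain strings on the command line, so anything like an SSLContext would have to be rebuilt inside the shim rather than sent across.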
b
Another quick-and-dirty hack to make the arguments pickleable, and it looks like the switch works. I'll wipe my cache and try it a few more times to be ultra-sure.
Mirroring ticket comment here. My current suspicion is that Pants is forking its own process (to run other processes), which via PEX's threadpool holds onto one or more atomic locks, and the new process then inherits the held lock.
I have a fix/workaround, will make a PR shortly and post 🙌
The real test here is to reproduce this issue with a simple process that spawns threads and acquires locks, and have it be run by Pants. That'll validate PEX really isn't to "blame" here (while it'll still need a workaround).
OK, PEX PR posted: https://github.com/pantsbuild/pex/pull/1694. I still want to reproduce this outside of PEX to file a bug to Pants.
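A possible starting point for that standalone repro, as a sketch only: worker threads each take a blocking exclusive fcntl.lockf lock (the same call as in the traceback) on an overlapping set of lock files. The file names, worker counts, and function names are all made up. Run on its own it should simply complete; whether it triggers the error when run by Pants is untested.

```python
import fcntl
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def lock_and_release(lock_path):
    """Take a blocking exclusive lock on lock_path, then release it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_WRONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # Same call as in the traceback.
        return lock_path
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def run_workers(lock_dir, workers=4, locks_per_worker=3):
    """Have each worker thread lock an overlapping slice of the lock files."""

    def work(i):
        # Worker i takes files i, i+1, ... (wrapping), so neighbours overlap.
        paths = [
            os.path.join(lock_dir, f"dep-{(i + j) % workers}.lck")
            for j in range(locks_per_worker)
        ]
        return [lock_and_release(path) for path in paths]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(work, range(workers)))
    # Total number of lock acquisitions that completed.
    return sum(len(r) for r in results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        print(run_workers(lock_dir))
```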