# development
Moving this to a more public place where we can collaborate. I'm trying out an upcoming change where we'll be making N (#cores) VenvPEXs with overlapping sets of deps (using the PEX lockfile, so we know versions and metadata beforehand). At some point it runs into:
File "/home/joshuacannon/.cache/pants/named_caches/pex_root/installed_wheels/0457d0c3fb526f3f246a04ebe5fa67dfa4f0877fe617e106f23f3b083f7b1ab1/pex-2.1.73-py2.py3-none-any.whl/pex/", line 470, in atomic_directory
    fcntl.lockf(lock_fd, fcntl.LOCK_EX)  # A blocking write lock.
OSError: [Errno 35] Resource deadlock avoided
I suspect the fact that we have overlapping sets of deps being requested in N
processes is what's causing this.
CC @witty-crayon-22786 @enough-analyst-54434 @hundreds-father-404
If anyone wants to try my PoC: branch. NOTE: I don't think the Pants repo has enough instances of overlapping-but-unique dep sets to create the deadlock on my machine. Our work repo however does.
That's a straight up bug. Do you mind filing a Pex issue?
🐛 1
Will do!
Responded. I'm out for the next ~2 weeks, but I left a likely workaround that leaves us dumb. Find the threadpool, switch it to a multiprocessing pool.
Unfortunately, process pooling involves pickling data back and forth, and some of the data being sent to the pool isn't pickleable (namely
, but there could be more, as we only see the first offender)
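For anyone following along, here is a minimal sketch of why a process pool chokes on this kind of data. `DownloadRequest` is a hypothetical stand-in and not a real Pex type; the point is only that an argument holding something like a lock cannot be pickled, and that pickling stops at the first failure, so only one offender is reported at a time.

```python
import pickle
import threading


class DownloadRequest:
    """Hypothetical job argument; not a real Pex type."""

    def __init__(self, url):
        self.url = url
        # Locks (like sockets and open files) cannot be pickled.
        self._lock = threading.Lock()


def first_pickle_error(obj):
    """Return the pickling error message for obj, or None if it pickles."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError) as e:
        # pickle gives up at the first unpicklable attribute it meets,
        # so later offenders stay hidden until this one is fixed.
        return str(e)
    return None


if __name__ == "__main__":
    print(first_pickle_error(DownloadRequest("https://example.invalid/a.whl")))
    print(first_pickle_error({"url": "https://example.invalid/a.whl"}))
```

Checking each argument this way before handing it to a pool is one way to find all the offenders up front.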
Yeah, you might need to get creative or, better, we need to actually understand the fcntl behavior here.
One example of getting creative is switching to
(an internal API used extensively like
). That's process-based, but doesn't pickle. You need to actually write a shim script that can be used to fork a process for the thing you need to do (run a download). Currently this is mainly (fully?) used to fork Pip subprocesses to do parallel wheel building and parallel wheel installing.
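A rough sketch of the general shim-script idea, using only the standard library and not the Pex internal API mentioned above. `SHIM` and `run_jobs` are hypothetical names: each job is a fresh interpreter running a tiny script, and the work item travels as a plain command-line argument, so nothing is pickled.

```python
import subprocess
import sys

# Hypothetical shim: a tiny script that receives its work item via argv.
SHIM = "import sys; print('downloaded ' + sys.argv[1])"


def run_jobs(items):
    """Run one child process per item in parallel and collect their output."""
    # Start every child first so they run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", SHIM, item],
            stdout=subprocess.PIPE,
            text=True,
        )
        for item in items
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"job failed with exit code {proc.returncode}")
        results.append(out.strip())
    return results


if __name__ == "__main__":
    print(run_jobs(["a.whl", "b.whl"]))
```

The trade-off is that arguments and results have to be expressible as strings (or files), which is what the shim is there to handle.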
Another quick-and-dirty muck to make the arguments pickleable, and it looks like the switch works. I'll wipe my cache and try it a few more times to be ultra-sure.
Mirroring my ticket comment here. My current suspicion is that Pants is forking its own process (to run other processes), which via PEX's threadpool holds onto one or more atomic locks, and the new process then inherits the held lock.
I have a fix/workaround, will make a PR shortly and post 🙌
🙌 1
The real test here is to reproduce this issue with a simple process that spawns threads and acquires locks, and have it be run by Pants. That'll validate that PEX really isn't to "blame" here (though it'll still need a workaround).
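A starting point for such a reproducer might look like the sketch below. It is untested against the actual failure, and the thread count, lock names and acquisition orders are all assumptions. Each thread takes blocking `fcntl.lockf` write locks (the same call as in the traceback) on an overlapping set of files. Run alone, it should finish cleanly, since POSIX record locks are owned by the process and its own threads do not block each other; the interesting case is several copies running concurrently against a shared lock directory, e.g. as processes spawned by Pants.

```python
import fcntl
import os
import tempfile
import threading


def lock_all(lock_dir, names):
    """Take a blocking exclusive lock on each named file, then release them all."""
    fds = []
    try:
        for name in names:
            fd = os.open(os.path.join(lock_dir, name), os.O_CREAT | os.O_RDWR)
            fds.append(fd)
            fcntl.lockf(fd, fcntl.LOCK_EX)  # A blocking write lock.
    finally:
        for fd in fds:
            os.close(fd)  # Closing the descriptor drops the lock.


def run(lock_dir, thread_count=4, names=("a", "b", "c")):
    """Spawn threads that lock overlapping file sets; return any OSErrors seen."""
    errors = []

    def worker(order):
        try:
            lock_all(lock_dir, order)
        except OSError as e:
            errors.append(e)

    threads = [
        # Alternate the acquisition order so the overlapping sets collide.
        threading.Thread(target=worker, args=(names if i % 2 == 0 else names[::-1],))
        for i in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    # For a cross-process run, point every copy at the same directory instead.
    with tempfile.TemporaryDirectory() as d:
        print(run(d))
```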
OK, PEX PR posted: Still want to reproduce this outside of PEX so I can file a bug against Pants.