
cuddly-window-48195

10/16/2019, 10:16 PM
FYI, I'm debugging a locking issue with @aloof-angle-91616 in #general; I just got a repro (https://github.com/ns-cweber/pants-repro), and it looks like it's coming from the engine (out of lmdb specifically). Seems relevant for this channel as well.
👍 1

enough-analyst-54434

10/16/2019, 10:25 PM
This most naturally seems an artifact of containers. I think we're normally protected by the following code hierarchy:
<https://github.com/pantsbuild/pants/blob/33a6d51db91ea1fe69c39117c8878abcb824cdfb/src/python/pants/process/lock.py#L22>
 -> <https://fasteners.readthedocs.io/en/latest/api/process_lock.html>
  -> fcntl.lockf(self.lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
The last is an advisory interprocess lock that I imagine is not valid across separate container namespaces.
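For anyone who hasn't used `fcntl.lockf` directly, here is a minimal sketch of the non-blocking exclusive lock at the bottom of that hierarchy. It is not the fasteners implementation. `try_lock`, `second_process_can_lock` and `demo` are hypothetical helpers, and the sketch assumes a Linux host with both processes in the same container. It shows that while one process holds the lock, a second process's `LOCK_EX | LOCK_NB` attempt fails immediately instead of blocking, and succeeds once the lock is released.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Hypothetical helper: try a non-blocking exclusive POSIX record lock.

    Returns the open file (which holds the lock until closed) on success,
    or None if another process already holds a conflicting lock.
    """
    f = open(path, "a")  # an exclusive lockf() needs a descriptor opened for writing
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        f.close()
        if e.errno in (errno.EACCES, errno.EAGAIN):  # "already locked by someone else"
            return None
        raise
    return f


def second_process_can_lock(path):
    """Fork a child (a separate process) and report whether it gets the lock."""
    pid = os.fork()
    if pid == 0:
        # POSIX record locks are not inherited across fork(), so the child
        # competes with the parent like any other process would.
        code = 1
        try:
            if try_lock(path) is not None:
                code = 0
        finally:
            os._exit(code)  # never fall through into the parent's code
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.lock")
    held = try_lock(path)                          # parent takes the lock first
    while_held = second_process_can_lock(path)
    held.close()                                   # closing the file drops the lock
    after_release = second_process_can_lock(path)
    return while_held, after_release


if __name__ == "__main__":
    print(demo())  # (False, True)
```

Whether the same holds when the two processes live in different containers is exactly the open question in this thread; the sketch only shows the single-namespace behavior.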
🔥 1

aloof-angle-91616

10/16/2019, 10:40 PM
do we have any other implementations of file locks anywhere that already might do the trick? figuring this out now
no it looks like we have centralized around the canonical version, cool

cuddly-window-48195

10/16/2019, 10:57 PM
I figured it had something to do with locks; still not sure why bash changes the behavior. In any case, is there a lock file that I'm meant to be mounting across containers? The whole cache directory is shared...

aloof-angle-91616

10/16/2019, 10:57 PM
oh. `.pants.workdir.file_lock` might be that
i believe the other file locks are stored in `.pants.d`; i will run a `find` to see

cuddly-window-48195

10/16/2019, 10:59 PM
Pretty sure `.pants.workdir.file_lock` is being shared by the `$PWD:/workdir` volume mount in the `docker-compose.yml` file.
👍 1

aloof-angle-91616

10/16/2019, 10:59 PM
ok

cuddly-window-48195

10/16/2019, 11:00 PM
They have the same timestamp, so I think they're shared.
👍 1

aloof-angle-91616

10/16/2019, 11:00 PM
that is a good heuristic in this case, i think

cuddly-window-48195

10/16/2019, 11:04 PM
"Path": "/usr/bin/scl",
"Args": [
    "enable",
    "devtoolset-7",
    "--",
    "./pants",
    "run",
    "package:main"
],

aloof-angle-91616

10/16/2019, 11:04 PM
that's what we do in our CI
in our `Dockerfile`s directly, they're in `build-support/` somewhere (the `scl` command line)
(also, the `(backtrace omitted)` part is a bad error message -- it means pants wasn't able to get the backtrace, not that it decided not to)
(or at least, turning on `PANTS_PRINT_EXCEPTION_STACKTRACE=True` did not change the result)

enough-analyst-54434

10/16/2019, 11:06 PM
For sanity's sake: does everyone here know that `fcntl.lockf(self.lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)` should definitely work in separate containers? Sure, you share the relevant fs, but do you know what's going on in fcntl? I do not; I'm fairly unix / linux dumb.

cuddly-window-48195

10/16/2019, 11:07 PM
Yeah, I figured out that the error is coming out of the lmdb library (the C library, not the rust wrapper).
I don't know how fcntl works 😞

enough-analyst-54434

10/16/2019, 11:08 PM
OK - until someone does, we'll all be blowing smoke

cuddly-window-48195

10/16/2019, 11:08 PM
TIL it exists

enough-analyst-54434

10/16/2019, 11:09 PM
IFF this is reliable info: https://gavv.github.io/articles/file-locks/#differing-features then this won't work. We use 'POSIX record locks', which lock an (inode, pid) pair. If that is correct, the two containers have different pid namespaces, so it's broken.
I do know the two containers definitely do have different pid namespaces
I don't know about the veracity of the rest
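Part of what the linked article claims can be checked from a single process. A minimal sketch, assuming Linux and a local filesystem (`relock_conflicts` is a hypothetical helper): two separate `open()` calls on one file give two descriptors for the same inode. A second `lockf` from the same process does not conflict, because POSIX record locks are owned by the process, while a second `flock` does conflict, because BSD `flock` locks belong to the open file description. On its own this says nothing about the cross-container case.

```python
import fcntl
import os
import tempfile


def relock_conflicts(lock_fn, path):
    """Hypothetical helper: lock `path` through one descriptor, then try again
    through a second, independently opened descriptor in the same process.
    Returns True if the second attempt was refused."""
    first = open(path, "a")
    second = open(path, "a")  # new open file description, same inode
    try:
        lock_fn(first, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            lock_fn(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return False
        except OSError:
            return True
    finally:
        # Caveat for POSIX locks: closing *any* descriptor for the inode
        # releases all of this process's locks on it.
        second.close()
        first.close()


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    # Separate files so the two lock types cannot interfere with each other.
    print("lockf conflicts:", relock_conflicts(fcntl.lockf, os.path.join(d, "a.lock")))  # False
    print("flock conflicts:", relock_conflicts(fcntl.flock, os.path.join(d, "b.lock")))  # True
```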

cuddly-window-48195

10/16/2019, 11:11 PM
I'm way out of my depth, but does it matter that the file descriptor is mounted into both containers?

enough-analyst-54434

10/16/2019, 11:11 PM
So, assuming this is all true though, next step @cuddly-window-48195 is to get the two containers using the same pid namespace.
The fd maps to the inode; that's only 1/2 of the key.

cuddly-window-48195

10/16/2019, 11:12 PM
Any tips on how to do that?

enough-analyst-54434

10/16/2019, 11:12 PM
So there are two problems to contend with
So that solves pid. The inode bit I'm not sure about. Your current compose shared volume is the best I know how to do off the top of my head. So start by configuring the one container to be in the other's pid namespace, then report back.
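To confirm that a pid-namespace change actually took effect, one option is to read the `/proc/self/ns/pid` link in each container. This is a sketch that assumes Linux, a Python interpreter in both containers, and that both containers run on the same host. `pid_namespace_id` is a hypothetical name. The two containers share a pid namespace exactly when the printed identifiers match.

```python
import os


def pid_namespace_id():
    """Return this process's pid-namespace identifier, e.g. 'pid:[4026531836]'.

    Linux-only: /proc/self/ns/pid is a symlink whose target names the
    namespace by its inode number.
    """
    return os.readlink("/proc/self/ns/pid")


if __name__ == "__main__":
    print(pid_namespace_id())
```

Run it in each container (for example via `docker exec`) and compare the two lines.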

cuddly-window-48195

10/16/2019, 11:14 PM
Ok, I set the pid namespace for both and it didn't change the behavior

enough-analyst-54434

10/16/2019, 11:15 PM
So now `stat` the same lockfile from both containers and compare inodes.

cuddly-window-48195

10/16/2019, 11:16 PM
do we know which lockfile?

enough-analyst-54434

10/16/2019, 11:16 PM
Doesn't matter. Any file shared between the two containers will do for this experiment.
Now that you know how we do our locking though, you should have everything you need to know to debug this. I need to run!

cuddly-window-48195

10/16/2019, 11:17 PM
[root@89d05d1595d4 workdir]# stat ~/.cache/pants/lmdb_store/
  File: '/root/.cache/pants/lmdb_store/'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 801h/2049d      Inode: 1097323     Links: 5
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-10-16 21:46:55.290751000 +0000
Modify: 2019-10-16 21:46:55.300751000 +0000
Change: 2019-10-16 21:46:55.300751000 +0000
 Birth: -

enough-analyst-54434

10/16/2019, 11:18 PM
Any file will do; don't introduce a dir as a variable, no matter how sensible it seems.
We definitely lock a file, not a dir
And you must do this from both containers and compare results. If the inodes are equal, then the linked article (or my reading of it) is likely wrong; if they differ, we have a likely explanation for the lock failure.
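The comparison can also be scripted. This is a minimal sketch: `file_identity` and `describe` are hypothetical helpers, not part of pants. `file_identity` returns the `(device, inode)` pair that `stat` prints and refuses directories, per the advice above. `describe` formats the device the way GNU `stat` does (for example `801h/2049d`). Run it on the same shared file in each container and compare the output lines.

```python
import os
import stat
import sys


def file_identity(path):
    """Return (st_dev, st_ino) for a regular file; reject directories."""
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        raise ValueError(f"{path} is not a regular file")
    return st.st_dev, st.st_ino


def describe(path):
    dev, ino = file_identity(path)
    # Same layout as GNU stat's "Device: 801h/2049d ... Inode: 1097326" fields.
    return f"{path}: Device: {dev:x}h/{dev}d Inode: {ino}"


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(describe(p))
```

For example, `python3 file_identity.py ~/.cache/pants/lmdb_store/files/0/lock.mdb` in each container (the script filename is hypothetical).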

cuddly-window-48195

10/16/2019, 11:19 PM
container 0:
[root@89d05d1595d4 workdir]# stat ~/.cache/pants/lmdb_store/files/0/lock.mdb
  File: '/root/.cache/pants/lmdb_store/files/0/lock.mdb'
  Size: 8192            Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d      Inode: 1097326     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-10-16 23:19:12.115466000 +0000
Modify: 2019-10-16 23:19:12.115466000 +0000
Change: 2019-10-16 23:19:12.115466000 +0000
 Birth: -
container 1:
  File: '/root/.cache/pants/lmdb_store/files/0/lock.mdb'
  Size: 8192            Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d      Inode: 1097326     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-10-16 23:19:12.115466000 +0000
Modify: 2019-10-16 23:19:12.115466000 +0000
Change: 2019-10-16 23:19:12.115466000 +0000
 Birth: -
Same inode

enough-analyst-54434

10/16/2019, 11:22 PM
OK - I leave it to you to dig further and file an issue when you've found the answer or cry uncle and just want to dump a summary of the current state of knowledge.
👍 2

cuddly-window-48195

10/16/2019, 11:24 PM
Ok