w

witty-crayon-22786

02/09/2023, 9:44 PM
thread for the docker mount point race: https://github.com/pantsbuild/pants/issues/18162
the summary of the problem is essentially that processes being exec’d inside an existing container may not be able to observe all of the inputs in their sandboxes (the parent directory of which is mounted once per-container)
a few potential mitigations:
• don’t (re)use cached containers
  ◦ on my older macOS machine, it takes 750ms to start a relatively slim image, so this is probably a non-starter.
• dynamically add / remove bind mounts on the running container
  ◦ probably feasible, but only helps address the problem if the filesystem consistency issue would not exhibit for bind mounts created after all of the input files exist… which might not be the case if the issue is actually in some code that is running outside of the container not being synchronized
• tar-piping process inputs into the container, rather than using bind mounts
  ◦ would prevent use of immutable inputs, which would be annoying, and probably not be quite as efficient …?
f

fast-nail-55400

02/09/2023, 10:06 PM
there is also copying files in to and out of a docker volume
and mount the docker volume into the cached container
(versus using a bind mount)
I saw that as a potential solution; unclear whether it would work for this use case though without some further thought.
longer term, mount an NFSv4 volume into the container and then have pantsd export a NFSv4 server hosting input roots.
w

witty-crayon-22786

02/09/2023, 10:15 PM
there is also copying files in to and out of a docker volume
this is essentially the tar-pipe idea… more complicated “copy files into a container” cases seem to suggest tar-pipes. there doesn’t appear to be a way to create a volume “from scratch” containing content.
longer term, mount an NFSv4 volume into the container and then have pantsd export a NFSv4 server hosting input roots.
yea… would be generally useful too.
f

fast-nail-55400

02/09/2023, 10:16 PM
well you could spin up a busybox container to send files in/out of the volume, busybox should spin up pretty fast
and there could be an interesting volume driver in this list: https://docs.docker.com/engine/extend/legacy_plugins/#volume-plugins
w

witty-crayon-22786

02/09/2023, 10:18 PM
i think volumes are only useful if we can mount them dynamically on a running container, which sounds like maybe a thing?
f

fast-nail-55400

02/09/2023, 10:19 PM
I don't recall
as an aside, buildbarn comes with a NFSv4 server backed by CAS now
👍 1
w

witty-crayon-22786

02/09/2023, 10:20 PM
cc @curved-television-6568, @enough-analyst-54434: anything else jump out at you?
e

enough-analyst-54434

02/09/2023, 10:27 PM
Well, IIUC the problem is not even understood yet. The ticket mentions races, but that requires - I don't know, async filesystems? Windows WSL uses plan9fs which is a network filesystem and very slow to transfer files / edits and does appear async. It would really help to have the theory of what the problem actually is nailed down in the ticket. At that point I could take a look. I don't even have a clue how we use the exec. Is it write to a volume mount, then serially exec into a container that can see that volume mount? I'd guess so, but I'm just guessing, etc.
If you knew more about the filesystems and what they actually guaranteed you could write the files, then write a token file, then in the exec, use a shim binary that blocked on appearance of the token. If the FS was async but guaranteed order, that would work.
There are just a ton of things it seems like you could do, but knowing the real problem would really help pick.
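A minimal sketch of the token idea described above, assuming the filesystem is async but preserves write ordering (which is exactly the unverified part). The function names and the `.inputs_ready` token name are hypothetical, not anything in Pants: the host writes all inputs and then the token, and a shim inside the container polls for the token before running the real process.

```python
import os
import time
from typing import Mapping


def write_inputs_then_token(
    sandbox: str, inputs: Mapping[str, bytes], token_name: str = ".inputs_ready"
) -> str:
    """Host side: write every input first, then write the token file last."""
    for relpath, content in inputs.items():
        path = os.path.join(sandbox, relpath)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(content)
    token = os.path.join(sandbox, token_name)
    with open(token, "wb") as f:
        f.write(b"ready")
    return token


def wait_for_token(token: str, timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Container side (the shim): poll until the token is visible or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(token):
            return True
        time.sleep(interval)
    # One last check so a token that appeared during the final sleep still counts.
    return os.path.exists(token)
```

If the mount does not guarantee ordering, seeing the token says nothing about the other inputs, so this is only a cheap experiment rather than a fix.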
w

witty-crayon-22786

02/09/2023, 10:34 PM
Is it write to a volume mount, then serially exec into a container that can see that volume mount? I’d guess so, but I’m just guessing, etc.
write to a directory that is bind-mounted into the container. on Linux the mounts are synchronous since there is no virtualization involved (uses overlay2, etc), but on macOS it has used a variety of strategies in the last few years (osxfs, grpc-fuse, virtiofs)
e

enough-analyst-54434

02/09/2023, 10:35 PM
Yeah, the latter part you mention in the ticket, but with no strong conclusion that those ~3 are async.
w

witty-crayon-22786

02/09/2023, 10:35 PM
it is entirely possible that virtiofs plus macOS’s new virtualization framework (which is still an experimental Docker feature) will resolve this
@enough-analyst-54434: well, it stems from osxfs having had lots of settings around consistency levels, which disappeared in grpc-fuse because “they’re not necessary”, but…
e

enough-analyst-54434

02/09/2023, 10:36 PM
I guess we have infini-slack now, but this seems like a real bad format for gathering a shared holistic view of the question to even begin to answer.
Plowing on though, your bullets above mention 750ms mac, I did spend a good bit of effort not using Docker and getting 100ms starts in ~2018/2019 or so. I used crun + jq
Is ditching Docker an option?
w

witty-crayon-22786

02/09/2023, 10:38 PM
If you knew more about the filesystems and what they actually guaranteed you could write the files, then write a token file, then in the exec, use a shim binary that blocked on appearance of the token. If the FS was async but guaranteed order, that would work.
mm, yea: this is a good thought too. a particularly annoying bit about both osxfs and grpc-fuse is that they are closed source. and grpc-fuse is deprecating osxfs, but has drastically less documentation than osxfs did.
e

enough-analyst-54434

02/09/2023, 10:38 PM
The whole Mac / Windows thing is just super dumb - I agree on that. But I guess we have to support these super weird hoops for non macOS / Windows devs who use them anyhow.
w

witty-crayon-22786

02/09/2023, 10:39 PM
Is ditching Docker an option?
maybe, yea. but afaict, all of the macOS-containers systems have this same issue, and have been bouncing between implementations. they seem to be trending toward virtiofs. the first half of this article is good: https://www.cncf.io/blog/2023/02/02/docker-on-macos-is-slow-and-how-to-fix-it/
Plowing on though, your bullets above mention 750ms mac, I did spend a good bit of effort not using Docker and getting 100ms starts in ~2018/2019 or so. I used crun + jq
i fully expect that this is macOS doing virtualization rather than docker itself… latency is much lower on Linux, afaik
e

enough-analyst-54434

02/09/2023, 10:40 PM
Well that's future so we can't just wait. AFAICT your send a tar by network idea - presumably you need a shim binary to coordinate? - sounds like the most sane option when not fully understanding the problem.
Because clearly that can work
w

witty-crayon-22786

02/09/2023, 10:41 PM
re: your token and filesystem ordering idea… it’s pretty cheap to try. i just don’t have a bulletproof repro, so it might mean more 2.15.x rcs.
e

enough-analyst-54434

02/09/2023, 10:42 PM
I personally would not try until I understood the problem.
w

witty-crayon-22786

02/09/2023, 10:42 PM
Well that’s future so we can’t just wait. AFAICT your send a tar by network idea - presumably you need a shim binary to coordinate? - sounds like the most sane option when not fully understanding the problem.
no, no need for a shim binary: you can pipe directly into tar. the docker API allows for streaming stdin to a process
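A rough sketch of the tar-pipe idea, for illustration only. The message above refers to the Docker API streaming stdin to a process; this sketch swaps in the `docker exec -i` CLI instead, and assumes the image has a `tar` binary. The helper names are hypothetical.

```python
import io
import subprocess
import tarfile
from typing import List, Mapping


def build_tar(inputs: Mapping[str, bytes]) -> bytes:
    """Pack the input files into an in-memory, uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for relpath, content in inputs.items():
            info = tarfile.TarInfo(name=relpath)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def untar_command(container: str, dest: str) -> List[str]:
    """Command that extracts a tar stream from stdin into `dest` in the container."""
    return ["docker", "exec", "-i", container, "tar", "-x", "-f", "-", "-C", dest]


def send_inputs(container: str, dest: str, inputs: Mapping[str, bytes]) -> None:
    """Stream the archive to tar running inside an already-running container."""
    subprocess.run(untar_command(container, dest), input=build_tar(inputs), check=True)
```

This covers inputs only. Without the bind mount, outputs would need a second pipe in the other direction.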
e

enough-analyst-54434

02/09/2023, 10:42 PM
It banks on a guess and passing the guess would itself be a guess
w

witty-crayon-22786

02/09/2023, 10:43 PM
I personally would not try until I understood the problem.
see above re: grpc-fuse being closed source… i’m not sure i will actually get an answer there
e

enough-analyst-54434

02/09/2023, 10:43 PM
Exactly
i fully expect that this is macOS doing virtualization rather than docker itself… latency is much lower on Linux, afaik
I was bringing - wild variability - 500ms - 1s down to 100ms constant FWIW.
So it helped on Linux too.
👍 1
w

witty-crayon-22786

02/09/2023, 10:45 PM
virtiofs is open source (or at least portions of it are), so if it eventually becomes the default (which doesn’t sound guaranteed based on the trend of this thread: https://github.com/docker/roadmap/issues/7), then studying its implementation is an option
e

enough-analyst-54434

02/09/2023, 10:47 PM
It sounds to me like 2.15.x maybe needs to be made independent of this stuff? Releases have been dragging out lately - maybe holiday bias - but whatever path here it sounds like ~major surgery for an rc5
w

witty-crayon-22786

02/09/2023, 10:47 PM
it’s major surgery, but in the brand new headline feature of the release. i’m less worried about the magnitude of the change, and more worried about delaying things
…or not delaying them enough, i suppose
e

enough-analyst-54434

02/09/2023, 10:48 PM
"brand new headline feature of the release"
Ok, that sounds like marketing speak to me.
w

witty-crayon-22786

02/09/2023, 10:48 PM
yea, agreed.
e

enough-analyst-54434

02/09/2023, 10:48 PM
We don't really do feature releases IIUC. Although we also do - we now blog about things, it's a bit markety
w

witty-crayon-22786

02/09/2023, 10:49 PM
yea, it’s tricky.
the feature is marked experimental, which is supposed to be another way to disconnect releases from feature stability
but … i still want to try to meet some quality bar
e

enough-analyst-54434

02/09/2023, 10:51 PM
I guess as long as we're honest in the marketing, i.e. don't for this release or do but note it's broken for Mac
h

happy-kitchen-89482

02/09/2023, 11:17 PM
We can change the headline, FWIW
w

witty-crayon-22786

02/10/2023, 12:40 AM
tar pipe approach is still feasible, but more challenging than i initially thought, since without the bind mount it would need to be bidirectional: a pipe in for inputs, and a pipe out for outputs.
c

curved-television-6568

02/10/2023, 1:29 AM
cc @curved-television-6568, @enough-analyst-54434: anything else jump out at you?
nothing obvious, no.
w

witty-crayon-22786

02/13/2023, 10:48 PM
https://github.com/pantsbuild/pants/pull/18225 … using a tar pipe is about 30% slower than using a mount.
that makes me wonder whether a better approach would be to conditionally disable container caching instead… because at least container start time is a relatively fixed overhead.
…i think that i’m going to bang out a flag to disable the container cache as well, and then we can land the docker-strategy flag as a trinary option.
nevermind… not worth the effort yet. but i’ll leave space in the option name for additional strategies in the future.
argh! symlink support in Digests doesn’t exist in 2.15.x, so a significant portion of the infrastructure for https://github.com/pantsbuild/pants/pull/18225 isn’t available, and it can’t be picked cleanly. i’m going to revert it from main, and land something to disable the container cache instead. what a mess.
i’ve gone down the path of introspecting the files in the container to try and wait until they have been created. the “wait for a single file written after all other inputs were written” approach (the token approach that John suggested) was not successful, so i proceeded to exec’ing stat for all of the inputs.
(Tom’s repro was invaluable: thanks Tom)
the wild thing right now is that stat shows that all input files for a task exist, including files which the process claims don’t exist: https://gist.github.com/stuhood/6c78ab4a9e511b14df2c3247962604ee
i.e. __pkgs__/internal_reflectlite/__pkg__.a is claimed missing, but stat from within the container shows it as existing.
one unknown from looking at this though is that not all of these .a files have the same permissions… which is fishy, but i don’t know how it could result in a “no such file or directory” error
e

enough-analyst-54434

02/15/2023, 9:33 PM
Add to the fishy is the access / modify / change timestamp set. Very different from all others.
Seems like a good thing to drill on.
Also a super big file.
w

witty-crayon-22786

02/15/2023, 9:36 PM
yea… that seems like it could have something to do with the Link count. large files are now using hardlinks on main (but not on 2.15.x … sigh)
that probably explains the permissions difference too.
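For anyone wanting to check the Link count theory on their own sandbox, a small sketch that lists regular files with more than one hardlink. The function name is made up for illustration, and this only reports what `lstat` says on whichever side of the mount it runs.

```python
import os
import stat
from typing import List


def find_hardlinked_files(root: str) -> List[str]:
    """Return relative paths of regular files under `root` whose link count exceeds 1."""
    linked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                linked.append(os.path.relpath(path, root))
    return sorted(linked)
```

Comparing this list against the files a process reports as missing would show whether the failures line up with the hardlinked inputs.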
e

enough-analyst-54434

02/15/2023, 9:38 PM
Is this just a dangling symlink in a tar then? Not fully following. The size stat says no.
w

witty-crayon-22786

02/15/2023, 9:39 PM
no tars here anymore: this is a bind mount, with a real file behind it
e

enough-analyst-54434

02/15/2023, 9:39 PM
Also claims regular file.
Ah, gotcha.
I mean ... stat is just inode IIUC, don't know impl. If you had a perverse file system that flushed inode metadata before data blocks all hit disk and further reported not all data blocks there as missing file ... crazy - not it.
w

witty-crayon-22786

02/15/2023, 9:41 PM
…true. sheesh. i suppose i could try opening everything.
e

enough-analyst-54434

02/15/2023, 9:41 PM
So ... can you remind me whether or not injecting a shim binary in the image is acceptable / whether local remexec can be used instead - with internal remexec binaries / mini cluster?
Obviously big change of mechanism.
w

witty-crayon-22786

02/15/2023, 9:44 PM
it would be acceptable… but i’m not sure it is in scope
e

enough-analyst-54434

02/15/2023, 9:44 PM
Well, everything is broken so scope may have to creep to ship unbroken software for Mac - but yeah - definitely bigger scope.
f

fast-nail-55400

02/15/2023, 9:48 PM
You could also consider a shim binary not for REAPI but just for un-tar'ing a input root sent over
an in-container Pants supervisor process
w

witty-crayon-22786

02/15/2023, 9:49 PM
that’s what the tar-pipe change did.
supervisor was tar, heh
f

fast-nail-55400

02/15/2023, 9:49 PM
right, but you can run the in-container process all the time to avoid having to docker exec it
more of an optimization on the technique really
separately, https://virtio-fs.gitlab.io/ is experimental in docker for macOS 4.6 as a replacement for gRPC-FUSE
w

witty-crayon-22786

02/15/2023, 9:51 PM
yea. unfortunately, i’d need to upgrade my macOS to use it. @fast-nail-55400: are you able to try it and see whether it repros?
e

enough-analyst-54434

02/15/2023, 9:51 PM
Totally left field is a custom volume driver. But I don't know the API or if it could be warped for this sort of thing.
Say that did work, @witty-crayon-22786: is asking others to do so on the table?
w

witty-crayon-22786

02/15/2023, 9:52 PM
absolutely: for an experimental feature, absolutely.
e

enough-analyst-54434

02/15/2023, 9:53 PM
Well, think ahead
How long must that block?
experimental forever?
w

witty-crayon-22786

02/15/2023, 9:53 PM
the tar-pipe solution will be viable in 2.16.x (although still slower presumably)
e

enough-analyst-54434

02/15/2023, 9:54 PM
I thought that was not a solution though? output files?
f

fast-nail-55400

02/15/2023, 9:54 PM
I'll give virtiofs a try
w

witty-crayon-22786

02/15/2023, 9:54 PM
thank you
@enough-analyst-54434: the reason tar-pipe was reverted is that it can’t be cherry-picked to 2.15.x… not (necessarily) because it didn’t work. i didn’t have Tom’s repro at the point when i merged it
e

enough-analyst-54434

02/15/2023, 9:55 PM
Ok. We did have a known output files bug waiting though also IIUC
w

witty-crayon-22786

02/15/2023, 9:55 PM
unconfirmed though… gedanken, if you will.
because Tom’s repro case is very good… you get a repro basically every time. but never for outputs
e

enough-analyst-54434

02/15/2023, 9:56 PM
Ok. I am super uncomfortable with gedanken + promo-flash-sale but I can back off that. I'm a bit alone there I think.
f

fast-nail-55400

02/15/2023, 10:12 PM
virtiofs seems to help. I didn't see a failure from missing files. (Although now I see a Go-specific error from the specific test which happens on subsequent invocations so probably not the race condition.)
ProcessExecutionFailure: Process 'Link Go binary: ./package_analyzer' failed with exit code 1.
stdout:
loadinternal: cannot find runtime/cgo

stderr:
2023/02/15 22:10:13 reference to undefined builtin "runtime.duffzero" from package "runtime"
w

witty-crayon-22786

02/15/2023, 10:15 PM
hmmmm
can you … try a few times? i.e. with --no-local-cache?
f

fast-nail-55400

02/15/2023, 10:16 PM
sure
ok just happens less frequently
got:
/usr/local/go/src/syscall/env_unix.go:12:2: could not import runtime (open __pkgs__/runtime/__pkg__.a: no such file or directory)
w

witty-crayon-22786

02/15/2023, 10:19 PM
thank you. and this is most recent macOS with virtiofs?
f

fast-nail-55400

02/15/2023, 10:20 PM
macOS 13.2.1 (22D68)
Docker 4.16.2 (95914)
x86
ran 3 times, 1 failed due to missing files, the other two failed due to a Go backend bug with environments likely
(./pants_from_sources --no-local-cache package race:racy_docker)
👍 1
scie-pants 0.5.1
w

witty-crayon-22786

02/15/2023, 10:22 PM
an unfortunate update from the filesystem probing front is that John’s hunch about stat vs open was partially correct: wc -c will report “No such file or directory” in some cases, and then eventually stabilize on agreeing that all files exist. but then go will still fail to find some inputs.
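A sketch of the stat-vs-open probing described above, as a diagnostic only. The function name and the shape of the result are hypothetical. It separates inputs that fail `stat` from inputs that pass `stat` but cannot be opened and read.

```python
import os
from typing import Dict, Iterable, List


def probe_inputs(sandbox: str, relpaths: Iterable[str]) -> Dict[str, List[str]]:
    """Classify inputs by whether stat fails, or stat succeeds but open/read fails."""
    result: Dict[str, List[str]] = {"stat_failed": [], "open_failed": []}
    for relpath in relpaths:
        path = os.path.join(sandbox, relpath)
        try:
            os.stat(path)
        except OSError:
            result["stat_failed"].append(relpath)
            continue
        try:
            # Reading a byte forces the data path, not just the inode metadata.
            with open(path, "rb") as f:
                f.read(1)
        except OSError:
            result["open_failed"].append(relpath)
    return result
```

As noted above, even once a probe like this agrees that everything exists, the real process can still fail, so it helps narrow things down but does not work as a synchronization barrier.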
e

enough-analyst-54434

02/15/2023, 10:22 PM
My god.
yea. i’m about out of rope here.
…actually. two more things to guess and check at: 1. disabling hardlinking, 2. trying mounting sandboxes in a non-tmp filesystem.
@fast-nail-55400: thanks again for trying that.
👍 1
f

fast-nail-55400

02/15/2023, 10:29 PM
an alternative to Docker Desktop on MacOS is Lima: https://github.com/lima-vm/lima
(and it includes containerd support)
another alternative is https://github.com/abiosoft/colima
w

witty-crayon-22786

02/15/2023, 10:30 PM
afaik, they use virtiofs as well now…? but who knows where the consistency disconnect is
sonofagun. no repro with hardlinking disabled. but WTH… we don’t hardlink on 2.15.x, so i don’t understand how folks were seeing this there.
!? … @enough-analyst-54434: can you imagine there having been hardlinks involved in the original report: https://github.com/pantsbuild/pants/issues/18162 …? perhaps something to do with how PEX invokes itself?
yea, confirmed… no repro across 3 or 4 runs of the go case with hardlinks disabled (by raising this limit), and no extra filesystem synchronization. so main question now is: how did someone observe this in 2.15.x, given that we hadn’t started hardlinking things at that point
even if we can’t really imagine a case for PEX with hardlinks, i had been considering moving the docker named_caches into a volume anyway, so might be willing to guess and check there
e

enough-analyst-54434

02/15/2023, 11:06 PM
Pex will hardlink files under certain conditions from the Pex cache, but not the venv PEX pex script. The original issue has not enough info to determine what ./pex is afaict. Is that the Pex PEX? If so, that's a PEX file materialized by the engine.
w

witty-crayon-22786

02/15/2023, 11:07 PM
yes: it’s the pex-pex: there is an expander in the description with more info
e

enough-analyst-54434

02/15/2023, 11:08 PM
Ok, well what concoction do you have in mind? PEX hasn't even executed yet, right? ./pex is not found
w

witty-crayon-22786

02/15/2023, 11:10 PM
i’m looking in src/python/pants/backend/python/util_rules/pex_cli.py now, but… we must be invoking it as python ./pex …?
e

enough-analyst-54434

02/15/2023, 11:10 PM
Yup
So, I think that's a fish and a miss
w

witty-crayon-22786

02/15/2023, 11:12 PM
i’m not sure yet… the error looks pretty different on my machine, so i wonder whether there are wrappers in there
$ python3 ./does-not-exist
/Users/stuhood/.pyenv/versions/3.6.10/bin/python3: can't open file './does-not-exist': [Errno 2] No such file or directory
i mean, PEX is itself a venv PEX, right? and it will re-exec?
e

enough-analyst-54434

02/15/2023, 11:12 PM
I honestly don't remember.
But re-exec happens for zipapp too. So not the right question really.
I'm headed out for a few hours.
w

witty-crayon-22786

02/15/2023, 11:14 PM
see you later
thanks for talking this through
…you know what, the original file would have to exist. otherwise, how would it get into a “<frozen zipimport>” codepath?
this is not definitive, but it feels like enough for me to justify making the named cache a volume rather than a bind mount, since i had already been looking at that.
…ah!!! the original report was in 2.16.0.dev5! damnit
let me look at the other one.
it was on main as well! so that’s it then. holy crap.
e

enough-analyst-54434

02/16/2023, 1:36 AM
One thing I'm lost on still is what it is. Turning hard links off fixes, but why does having them break? Have you sussed that @witty-crayon-22786?
w

witty-crayon-22786

02/16/2023, 1:39 AM
My guess is that the virtual filesystems in play here don't handle them well. But no real idea why. Will probably open an issue with docker if I can isolate it further.
e

enough-analyst-54434

02/16/2023, 1:42 AM
Ok. One thing to note about PEX hardlinking is it's all small stuff save for the odd mega-.so