Can I run multiple processes in the same sandbox? ...
# plugins
g
Can I run multiple processes in the same sandbox? For workflows like running an OCI image with runc it needs to be unpacked on top of the local fs, which looks very odd when inspected from the outside. If it's unpacked to
/tmp/sandbox/image/fs
there'll be symlinks that are relative to that fs root, like this:
/tmp/sandbox/image/fs/lib/foo.2.3 -> /lib/foo.2
. Trying to output this from a process makes Pants understandably barf.
Error expanding output globs: Failed to read link "/tmp/pants-sandbox-Qipcu3/unpacked_image/rootfs/lib64/ld-linux-x86-64.so.2": Absolute symlink: "/lib/x86_64-linux-gnu/ld-2.31.so"
Optimally unpack + run would be a single step; but I can't see how I can use
Process
to achieve that without writing scripts.
h
It’s one sandbox per Process, so you’d have to write a wrapper script to invoke unpack and then run, I guess.
You can generate this script from a template and write it into a digest all in memory
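A minimal sketch of that template idea, in plain Python so it runs without Pants installed. The template text, the `render_wrapper` helper, and the file names are hypothetical illustrations, not code from the Pants repo, and the `runc` line assumes the bundle directory already holds a valid `config.json`:

```python
import shlex
from string import Template

# Hypothetical wrapper: unpack the image, then run it, all inside one sandbox.
WRAPPER_TEMPLATE = Template(
    """\
#!/bin/sh
set -eu
mkdir -p ${rootfs}
tar -xf ${archive} -C ${rootfs}
exec runc run --bundle ${bundle} ${container_id}
"""
)


def render_wrapper(archive: str, bundle: str, container_id: str) -> bytes:
    """Render the wrapper script in memory, shell-quoting every substituted value."""
    script = WRAPPER_TEMPLATE.substitute(
        archive=shlex.quote(archive),
        bundle=shlex.quote(bundle),
        rootfs=shlex.quote(f"{bundle}/rootfs"),
        container_id=shlex.quote(container_id),
    )
    return script.encode("utf-8")


if __name__ == "__main__":
    print(render_wrapper("image.tar", "unpacked_image", "demo").decode())
```

In a plugin rule, the returned bytes would presumably go into a `FileContent` passed to `CreateDigest`, and the resulting digest would become part of the `Process` input, so the script never has to exist on disk in the repo.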
f
There are a number of examples of this in the repo including
setup_go_sdk_process
and the Coursier jar resolution rules
g
Yeah, I'll look at that. Might make a feature request to be able to fuse processes into one sandbox. Would be quite ergonomic.
f
side note: Pants does not yet support capture of symlinks from an execution sandbox. https://github.com/pantsbuild/pants/pull/16844 is the WIP PR to support sandbox-relative symlinks. Absolute symlinks would need different work.
g
In this case I don't want any symlink resolution. I want the whole tree to be kept as is. :P
(but a script is better here since it's just an intermediate)
f
the whole tree as an archive or captured as a
Digest
into the Pants cache?
the latter will fail due to lack of symlink support for any symlinks anywhere in an
output_directories
or
output_files
on
Process
g
Wouldn't the same issue occur for the return digest of unpacking an archive?
f
when unpacking Pants ends up writing them as normal files
so technically no symlinks
this has been a problem for other users hence the PR linked above
when capturing the symlinks will be seen as files and not symlinks and captured that way
then when you use that
Digest
later it'll just be as files
g
Yeah, but what I'm saying is that it's impossible for that to happen, since the symlink only has a valid destination inside the rootfs of the container. So it has to be kept as an invalid symlink throughout the process.
So any PR made to make symlink handling work in some specific way is not going to work for this because the symlink cannot ever point to something valid when Pants looks at it. And if it did intersect with a host-system file, then that'd poison the container image if pants used the host file instead.
f
There seems to be some miscommunication here. The validity or invalidity of the symlink isn't really an issue here. Pants essentially captures the directory listing so it can reproduce the filesystem tree at a later point. Pants will just refuse to capture symlinks currently. Moreover, the PR linked above is only to support sandbox-relative symlinks (whether valid or invalid). Symlinks that point to an absolute path will just cause Pants to error if the
Process
tries to capture them as an output.
So you will probably need a wrapper script to unpack an archive file into the sandbox and then put the filesystem tree back into an archive when done. Then capture the archive with the
Process
.
Pants will just refuse to capture symlinks currently.
Or for valid symlinks only, read through to the destination file and capture that content instead.
And I do understand your point why the symlinks need to remain invalid and absolute. Here at Toolchain, our prototype of a remote execution product has the same issue because we deal with OCI images for running build actions in.
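A rough sketch of the repack step, assuming the wrapper is free to use Python's standard `tarfile` module. The helper names and paths are made up for illustration. Because `tarfile` does not dereference symlinks by default, an absolute link that dangles on the host is stored as a link entry with its target string intact, and the process output is then a single regular file:

```python
import os
import tarfile
import tempfile


def pack_tree(root: str, archive_path: str) -> None:
    """Archive `root` without following symlinks (dereference defaults to False)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(root, arcname="rootfs")


def list_symlinks(archive_path: str) -> dict[str, str]:
    """Map each symlink member in the archive to its stored target."""
    with tarfile.open(archive_path, "r:*") as tar:
        return {m.name: m.linkname for m in tar.getmembers() if m.issym()}


def demo() -> dict[str, str]:
    """Build a tiny rootfs with an absolute symlink, pack it, and inspect the result."""
    with tempfile.TemporaryDirectory() as tmp:
        rootfs = os.path.join(tmp, "rootfs")
        os.makedirs(os.path.join(rootfs, "lib"))
        # Only meaningful inside the container; the target need not exist on the host.
        os.symlink("/lib/foo.2", os.path.join(rootfs, "lib", "foo.2.3"))
        archive = os.path.join(tmp, "rootfs.tar.gz")
        pack_tree(rootfs, archive)
        return list_symlinks(archive)


if __name__ == "__main__":
    print(demo())
```

Only the archive would be listed in `output_files`, so no symlink ever reaches the output capture step.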
g
Yeah; I had a look. I think if the goal of that PR is to capture symlinks verbatim, then that'll work for absolute symlinks too in my case. My interpretation of
when unpacking Pants ends up writing them as normal files
was that a captured symlink would be replaced with its content. But if you mean it'll be written back to disk as a regular file but flagged as a symlink, then that sounds like what I want. 🙂 Just need to make it work for absolute links. OTOH (and this'll be problematic for image builds too) - it isn't unfeasible that we'll have some massive trees generated from this. The largest container build I run at work today comes out to a total of 9GB, squashed, cleansed, purged and compressed.
f
separately Pants may not work well with input/output files at the GB scale. See https://github.com/pantsbuild/pants/issues/16697 where a user had an issue with a pex build at ~2GB size.
g
Yeah; for sure. It does take quite some painful design decisions to get a container to that size, and I don't expect any framework to handle it gracefully - It's a very custom process even with buildah.
Looking at my "average" builds outside of the chunky ML runners, they're all < 100 MB total/flat size, so that should be a lot more manageable and a sane target to deal with.
h
A feature request for a “MultiProcess” of some kind would be appropriate, I think
With the understanding that the MultiProcess would be cacheable, but not its constituent Processes
g
Yepp! I'll set up an issue after work today 🙂