# development
a
I’m looking at
RunRequest
and related infrastructure at the moment — is there a particular blocker around refactoring it to use a
Process
object rather than specifying raw args? Currently constructing a valid JVM process is complicated, so I’d love to make use of the
JvmProcess
infrastructure I used previously. (pinging @witty-crayon-22786 in particular since he’s seen the
JvmProcess
work thus far)
w
not … that i’m aware of…?
but note that rather than a
Process
, it will likely need to be an
InteractiveProcess
which has slightly different constraints.
a
You can construct an
InteractiveProcess
from a
Process
, most of the time
that’s certainly what we’re doing over in
junit.py
w
sure. just suggesting that it should probably be the
@rule
authors constructing the
InteractiveProcess
, to push considering those constraints to them
a
OK, that makes sense
@witty-crayon-22786 How much do we care about
run
executing things in a chroot workspace? Currently our JVM bootstrapper needs to
ln
a
JAVA_HOME
directory, which is done inline with running the relevant executable, but the chroot approach would probably involve some sort of two-step that we don’t currently have a good model for
w
mm.
a
I can think about the two-step (currently I’m working on using
RunRequest
, but
Process
doesn’t have a good model for this either)
w
um, the
ln
was mostly supposed to support providing a stable location for the JDK to be referenced from within a command. if there is another way to accomplish that, then a chroot might not be necessary.
but… i’m not sure that you can actually avoid a chroot…? we have to materialize all of the other inputs to the process too. so there will be a temporary workdir
a
OK. I’m thinking that we might need to add “preparation args” to
RunRequest
so that we can run the preparation code
so we run that and output the results into a digest of some description
w
i’m not sure that the preparation code is actually any different from the other code being run. the big difference with
run
is mostly that the CWD may not be equal to your temporary directory
i.e.: all the temporary stuff is in $dir1, and my cwd is $dir2
a
Where does the
java_home
come from then?
Because we have to link it from something
w
where the symlink would go is the same: it’s still inside the tempdir. the challenge is probably that we don’t have a relative path to the tempdir, and thus to the location of the symlink
a
that’s assuming we run before chrooting, right?
w
so… basically, it affects all use of relative paths in process startup, i think.
i think that java_home might just be the first instance of this.
a
I think we can get a relative location for the tempdir, it’s just doing the preparation before the chroot gets invoked that is the complication
w
yea. it’s interesting that none of python, go, or docker needed this
(but maybe not surprising… because we end up using self-contained binaries for them)
@ancient-vegetable-10556: could we craft a shell/python script to be the actual executable for
run
?
and then that script could prepare the arguments…?
i suppose that the issue is that if a
Process
has been written using relative paths, absolutizing them is challenging without its support
a
The executable for run would be a shell script, but we’d still need to yoink the JVM into the chroot hierarchy before entering the chroot
unless I’m seriously misunderstanding something
w
yes, the location of JAVA_HOME is affected. but so are all other relative paths, unfortunately. i think that basically everything needs to be absolutized, but how isn’t clear.
a
It’s a relative path to something that lives somewhere completely outside of the hierarchy, in this case, right?
w
so basically, to work as a RunRequest, a Process needs all relative paths absolutized by the template variable…
java_home/bin
becomes
{chroot}/java_home/bin
, etc.
a
Right, but at the moment, we get the java home by asking Coursier where that java home is
w
@ancient-vegetable-10556: that doesn’t need to change. all that needs to change is that the location that we symlink it to needs to be made absolute via the
{chroot}
template variable
(afaict)
you can take a look at
src/python/pants/backend/python/goals/run_pex_binary.py
for comparison… it basically prepends
{chroot}
to everything relevant
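A minimal sketch of the templating idea described here, assuming a plain string placeholder that is substituted once the temporary directory is known. The helper names `template_sandbox_path` and `render_args` are hypothetical and are not taken from `run_pex_binary.py`:

```python
import os
import tempfile

# Placeholder that stands in for the temp dir until it actually exists.
CHROOT = "{chroot}"


def template_sandbox_path(relative_path):
    """Mark a sandbox-relative path so it can be made absolute later."""
    return os.path.join(CHROOT, relative_path)


def render_args(args, chroot_dir):
    """Substitute the real temp dir for the placeholder in every argument."""
    return [arg.replace(CHROOT, chroot_dir) for arg in args]


# Paths into the sandbox are templated; other arguments are left alone.
templated = [
    template_sandbox_path("java_home/bin/java"),
    "-cp",
    template_sandbox_path("classes"),
    "Main",
]

with tempfile.TemporaryDirectory() as tmpdir:
    rendered = render_args(templated, tmpdir)
    print(rendered)
```

Because every sandbox path becomes absolute after rendering, the process no longer depends on its cwd being the temporary directory.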
a
Right, but we have to run that before we enter the chroot, right? Because the jdk isn’t inside the chrooted directory structure. Or are we linking in the relevant external binaries some other way here?
w
@ancient-vegetable-10556: no
coursier java-home
can run anywhere, and will emit an absolute path to a JDK
a
Does that jdk need to exist on the system already?
w
no: the call will fetch it if it needs to, into a cache directory
a
great
w
i started symlinking it from the absolute location to a location inside the sandbox in order to avoid needing templating in most JDK commands… they could just expect a symlink to exist in the sandbox already
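The symlink approach could look roughly like the sketch below. It assumes the absolute JDK path has already been obtained (for example from the output of `coursier java-home`); the function name `link_jdk_into_sandbox` and the `java_home` link name are illustrative only:

```python
import os
import tempfile


def link_jdk_into_sandbox(jdk_home, sandbox_dir, link_name="java_home"):
    """Create a symlink inside the sandbox pointing at an absolute JDK path."""
    link_path = os.path.join(sandbox_dir, link_name)
    os.symlink(jdk_home, link_path, target_is_directory=True)
    return link_path


# Stand-ins for a real JDK location and a real sandbox directory.
fake_jdk = tempfile.mkdtemp(prefix="fake-jdk-")
sandbox = tempfile.mkdtemp(prefix="sandbox-")

link = link_jdk_into_sandbox(fake_jdk, sandbox)
print(link, "->", os.readlink(link))
```

Commands that run with the sandbox as their cwd can then refer to `java_home/bin/java` without any templating.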
a
ok
that all works in theory. It looks like
RunRequest
will make a thing that can run, but it downloads a JDK every time, so when I come back to this, I’ll need to figure out why that’s the case.
w
that is likely because the named_caches aren’t symlinked properly, OR the env vars aren’t set to use them
a
oh yeah, that’s definitely the case, I did the bare minimum to get it to run without rewriting
RunRequest
before I go back and make it efficient
@witty-crayon-22786 So I’m getting this error as I start to wire up the jdk caches:
ValueError: InteractiveProcess requested setup of append-only caches and also requested to run in the workspace. These options are incompatible since setting up append-only caches would modify the workspace.
Would it make sense to pre-populate a JDK into the right place and dump that into the workspace? Or should we be trying to make the cache mechanism work more generally?
w
…hm. other folks (Benjy, Tom, Eric) worked with this more recently, and there are a few dimensions. but i do think that we might need to adjust the assumption there…
the InteractiveProcess isn’t aware of the temporary directory that the run goal is creating.
(i think…?)
a
I’m not sure what “aware of” means in this situation? on my machine at least, I’m able to
cd
over to the temporary directory and do things there
w
basically, the run goal is creating a temporary directory somewhere, and then running the InteractiveProcess somewhere else
a
I am, in fact, working in that part of code
w
right. so what i mean by “aware of” is just that the run goal is creating a temporary directory, but the InteractiveProcess machinery doesn’t know about it.
the cache symlinks (and any other setup) would need to be created in that temporary directory: not in the workspace
it sort of seems like all of the
{chroot}
templating and temporary directory creation could move inside of InteractiveProcess running?
because the current location is a bit of a hack
@ancient-vegetable-10556: does that make sense?
a
Potentially — I’ve actually avoided the
{chroot}
templating altogether, by running
cd `dirname $0`
in the root
InteractiveProcess
w
mm. that will break usage i think.
the idea behind running in the workspace is that something like:
./pants run $target -- a/relative/arg for/my/process
…should work
people will expect that the process is running in the current directory.
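To illustrate the expectation, here is a small sketch in which the executable lives in a temporary directory but the process runs with the workspace as its cwd, so a relative argument resolves against the workspace. `run_in_workspace` and `show_arg.py` are made-up names for this example, not part of Pants:

```python
import os
import subprocess
import sys
import tempfile


def run_in_workspace(chroot_dir, workspace_dir, script_name, user_args):
    """Run a script stored in the sandbox, with cwd set to the workspace."""
    script = os.path.join(chroot_dir, script_name)  # absolute path into the sandbox
    result = subprocess.run(
        [sys.executable, script, *user_args],
        cwd=workspace_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as chroot_dir, tempfile.TemporaryDirectory() as workspace_dir:
    # The "binary": prints the absolute form of its first argument.
    with open(os.path.join(chroot_dir, "show_arg.py"), "w") as f:
        f.write("import os, sys\nprint(os.path.abspath(sys.argv[1]))\n")

    resolved = run_in_workspace(chroot_dir, workspace_dir, "show_arg.py", ["a/relative/arg"])
    expected = os.path.join(os.path.realpath(workspace_dir), "a", "relative", "arg")
    print(resolved)
```

Changing directory into the sandbox first (as with `cd` to the script's directory) would make the same argument resolve against the temporary directory instead.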
a
makes sense
I’m really trying to get this working piecemeal because the
run
machinery is delicate and I don’t yet understand it 🙂
w
yea. it took a while to page in, but it’s becoming clearer to me i think.
1. the
{chroot}
templating is necessary if relative paths will be used, afaict
2. the
{chroot}
templating and temporary directory creation moving into
InteractiveProcess
(maybe as prework?) would likely clarify all of this a lot
a
fwiw, it’s not entirely clear to me what’s actually chrooted here (I think I can see the full filesystem inside the process?)
w
“chroot” is a misnomer here. it’s just the temporary directory containing all of the inputs
even more so because the actual cwd of the process is something else when
run_in_workspace=True
.
a
ok, that makes sense
a
(do we actually generally do chrooting in other cases?)
w
every process we run runs in a temporary directory with a limited env… but we’re not using any os-level features to prevent breaking out.
@ancient-vegetable-10556: regarding the “setup the sandbox” portion of interactive process (linked above): if you look at it, it bears a lot of similarity to https://github.com/pantsbuild/pants/blob/331352e4e788cd967be0c22074a450696b0cb6f5/src/rust/engine/process_execution/src/local.rs#L605-L626, which is what is used to prepare the sandbox for a non-interactive process
if we wanted to support
immutable_inputs
,
append_only_caches
, etc on
InteractiveProcess
, then adjusting the interactive process runner code to use
prepare_workdir
would go a long way
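As a rough Python illustration of what "preparing the sandbox" might involve for caches, the sketch below symlinks each named cache into a temporary directory instead of the workspace. This is not the Rust `prepare_workdir` implementation; `prepare_sandbox`, the `named_caches_root` argument, and the assumption that caches map a name to a sandbox-relative path are all hypothetical:

```python
import os
import tempfile


def prepare_sandbox(sandbox_dir, named_caches_root, append_only_caches):
    """Symlink each named cache into the sandbox (never into the workspace).

    append_only_caches maps a cache name to a path relative to the sandbox
    (an assumption made for this sketch).
    """
    for cache_name, rel_dest in append_only_caches.items():
        source = os.path.join(named_caches_root, cache_name)
        os.makedirs(source, exist_ok=True)  # the cache persists across runs
        dest = os.path.join(sandbox_dir, rel_dest)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.symlink(source, dest, target_is_directory=True)


caches_root = tempfile.mkdtemp(prefix="named-caches-")
sandbox_dir = tempfile.mkdtemp(prefix="sandbox-")

prepare_sandbox(sandbox_dir, caches_root, {"coursier": ".cache/coursier"})
cache_link = os.path.join(sandbox_dir, ".cache", "coursier")
print(cache_link, "->", os.readlink(cache_link))
```

Since all setup happens inside the temporary directory, the workspace itself is left unmodified, which is the conflict the `ValueError` above guards against.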
a
OK. Maybe I can find some time to pair with you on this later on? I’ll adjust my focus to getting
run
to work more generally (Java first, then Scala)
w
works for me