<@U0N6C2Q9F>, <@U04S45AHA>: a thought about the `i...
# development
w
@fast-nail-55400, @enough-analyst-54434: a thought about the
immutable_input_digests
api before we lean in further on porting things to use it
it occurred to me that it could likely instead be an includelist of paths which could be treated as immutable, but which were expected to already be in the digest
so
Copy code
input_digest: Digest
immutable_paths: set[str]
then we’d adjust materialization to materialize as immutable (i.e., symlink) if it encountered an immutable path
f
that would require running
MergeDigests
though on the all of the digests regardless though?
versus avoiding that work if some of the subdirs are immutable
that said, materialization is what dominates sandbox setup so merging to produce the root
Directory
proto shouldn’t be too much of a performance issue
^^ makes assumptions, please confirm/reject assumptions
h
Fwit, that was the API I was originally envisioning because it makes it more ergonomic for rule authors to use the feature, which I think we want to encourage
w
that would require running 
MergeDigests
 though on the all of the digests regardless though?
yes, but we do anyway in
InputDigests
to validate that there are no collisions
we’d also discussed using a merged digest to do materialization anyway
@hundreds-father-404: yea, i think that it would be.
h
Would paths be files and/or dirs, like the new REAPI
output_paths
field?
w
afaict, it doesn’t impact the ability to mount an immutable path… as long as we never introduce a “mutable paths” field which we’d then have to validate as not overlapping
@hundreds-father-404: shouldn’t matter either way, afaict
that sounds like rough agreement.
@hundreds-father-404: if you have any interest in working on this, it might be a nice diversion from the lockfile stuff. it’s lower priority, but would be good to do before https://github.com/pantsbuild/pants/issues/14070
f
I’m fine with whatever the API is. If the merging is necessary for correctness any way, no objection from me on what is proposed.
h
I'd love to have a diversion from lockfile stuff hehe. Will finish unifying the lockfile goals first though
I started looking into this. Iiuc, we change
<http://local.rs:prepare_workdir|local.rs:prepare_workdir>
. We'd need to subset the `input_digest`: partition into mutable vs immutable, right? And then
store.materialize_directory()
as we do now on the mutable digest, but still
symlink()
on each
immutable_path
? If that's the case, I'm confused how we handle multiple distinct immutable input digests. My read of
<http://immutable_inputs.rs|immutable_inputs.rs>
is that we intentionally separate out distinct digests like
__toolcp
vs
__processorcp
, which each get their own
path()
. Those each might have n files, but they're grouped by that digest's same
path()
to the dir. We'd need to start having one path-per-file iiuc, as we no longer can group related files since it's a flattened list?
w
We’d need to subset the `input_digest`: partition into mutable vs immutable, right? And then 
store.materialize_directory()
 as we do now on the mutable digest, but still 
symlink()
 on each 
immutable_path
?
no need to subset in
input_digest
i don’t think. because
materialize_directory
will take the whole digest, and then symlink on a case-by-case basis
If that’s the case, I’m confused how we handle multiple distinct immutable input digests. My read of 
<http://immutable_inputs.rs|immutable_inputs.rs>
 is that we intentionally separate out distinct digests like 
__toolcp
 vs 
__processorcp
, which each get their own 
path()
.
we wouldn’t have multiple distinct immutable input digests anymore: we’d go back to just
input_digest
the idea would be for the caller to lay out the directory structure (via `MergeDigest`/prefix/etc) as they were before (…oops), but maintaining the directory based nesting that we have now
they’d then mark directories as immutable
h
Oh, so there would still be distinct directories? That makes more sense. And then is it everything in and under each
immutable_dir
is symlinked? Meaning, we do not need to change to processing per-file. We can still do per-directory
And then is it everything in and under each
immutable_dir
is symlinked?
if
input_digest
has
a/f1.txt
,
a/f2.txt
,
a/sub/f.txt
, and
b/f.txt
, and you have
immutable_dirs=['a']
, then
materialize_directory
will see to symlink
a/f1.txt
,
a/f2.txt
, and
a/sub/f.txt
together?
w
And then is it everything in and under each 
immutable_dir
 is symlinked?
when you symlink a path, you’re done operating under that path, because all of the content under that path is provided (immutably) by the symlink
so in your example, you need exactly one symlink for
a
👍 1
h
But we do need to know to not try writing via conventional methods those 3 files, right? Or perhaps symlink everything first and then only write if not already present in the sandbox
w
no, because
materialize_directory
operates recursively. when you hit the path in question, you would symlink, and then not recurse any further
1
there is a bit of an odd interplay here because
materialize_directory
will be checking for paths that it should symlink, but it won’t know know the digest to symlink until here: https://github.com/pantsbuild/pants/blob/e77191184f66e866a7282f21f40f6ad652566d62/src/rust/engine/fs/store/src/lib.rs#L993-L997
h
I'm not certain I'll be able to work more on this this week, lots of writing I have queued up for blogs. We might want to land the Pex immutable_digests improvement now without blocking on this? We already have 15+ uses of immutable_digests iirc, doesn't seem that big of a deal to have 20-30 when we make this refactor. The performance gains for users is worth the pain of refactoring more code
w
yep, that’s fine with me.
h
cool, I'll add that this week