# development
a
i'm hacking away on a framework (https://github.com/cosmicexplorer/upc) to share memory between processes on a local machine so that i can delegate filesystem operations in pants to a virtual git filesystem (VCFS) from twitter that i'm trying to open source. there's nothing there right now but my plan is to make it really easy for: (1) _virtual filesystems_: to communicate their contents with pants (2) _command-line tools_: to be modified to read file contents from shared memory instead of the filesystem (3) _build tools_: to have super high-resolution tracing on the subprocesses they invoke without having to build in zipkin themselves, to replay nondeterministic failures from a remote machine! might go nowhere!! but i have a prototype of the shared memory part working and i think it can be made generic so that we can plug arbitrary subprocesses into our virtual file system and still have them read everything at high speed, without taking up any temp dir space! let me know if that intersects with any work anyone else is doing anytime this year!
😎 1
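(a rough sketch of the shared memory idea, not the upc prototype itself: one side copies file contents into a named segment, and a reader attaches by name instead of opening a path. this uses python's stdlib `multiprocessing.shared_memory`; the helper names `publish` and `read_shared` are made up for illustration.)

```python
from multiprocessing import shared_memory


def publish(contents: bytes) -> shared_memory.SharedMemory:
    """Copy file contents into a new named shared memory segment."""
    # A segment must be at least one byte, even for an empty file.
    shm = shared_memory.SharedMemory(create=True, size=max(len(contents), 1))
    shm.buf[: len(contents)] = contents
    return shm


def read_shared(name: str, length: int) -> bytes:
    """Attach to an existing segment by name and copy `length` bytes out."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The OS may round the segment size up, so the true length
        # has to travel alongside the name.
        return bytes(shm.buf[:length])
    finally:
        shm.close()
```

the producer would hand `(shm.name, len(contents))` to the subprocess, and call `shm.close()` and `shm.unlink()` once every reader is done.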
w
neat.
it overlaps a bit with brfs, so make sure you've taken a look at that: https://github.com/pantsbuild/pants/blob/master/src/rust/engine/fs/brfs/src/main.rs#L661
no plans to push on that soon since i haven't really profiled that portion of the code
a
i'm looking at brfs now
w
it's a FUSE filesystem that daniel wrote a while back. read-only from a snapshot.
a
ok fantastic
thank you for pointing me to that!
i've been able to cut away a lot of superfluous, unnecessarily complex communication since thinking about this last night
w
mounting it below a union/overlayfs would allow for reading through from the snapshot, and writing out to something else.
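(to make the read-through behaviour concrete, here is a toy in-memory model of union semantics. it is not brfs or overlayfs code, and the `OverlayView` class is hypothetical: reads check the writable upper layer first and fall back to the read-only snapshot, writes only ever touch the upper layer.)

```python
from typing import Dict, Mapping


class OverlayView:
    """Toy union of a read-only lower layer and a writable upper layer."""

    def __init__(self, snapshot: Mapping[str, bytes]) -> None:
        self._lower = snapshot  # never mutated
        self._upper: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        # The upper layer shadows the snapshot when both have the path.
        if path in self._upper:
            return self._upper[path]
        return self._lower[path]

    def write(self, path: str, data: bytes) -> None:
        # Writes land in the upper layer only.
        self._upper[path] = data
```

a real union mount also has to handle deletions (whiteouts) and directories, which this sketch leaves out.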
a
ah, ok
i'm currently trying to see how much porting effort it would take to get scrooge to access a fully virtual filesystem (by editing the codebase), the idea being if i can virtualize file i/o, then i can avoid having to materialize or digest any files at all.
w
in the workspace, you mean?
a
yes
w
one of the things that we had discussed was exposing a cheap "what is the digest of this path" operation
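(one way a "digest of this path" call could be kept cheap, as a sketch only: hash the file once and remember the result keyed on path, mtime and size. the `Digest` shape and the cache policy are assumptions for illustration, not what the engine does, and an mtime check can miss a same-size rewrite within the timestamp resolution.)

```python
import hashlib
import os
from typing import Dict, NamedTuple, Tuple


class Digest(NamedTuple):
    fingerprint: str  # hex SHA-256 of the file contents
    size_bytes: int


# Hypothetical cache: (absolute path, mtime in ns, size) -> Digest
_cache: Dict[Tuple[str, int, int], Digest] = {}


def digest_of_path(path: str) -> Digest:
    st = os.stat(path)
    key = (os.path.abspath(path), st.st_mtime_ns, st.st_size)
    cached = _cache.get(key)
    if cached is not None:
        return cached  # no re-read of the file contents
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    digest = Digest(hasher.hexdigest(), st.st_size)
    _cache[key] = digest
    return digest
```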
a
yes
that was one of the things i made a lot simpler after thinking about it last night, but still up in the air (still drawing diagrams with arrows at this point)
right now i'm thinking that VCFS can write directly to the LMDB store by depending on the engine binary, then return a Digest (via thrift) to pants, so i think that is aligned with what you've just mentioned above
the thrift part works already; i haven't tried integrating the LMDB store via the engine crate yet
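(the shape of that flow, sketched with a plain dict standing in for the LMDB store and with no thrift involved. `InMemoryStore` is a hypothetical name: the writer stores bytes under their SHA-256 fingerprint and hands back only the small (fingerprint, size) pair.)

```python
import hashlib
from typing import Dict, Tuple

# (hex SHA-256 fingerprint, size in bytes)
Digest = Tuple[str, int]


class InMemoryStore:
    """Content-addressed store; a dict stands in for LMDB here."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def store(self, contents: bytes) -> Digest:
        fingerprint = hashlib.sha256(contents).hexdigest()
        # Identical contents map to the same key, so storing twice is a no-op.
        self._blobs.setdefault(fingerprint, contents)
        return (fingerprint, len(contents))

    def load(self, digest: Digest) -> bytes:
        return self._blobs[digest[0]]
```

the point is that only the digest needs to cross the process boundary; the consumer looks the bytes up by fingerprint when it actually needs them.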
w
there is prior art for exposing digests via filesystem metadata too
which can make it more generic.
a
i hadn't realized that, thank you for the tip
sorry, what kind of prior art?
or do you mean in brfs as well
w
a
!!! haven't seen that word in a while
hm. that also sounds like a way to address https://github.com/pantsbuild/pants/issues/9428 (can't materialize symlinks)
(in that case we were able to work around needing to materialize symlinks, though)
w
a bit on the prior art in bazel https://github.com/bazelbuild/bazel/issues/923
a
this is a very interesting issue
thank you again!
w
(can search for xattr in that thread)
🌈 1
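(what reading a digest from filesystem metadata might look like, as a sketch: ask for an extended attribute first, and only hash the contents if the filesystem doesn't provide one. the attribute name `user.checksum.sha256` is an assumption picked for illustration, and `os.getxattr` only exists on linux.)

```python
import hashlib
import os
from typing import Optional

# Hypothetical attribute name; a real filesystem would define its own.
XATTR_NAME = "user.checksum.sha256"


def digest_from_xattr(path: str, name: str = XATTR_NAME) -> Optional[str]:
    """Return the digest stored in an xattr, or None if unavailable."""
    getxattr = getattr(os, "getxattr", None)  # missing outside Linux
    if getxattr is None:
        return None
    try:
        return getxattr(path, name).decode("ascii")
    except OSError:
        # Attribute not set, or xattrs unsupported on this filesystem.
        return None


def sha256_of_path(path: str) -> str:
    from_metadata = digest_from_xattr(path)
    if from_metadata is not None:
        return from_metadata
    # Fall back to reading and hashing the contents.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```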
a
i'm gonna spend a bit of time today prototyping the jvm library to virtualize i/o in-memory and seeing whether that's prohibitively slow for scrooge in the monorepo. i'm really interested in using the expected output files/directories from the EPR to make any file operations outside of the desired output just no-op, for example
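(a sketch of the no-op idea in python rather than on the jvm: an in-memory file table that is told the expected output files and directories up front and silently drops writes to anything else. `OutputOnlyFiles` and its method names are hypothetical, not part of any existing library.)

```python
import posixpath
from typing import Dict, Iterable


class OutputOnlyFiles:
    """In-memory writes that are kept only for declared outputs."""

    def __init__(self, output_files: Iterable[str], output_dirs: Iterable[str]) -> None:
        self._files = {posixpath.normpath(p) for p in output_files}
        self._dirs = [posixpath.normpath(d) for d in output_dirs]
        self.contents: Dict[str, bytes] = {}

    def _is_expected(self, path: str) -> bool:
        if path in self._files:
            return True
        # Anything strictly underneath a declared output directory counts.
        return any(path.startswith(d + "/") for d in self._dirs)

    def write(self, path: str, data: bytes) -> bool:
        """Keep the write if it targets an expected output, else no-op."""
        path = posixpath.normpath(path)
        if not self._is_expected(path):
            return False
        self.contents[path] = data
        return True
```

afterwards `contents` holds exactly the files that would need to be captured, and nothing was written to disk along the way.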
i might get nowhere. but brfs is super helpful as a bridge between the hacks i have on top of VCFS and what i want to see, thanks again
will ask for help or just questions, etc
w
what's the connection between scrooge and VCFS? that wouldn't be in the workspace probably... would be in a sandbox