some other prior art is BuildXL by microsoft, but i don’t know if they got it working reliably on other platforms
10/03/2020, 12:56 PM
yes! one thing they also did there was to restrict tracing to e.g. file locations known in advance
they didn't seem to have a good explanation for this when i asked at a conference; i assume it was just easier/more reliable to implement (which is a good reason)
one thing i of course love about the remexec/`Process` execution framework in pants is that we do know all the expected output files/dirs. so i feel like we should absolutely be able to use our omniscience to prefetch all file/directory reads/writes if we used the VFS approach. i do not know yet (hope to investigate today) whether the linux FUSE driver gives the right hook points to make that work performantly, or whether we would need to look into a kernel module (which would be fun, but obviously nonportable).
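(rough sketch of what i mean by "omniscience", not real pants code: `ProcessSpec` and `prefetch_plan` are made-up names, and the real `Process` type looks different. the point is just that if inputs and outputs are declared up front, the set of paths a VFS has to have ready can be computed before the process even starts.)

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ProcessSpec:
    """Hypothetical stand-in for a process request (NOT the real pants `Process`)."""

    input_files: FrozenSet[str]
    output_files: FrozenSet[str]
    output_dirs: FrozenSet[str]


def prefetch_plan(spec: ProcessSpec) -> Tuple[List[str], List[str]]:
    """Compute, before execution, which files will be read and which
    directories must exist for the declared outputs to be written."""
    # every declared input is a read we can serve ahead of time
    reads = sorted(spec.input_files)

    dirs = set()
    # declared output dirs, plus all of their ancestors
    for d in spec.output_dirs:
        path = PurePosixPath(d)
        dirs.add(str(path))
        dirs.update(str(p) for p in path.parents if str(p) != ".")
    # ancestors of every declared output file
    for f in spec.output_files:
        dirs.update(str(p) for p in PurePosixPath(f).parents if str(p) != ".")

    return reads, sorted(dirs)


spec = ProcessSpec(
    input_files=frozenset({"src/a.c", "src/a.h"}),
    output_files=frozenset({"build/obj/a.o"}),
    output_dirs=frozenset({"dist"}),
)
reads, dirs = prefetch_plan(spec)
print(reads)  # files to have ready before the process starts
print(dirs)   # directories to create ahead of the writes
```

whether a FUSE driver can actually act on a plan like this fast enough is exactly the open question above; this only shows that the plan itself is cheap to compute.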
i'll ask kaushik at twitter -- he said he was considering rewriting OSXFUSE because the "parallel" version has tons of contention, for example (and the maintainer was described as extremely hard to work with, in a very specific way). so if we find that that's necessary to make a performant VFS on OSX, it at least makes the concept of a kernel module on linux more reasonable. but i will look into existing solutions, especially any rust crates that already do parts of this.
i am just excited about the possible performance of a VFS with prefetching for all writes. it feels extremely powerful and not unreasonable to implement with our great process execution model. and i haven't heard so far of others doing this at all. i think bazel has a sandbox-ish thing, and facebook's xar had a read-only filesystem thing, so will check those out first.
(just some thoughts on VFS stuff, no need to read too closely)