# general
n
One thing I did notice in v2 is that, while you can modulate RAM usage somewhat with the local_parallelism settings, if you happen to call the binary goal with a wildcard like :: on a full cache miss, it'll load a massive amount of stuff into memory, proportional to the number of targets, right before outputting all files seemingly at once. This snuck up on us and we had to make the CI box a bit beefier. So that may be an area for improvement, unless there's a 'please only use this much RAM' switch I wasn't able to spot in the help section.
💡 2
☝️ 1
a
yes, lmdb will also occasionally allocate too much shared memory, and occasionally it will open too many files
i'm separately working on virtualizing process executions so we don't ever need to actually touch the filesystem which would make this go away
that's not via a sandbox though, it's via modifying whatever subprocesses pants runs to use a virtual fs API. so not super generic and would require effort to make work with everyone's code right now
a "please only use this much RAM" switch sounds extremely reasonable
e
@aloof-angle-91616 does this mean you're picking up work on fs/brfs ?
a
brfs would be the generic solution and would be great to use as the final piece in the puzzle. i was first working on making an API that shares file content via shared memory (and directories via the pants lmdb store): https://github.com/cosmicexplorer/upc/. this is so that a virtual git filesystem can send files to pants instantly. my goal for the first prototype is to "virtualize" scrooge by editing its code so that it doesn't write to real files, which i want to use to demonstrate the performance of the virtualization approach before i dive into a more generic solution e.g. with brfs.
e
Ok.
a
i'm entirely making it up as i go along at this point and haven't presented it to twitter folks yet except the git virtual filesystem owner (kaushik) who thinks it is a very cool idea. i think that the performance will almost definitely work out as desired. we will see.
i just saw your merge. i'll try to get up to speed.
the sbt maintainer appears to have had the literal exact same idea about virtualizing i/o at the application level, and he happened to be in a great place to make that change in zinc. https://twitter.com/eed3si9n/status/1258139122977431554?s=21
w
that looks like a remote cache
looks like he calls out that pants already does that?
a
Both Scala compiler and Java compiler are able to handle an abstract notion of virtual file. Rather than manipulating the state of Zinc, I think it's better if we can do away with the idea of using working-directory specific absolute paths during compilation. For large-scale build tools, this facility can be used for example to keep all sources in-memory. Furthermore, keeping a bunch of java.io.File with full absolute paths could add up.
he’s discussing virtualizing all i/o by editing the application code
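to make the "virtual file" idea from the quote concrete, here's a rough sketch in python. it is not zinc's or pants's actual API: `VirtualFile`, `InMemoryFileStore` and `fake_compile` are hypothetical names made up for illustration. the only point is that files are identified by root-relative paths and their bytes stay in memory, so the "compiler" never needs a working-directory specific absolute path.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict


@dataclass(frozen=True)
class VirtualFile:
    """A file identified by a root-relative path, with its content held in memory."""
    path: PurePosixPath
    content: bytes


class InMemoryFileStore:
    """Hypothetical store: maps relative paths to bytes and never touches the disk."""

    def __init__(self) -> None:
        self._files: Dict[PurePosixPath, bytes] = {}

    def write(self, path: str, content: bytes) -> VirtualFile:
        rel = PurePosixPath(path)
        # Reject absolute paths so nothing depends on a particular working directory.
        if rel.is_absolute():
            raise ValueError(f"expected a relative path, got {path!r}")
        self._files[rel] = content
        return VirtualFile(rel, content)

    def read(self, path: str) -> VirtualFile:
        rel = PurePosixPath(path)
        return VirtualFile(rel, self._files[rel])


def fake_compile(source: VirtualFile, store: InMemoryFileStore) -> VirtualFile:
    """Stand-in for a compiler: reads a virtual source, writes a virtual output."""
    out_path = source.path.with_suffix(".out")
    return store.write(str(out_path), source.content.upper())
```

usage would look like `store.write("src/a.scala", b"object a")` followed by `fake_compile(...)`, with the output readable back from the store under `src/a.out`. a real tool would have to route every read and write through something like this, which is the "editing the application code" part.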
he mentions pants nailgun as an example of persistent processes improving perf i believe
w
mm. got it.
a
i’m hacking away on cargo rules right now but this weekend i want to steal that work since it happens to bridge the precise gap between the upc library and the prototype i wanted
w
but, ftr: lmdb does not allocate shared memory: rather, it MMAPs things, which means that they’re flushable.
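for anyone following along, a minimal sketch of what a file-backed mapping looks like, using python's standard `mmap` module. this is not how lmdb itself is implemented; the helper names (`write_via_mmap`, `read_via_mmap`, `roundtrip`) are made up for illustration. the point is that the mapped pages are backed by a file, so dirty pages can be flushed to it, rather than living in a separately allocated shared memory segment.

```python
import mmap
import os
import tempfile


def write_via_mmap(path: str, payload: bytes) -> None:
    """Write a non-empty payload to `path` through a file-backed memory mapping."""
    with open(path, "wb") as f:
        f.truncate(len(payload))  # the file must already have the mapped size
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), len(payload)) as m:
            m[:] = payload
            m.flush()  # ask the OS to write dirty pages back to the file


def read_via_mmap(path: str) -> bytes:
    """Read the whole file back through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m[:])


def roundtrip(payload: bytes) -> bytes:
    """Write then read a payload via mmap in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "store.bin")
        write_via_mmap(path, payload)
        return read_via_mmap(path)
```

calling `roundtrip(b"some bytes")` should hand back the same bytes, with the data having gone through the file rather than only through process memory.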
a
yes, the upc library uses shmget() for files though :)
w
@numerous-fall-96475: it’s unclear what the memory usage is: haven’t profiled yet, but plan to via https://github.com/pantsbuild/pants/issues/9395
👍 1
a
good segue
it’s possible that the shared memory part is unnecessary too. the only benefit right now is that it’s incredibly fast and simple to allocate or retrieve because i made a horrible lock-free allocator. but it might be removable. thanks for pointing out that lmdb already does a lot of the work here.