One thing I did notice in v2 is that while you can modulate Pants #general

numerous-fall-96475

05/07/2020, 5:32 PM

One thing I did notice in v2 is that, while you can modulate RAM usage somewhat with the local_parallelism settings, if you call the binary goal with a wildcard like :: on a full cache miss, it loads a massive amount of data into memory, proportional to the number of targets, right before writing out all the files seemingly at once. This snuck up on us, and we had to make the CI box a bit beefier. So that may be an area for improvement, unless there's a 'please only use this much RAM' switch that I wasn't able to spot in the help section.

💡 2

☝️ 1

aloof-angle-91616

05/07/2020, 5:34 PM

yes, lmdb will also occasionally allocate too much shared memory, and occasionally it will open too many files

aloof-angle-91616

05/07/2020, 5:35 PM

i'm separately working on virtualizing process executions so we don't ever need to actually touch the filesystem which would make this go away

aloof-angle-91616

05/07/2020, 5:35 PM

that's not via a sandbox though, it's via modifying whatever subprocesses pants runs to use a virtual fs API. so it's not super generic and would require effort to make it work with everyone's code right now

aloof-angle-91616

05/07/2020, 5:35 PM

a "please only use this much RAM" switch sounds extremely reasonable

enough-analyst-54434

05/07/2020, 5:38 PM

@aloof-angle-91616 does this mean you're picking up work on fs/brfs?

aloof-angle-91616

05/07/2020, 5:41 PM

brfs would be the generic solution and would be great to use as the final piece in the puzzle. i was first working on making an API that shares file content via shared memory (and directories via the pants lmdb store): https://github.com/cosmicexplorer/upc/. this is so that a virtual git filesystem can send files to pants instantly. my goal for the first prototype is to "virtualize" scrooge by making it not write to real files by editing its code, which i want to use to demonstrate the performance of the virtualization approach before i dive into a more generic solution e.g. with brfs.

enough-analyst-54434

05/07/2020, 6:00 PM

Ok.

aloof-angle-91616

05/07/2020, 6:13 PM

i'm entirely making it up as i go along at this point and haven't presented it to twitter folks yet except the git virtual filesystem owner (kaushik) who thinks it is a very cool idea. i think that the performance will almost definitely work out as desired. we will see.

aloof-angle-91616

05/07/2020, 6:16 PM

i just saw your merge. i'll try to get up to speed.

aloof-angle-91616

05/07/2020, 8:46 PM

the sbt maintainer appears to have had the literal exact same idea about virtualizing i/o at the application level, and he happened to be in a great place to make that change in zinc. https://twitter.com/eed3si9n/status/1258139122977431554?s=21

witty-crayon-22786

05/07/2020, 8:58 PM

that looks like a remote cache

witty-crayon-22786

05/07/2020, 8:58 PM

looks like he calls out that pants already does that?

aloof-angle-91616

05/07/2020, 8:58 PM

Both the Scala compiler and the Java compiler are able to handle an abstract notion of a virtual file. Rather than manipulating the state of Zinc, I think it's better if we can do away with the idea of using working-directory-specific absolute paths during compilation. For large-scale build tools, this facility can be used, for example, to keep all sources in memory. Furthermore, keeping a bunch of java.io.File objects with full absolute paths could add up.

aloof-angle-91616

05/07/2020, 8:59 PM

he’s discussing virtualizing all i/o by editing the application code
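
To make the idea above concrete, here is a minimal sketch of what "virtualizing i/o at the application level" could look like. Everything in it is hypothetical: `InMemoryFS` and `fake_compile` are illustrative names, not part of Pants, Zinc, or upc. The tool reads and writes through a small file API keyed by relative paths, so nothing touches the real filesystem and no working-directory-specific absolute path is ever stored.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict


@dataclass
class InMemoryFS:
    """Hypothetical virtual filesystem: relative path -> file content."""

    files: Dict[PurePosixPath, bytes] = field(default_factory=dict)

    def _key(self, path: str) -> PurePosixPath:
        # Reject absolute paths so entries stay independent of any working directory.
        p = PurePosixPath(path)
        if p.is_absolute():
            raise ValueError(f"absolute paths are not allowed: {path}")
        return p

    def write(self, path: str, content: bytes) -> None:
        self.files[self._key(path)] = content

    def read(self, path: str) -> bytes:
        return self.files[self._key(path)]


def fake_compile(fs: InMemoryFS, source: str) -> str:
    """Stand-in for a real tool: reads a source, writes an output, no disk I/O."""
    out = str(PurePosixPath(source).with_suffix(".out"))
    fs.write(out, fs.read(source).upper())
    return out


fs = InMemoryFS()
fs.write("src/A.scala", b"class a")
out_path = fake_compile(fs, "src/A.scala")
```

The cost of this approach, as noted earlier in the thread, is that each tool has to be edited to go through such an API; a FUSE-style filesystem like brfs would be the generic alternative.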

aloof-angle-91616

05/07/2020, 8:59 PM

he mentions pants nailgun as an example of persistent processes improving perf i believe

witty-crayon-22786

05/07/2020, 8:59 PM

mm. got it.

aloof-angle-91616

05/07/2020, 9:00 PM

i’m hacking away on cargo rules right now but this weekend i want to steal that work since it happens to bridge the precise gap between the upc library and the prototype i wanted

witty-crayon-22786

05/07/2020, 9:00 PM

but, ftr: lmdb does not allocate shared memory: rather, it MMAPs things, which means that they’re flushable.

aloof-angle-91616

05/07/2020, 9:00 PM

yes, the upc library uses shmget() for files though :)
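
A rough sketch of the distinction being drawn here, using only the Python standard library. It is an analogy, not how lmdb or upc are implemented: Python exposes POSIX-style named shared memory rather than the System V shmget() mentioned above, and the function names are made up for illustration. A file-backed mapping has a backing file that dirty pages can be flushed to, while a shared-memory segment has no such file and lives until it is explicitly unlinked.

```python
import mmap
import os
import tempfile
from multiprocessing import shared_memory


def mmap_roundtrip(payload: bytes) -> bytes:
    """Write through a file-backed mapping, flush, then re-read from the file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * len(payload))  # size the backing file first
        with mmap.mmap(fd, len(payload)) as m:
            m[:] = payload
            m.flush()  # dirty pages are written back to the backing file
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.remove(path)


def shm_roundtrip(payload: bytes) -> bytes:
    """Write into a named shared-memory segment and read it via a second handle."""
    seg = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        seg.buf[: len(payload)] = payload
        other = shared_memory.SharedMemory(name=seg.name)  # attach by name
        try:
            return bytes(other.buf[: len(payload)])
        finally:
            other.close()
    finally:
        seg.close()
        seg.unlink()  # the segment persists until explicitly removed


mapped = mmap_roundtrip(b"hello")
shared = shm_roundtrip(b"hello")
```

Both functions assume a non-empty payload, since a zero-length mapping or segment is not valid.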

witty-crayon-22786

05/07/2020, 9:00 PM

@numerous-fall-96475: it’s unclear what the memory usage is: haven’t profiled yet, but plan to via https://github.com/pantsbuild/pants/issues/9395

👍 1

aloof-angle-91616

05/07/2020, 9:01 PM

good segue

aloof-angle-91616

05/07/2020, 9:04 PM

it’s possible that the shared memory part is unnecessary too. the only benefit right now is that it’s incredibly fast and simple to allocate or retrieve because i made a horrible lock-free allocator. but it might be removable. thanks for pointing out that lmdb already does a lot of the work here.
