Any thoughts if REAPI would consider adding an ignore mechanism · Pants #development

hundreds-father-404

11/04/2021, 8:13 PM

Any thoughts if REAPI would consider adding an ignore mechanism to `output_paths`, like `output_paths=("dir/", "!dir/ignore_me")`? A substantial slowdown for our Go support is loading all the downloaded modules. We need to capture all of `pkg/mod` from the Go process, but we could safely ignore `pkg/mod/cache`. On my machine, ~20% of the size of my downloaded Go modules comes from that folder.
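Here is a minimal Python sketch of the proposed semantics, applying gitignore-style filtering to captured paths. REAPI's `output_paths` has no exclude syntax today. `filter_output_paths` and its rules are hypothetical and chosen only for illustration:

- a trailing `/` means "this directory and everything under it";
- a leading `!` turns a pattern into an exclude;
- as in `.gitignore`, the last matching pattern wins;
- files that match no pattern are dropped.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def _matches(path: str, pattern: str) -> bool:
    """A pattern ending in "/" matches everything beneath that directory;
    otherwise match the exact path, anything under it, or an fnmatch glob."""
    if pattern.endswith("/"):
        return path.startswith(pattern)
    return path == pattern or path.startswith(pattern + "/") or fnmatch(path, pattern)


def filter_output_paths(files: Iterable[str], patterns: List[str]) -> List[str]:
    """Keep a file if the *last* pattern that matches it is an include.

    Hypothetical semantics: patterns are evaluated in order, and a leading
    "!" makes a pattern an exclude. Files matching no pattern are dropped.
    """
    kept = []
    for path in files:
        keep = False
        for pattern in patterns:
            negated = pattern.startswith("!")
            body = pattern[1:] if negated else pattern
            if _matches(path, body):
                keep = not negated
        if keep:
            kept.append(path)
    return kept


captured = [
    "pkg/mod/github.com/foo/bar@v1.0.0/bar.go",
    "pkg/mod/cache/download/github.com/foo/bar/@v/v1.0.0.zip",
]
# Capture pkg/mod, but skip the download cache underneath it.
print(filter_output_paths(captured, ["pkg/mod/", "!pkg/mod/cache"]))
```

Because the last match wins, a later include can re-include something that an earlier `!` pattern excluded. Real gitignore negation behaves the same way.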

hundreds-father-404

11/04/2021, 8:18 PM

> A substantial slowdown for our Go support is loading all the downloaded modules.

I found in a trace last week that loading the `process.output_digest` from LMDB was taking several seconds. This makes target generation for Go really slow when it isn't already memoized.

witty-crayon-22786

11/04/2021, 8:37 PM

could you relocate the cache instead?

hundreds-father-404

11/04/2021, 8:38 PM

What do you mean? Using `SnapshotSubset` etc.? Or changing the `Process` (e.g. argv and env vars) to output differently?

witty-crayon-22786

11/04/2021, 8:38 PM

fwiw, i do think that our experience with gitignore-style excludes has been really positive, so they might be accepted as an extension. but i also haven’t been involved there for a while. maybe @average-vr-56795 has thoughts

👍 1

witty-crayon-22786

11/04/2021, 8:38 PM

> What do you mean? Using `SnapshotSubset` etc.? Or changing the `Process` (e.g. argv and env vars) to output differently?

the latter. reconfiguring the cache location.

average-vr-56795

11/05/2021, 12:54 AM

I suspect people would look at it a bit funny, but probably be ok with it?

👍 1

hundreds-father-404

11/05/2021, 12:56 AM

Do you know if Bazel suffers from large digests slowing it down too? I'm still filing the ticket for it, but it was ~4 seconds to load the digest from LMDB, iirc.

witty-crayon-22786

11/05/2021, 1:44 AM

Large directory structures, or large files? High counts of either files or directories tend to be a bottleneck more so than total size.
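One way to check which factor dominates for a given tree, such as a Go module cache, is to count entries as well as bytes. Below is a small Python sketch that uses only the standard library. `tree_stats` is a hypothetical helper. It does not follow symlinks and counts a symlink's own size, not its target's.

```python
import os
from dataclasses import dataclass


@dataclass
class TreeStats:
    files: int = 0
    dirs: int = 0
    total_bytes: int = 0


def tree_stats(root: str) -> TreeStats:
    """Count files, directories (excluding root itself) and total bytes under root."""
    stats = TreeStats()
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        stats.dirs += len(dirnames)
        for name in filenames:
            stats.files += 1
            # lstat so that a dangling symlink doesn't raise.
            stats.total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
    return stats
```

For example, you could run `tree_stats` on the module cache and again on its `cache/` subdirectory to see how much of the entry count, not just the size, that folder accounts for. The default module cache is `$GOPATH/pkg/mod` (`~/go/pkg/mod` when `GOPATH` is unset), and `GOMODCACHE` can override that location.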

average-vr-56795

11/05/2021, 8:02 AM

Internally not so much - they've put a lot of work into optimising how they store and manipulate structures to minimise that overhead (e.g. one of the core bazel data structures is a "nested set" - a set which is a union of other sets) so that they can pass around and handle references to things like "the files in a go distribution" from a bunch of actions with very low overhead
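The nested-set idea can be sketched in a few lines of Python. This is loosely inspired by Bazel's depsets, but it is not Bazel's API. The class name and the choice of a postorder, deduplicated flattening are assumptions made for illustration. The point is that building a set copies only its own direct items and references to its children, never the children's contents. Flattening is deferred until `to_list()` is called.

```python
from typing import Generic, Hashable, Iterable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


class NestedSet(Generic[T]):
    """A set represented as direct items plus references to other sets."""

    def __init__(self, direct: Iterable[T] = (), transitive: Sequence["NestedSet[T]"] = ()):
        # Only references to child sets are stored; their items are not copied.
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)

    def to_list(self) -> List[T]:
        """Flatten in postorder (children before direct items), deduplicated."""
        seen_items = set()
        seen_sets = set()
        out: List[T] = []

        def visit(node: "NestedSet[T]") -> None:
            if id(node) in seen_sets:
                return  # a shared sub-set is walked only once
            seen_sets.add(id(node))
            for child in node.transitive:
                visit(child)
            for item in node.direct:
                if item not in seen_items:
                    seen_items.add(item)
                    out.append(item)

        visit(self)
        return out


go_sdk = NestedSet(["go/bin/go", "go/src/fmt/print.go"])
lib = NestedSet(["lib.go"], [go_sdk])
binary = NestedSet(["main.go"], [lib, go_sdk])
print(binary.to_list())
```

Here `go_sdk` is shared by `lib` and `binary` without being copied. It is walked once when `binary` is flattened.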

average-vr-56795

11/05/2021, 8:04 AM

Their file transfer code between client and server is pretty unoptimised, which causes some mild woe, but only the first time each file is uploaded somewhere, so it amortises pretty well. But if you're the first person to build some go on a cluster, it's not going to be amazing for you

average-vr-56795

11/05/2021, 8:08 AM

There's actually also a lot of overhead from the fact that the representations are different - converting nested sets (really optimised for "these things are related") into Merkle trees (really optimised for "this representation is canonical and consistent") is also surprisingly overhead-y
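To show where that conversion cost comes from, here is a Python sketch that turns a flat `{path: contents}` mapping into a Merkle root. It is loosely modeled on REAPI's per-directory digests, but it uses a made-up text serialization, not the real protobuf `Directory` encoding. Every path has to be split and regrouped, and every directory has to be sorted and hashed, before a canonical digest exists.

```python
import hashlib
from typing import Dict


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(files: Dict[str, bytes]) -> str:
    """Hash a flat {path: contents} mapping into a root directory digest.

    Each directory is serialized as its entries sorted by name, so the result
    depends only on the tree's contents, not on the mapping's insertion order.
    """
    # Regroup flat paths into nested dicts: dir name -> subtree, file name -> bytes.
    tree: dict = {}
    for path, contents in files.items():
        *dirs, name = path.split("/")
        node = tree
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = contents

    def hash_dir(node: dict) -> str:
        entries = []
        for name in sorted(node):  # canonical order
            child = node[name]
            if isinstance(child, dict):
                entries.append(f"D {name} {hash_dir(child)}")
            else:
                entries.append(f"F {name} {_digest(child)} {len(child)}")
        return _digest("\n".join(entries).encode())

    return hash_dir(tree)
```

In this sketch, the nested-set structure plays no part: everything is flattened to paths first and then rebuilt as a tree. Having to do both steps is the kind of representation mismatch described above.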

average-vr-56795

11/05/2021, 8:15 AM

And there has been discussion about changing the REAPI format to allow a more nested-set-like representation, though it never goes anywhere because there are still optimisations that can be done in the existing implementation

average-vr-56795

11/05/2021, 8:17 AM

The other big difference for local execution is that actions are all run in symlink forests, not in places with re-materialised files
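The following sketch shows what a symlink forest means in practice, assuming a POSIX filesystem (creating symlinks on Windows may need extra privileges). Only the intermediate directories are real. Every input file is a symlink into some existing store. `build_symlink_forest` is a hypothetical helper, not Bazel's sandbox code.

```python
import os
from typing import Dict


def build_symlink_forest(sandbox: str, inputs: Dict[str, str]) -> None:
    """Lay out an action's inputs as symlinks instead of copies.

    `inputs` maps a path relative to the sandbox to an existing file
    (e.g. in a shared store). Setup cost scales with the number of
    entries rather than with the total size of the files.
    """
    for rel_path, target in inputs.items():
        link = os.path.join(sandbox, rel_path)
        # Create the real parent directories, then link the leaf.
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(os.path.abspath(target), link)
```

The trade-off is that the action sees symlinks rather than regular files. Tools that resolve `realpath` or write into their inputs can behave differently than they would with re-materialised files.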

average-vr-56795

11/05/2021, 8:17 AM

(Which probably has similar performance characteristics in practice to the "symlink out to the go toolchain" stuff Tom was designing)

fast-nail-55400

11/05/2021, 10:57 AM

> could you relocate the cache instead?

or maybe run an `rm -rf path/to/cache` before returning from the `Process` invocation in which you are capturing that path?
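Here is Tom's suggestion sketched in Python for illustration. The thread says the Go code actually runs through a bash script, where this would just be an `rm -rf` line at the end. `run_then_prune` and `_make_writable` are hypothetical helpers. The write-permission step is a precaution: parts of the Go module cache may be read-only (Go's `-modcacherw` flag exists to avoid that), and a read-only directory makes deletion fail.

```python
import os
import shutil
import stat
import subprocess
from typing import Sequence


def _make_writable(root: str) -> None:
    """Add owner write permission to every directory and regular file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        os.chmod(dirpath, os.stat(dirpath).st_mode | stat.S_IWUSR)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)


def run_then_prune(argv: Sequence[str], prune_dir: str) -> int:
    """Run argv, then delete prune_dir before returning, so the directory
    never ends up in the process's captured output digest."""
    result = subprocess.run(list(argv), check=False)
    if os.path.isdir(prune_dir):
        _make_writable(prune_dir)
        shutil.rmtree(prune_dir)
    return result.returncode
```

The command's exit code is passed through unchanged, so a failing `go` invocation still fails the `Process`. Pruning happens regardless of the exit code.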

hundreds-father-404

11/05/2021, 3:46 PM

Thank you Daniel! And oh, good idea Tom!! We're already using a bash script to run Go code, so that is really easy to do

witty-crayon-22786

11/05/2021, 4:56 PM

@average-vr-56795: mm, yea. funny you should mention DepSets… we were looking at that yesterday. we’ve used a similar concept in a few places (TransitiveTarget, CoarsenedTarget most recently), but Eric hit a case yesterday with a recursive structure, and it’s becoming clearer why it is generalized in Bazel

➕ 1

witty-crayon-22786

11/05/2021, 4:57 PM

and due (in part) to their docs, i realized the connection to `Directory` only this morning: https://github.com/pantsbuild/pants/issues/13112#issuecomment-962027101

witty-crayon-22786

11/05/2021, 5:00 PM

so maybe the answer to https://github.com/pantsbuild/pants/issues/13112 is actually a generic recursive ordered set (maybe in Rust, to ease porting the filesystem operations on #13112)

👍 1
