# development
b
I'm musing a challenge we have (might be one of our last ones for migration!) One of our services is packaged using files on disk on dev boxes which are gitignored (a command generates them). Trying to `files()` them causes Pants to treat them as individual targets, and then finding all targets takes forever for those who have the files locally. There's maybe a few ways we can handle this, but my first thought is "what if we had a target type which represented the files as one target?" Like a `directory` target?
This might actually work! It'd be a `FileSourceField` generator (or resource. damn you distinction!) Honestly it doesn't even have to be a `directory`, it just as easily could be a `lazy_files` or `lazy_resources`
I wonder how long it will take to fingerprint 100k files šŸ¤®
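(Aside: a minimal sketch of what that `lazy_files` idea could look like as a plugin target type. This assumes the newer Target API; all the names here are hypothetical, and older Pants versions spell the sources field as `Sources` rather than `MultipleSourcesField`.)
Copy code
# Hypothetical plugin sketch: one target whose sources field captures a whole
# generated directory, instead of one target per file.
from pants.engine.target import COMMON_TARGET_FIELDS, Dependencies, MultipleSourcesField, Target


class LazyFilesSourcesField(MultipleSourcesField):
    default = ("**/*",)  # grab everything under the target's directory by default


class LazyFilesTarget(Target):
    alias = "lazy_files"
    core_fields = (*COMMON_TARGET_FIELDS, Dependencies, LazyFilesSourcesField)
    help = "All files under a directory, represented as a single target."


# register.py
def target_types():
    return [LazyFilesTarget]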
w
filesystems suck, heh. you might also see whether the files can be zipped/tar'd and then extracted
b
Yeah that's my backup
Error launching process: Os { code: 7, kind: ArgumentListTooLong, message: "Argument list too long" }
lol
It's only 100k files, cmon!
w
heh. iirc, zip allows input files from stdin
Copy code
-@ file lists.  If a file list is specified as -@ [Not on MacOS], zip takes the list of input files from standard input instead of from the command line.  For example,

              zip -@ foo
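(Aside: a standalone illustration of that stdin trick, not Pants-specific; the paths and archive name are made up. Feeding the file list to `zip -@` sidesteps the ArgumentListTooLong error because the paths never go through the process's argv.)
Copy code
import subprocess

# Collect the ~100k paths without ever putting them on a command line.
file_list = subprocess.run(
    ["find", "bigfatdir", "-type", "f"],
    check=True, capture_output=True, text=True,
).stdout

# `zip -@` reads the list of input files from stdin, so argv stays tiny.
subprocess.run(
    ["zip", "-q", "-@", "bigfatdir.zip"],
    input=file_list, check=True, text=True,
)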
b
Do we use that tho?
w
oh. sorry, i thought that you were writing a `@rule`. need more error context
b
I'd love to provide more, but my terminal buffer is just a list of the files šŸ˜‚ then maybe 20 lines of the bottom of the error JSON šŸ˜›
I'm almost certain this is not the path I want to go down šŸ˜›
Copy code
12:01:53.49 [WARN] Error storing process execution result to local cache: Error storing fingerprints ["000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20"]: MDB_MAP_FULL: Environment mapsize limit reached - ignoring and continuing
lol
w
16GB worth of directory entries
b
Yeah, I think archiving is the path forward for us šŸ˜µ
w
which you can increase, but …
b
What slice of my work would make this occur? Right now I have a custom target with a codegen rule; in the rule I `PathGlobs("bigfatdir/*")` into a snapshot. I think the dir is like 1.5GB in size
As a zip it'd be 900-ish MB
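(Aside: roughly what that capture looks like inside a rule. A sketch with made-up names, not the actual plugin; `Get(Snapshot, PathGlobs([...]))` is the real API, everything else is placeholder.)
Copy code
from pants.engine.fs import PathGlobs, Snapshot
from pants.engine.rules import Get


# Awaited from inside a codegen @rule: every file matched by the glob becomes a
# (name, digest) entry in the captured directory tree, which is what ends up in
# ~/.cache/pants/lmdb_store/directories.
async def _capture_generated_dir() -> Snapshot:
    return await Get(Snapshot, PathGlobs(["bigfatdir/**/*"]))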
w
`pantsd` periodically garbage collects the store to attempt to keep the content at a fraction of the configured max size (16GB for directory entries, 256GB for files)
directories go in `~/.cache/pants/lmdb_store/directories`
on my machine (for the repositories that i build), the ratio between files and directories is 14G to 38M
the limit is fairly arbitrary. but either: 1) `pantsd` hasn't gotten a chance to garbage collect recently, or 2) you captured some very, very large directories such that you burst all the way to the limit.
is there anything odd about how you are using `pantsd`?
and what is the output of `find bigfatdir | wc`?
b
101262  101262  4759269
Nothing odd pantsd-wise
w
and i assume that `du -h ~/.cache/pants/lmdb_store/directories` is roughly 16G?
yea, if the filenames are collectively only 4.7MB it's pretty unlikely that that would cause you to hit that limit
very weird…
b
`987M /home/joshuacannon/.cache/pants/lmdb_store/directories`
w
ok, then: super-extra weird. i think that that MAP_FULL error is a lie… some other error masquerading as "map full". do you think that you could adjust that error message to also include the value length? https://github.com/pantsbuild/pants/blob/6564898b02016af70e2c50215955a40a2d7f62a7/src/rust/engine/sharded_lmdb/src/lib.rs#L424-L431
something like
Copy code
diff --git a/src/rust/engine/sharded_lmdb/src/lib.rs b/src/rust/engine/sharded_lmdb/src/lib.rs
index 2a925c33fc..9145d6382d 100644
--- a/src/rust/engine/sharded_lmdb/src/lib.rs
+++ b/src/rust/engine/sharded_lmdb/src/lib.rs
@@ -425,7 +425,7 @@ impl ShardedLmdb {
                 "Error storing fingerprints {:?}: {}",
                 batch
                   .iter()
-                  .map(|(key, _)| key.to_hex())
+                  .map(|(key, value)| format!("{} len: {}", key.to_hex(), value.len()))
                   .collect::<Vec<_>>(),
                 e
               )
b
Ah, can't use `main`, as then I'd have to upgrade my plugins and that ain't easy šŸ˜… I'm behind on my version upgrades
34G     /home/joshuacannon/.cache/pants/lmdb_store/files
hmmm
I can wipe the cache and try this one operation and see what comes out the other end
w
that should be fine. the relevant method only stores directories, so i'm pretty sure that this is directory entry related.
but yea, feel free. i expect that the error is lying, and that it has to do with the item size. so wiping is unlikely to affect it
b
oh yeah I suppose that makes sense. Won't wipe then
w
b
No, but I can for testing
w
nah, don't worry about it. the default should mean that the maximum size of a directory is 1GB
b
Hmmm maybe:
Copy code
954M    /home/joshuacannon/.cache/pants/lmdb_store/directories/5
Is a problem then?
w
shouldn't be: max size for a single directory
b
Well if it was going to write more than 50-ish MB what would happen?
w
ooooh, i see what you're saying.
Copy code
/home/joshuacannon/.cache/pants/lmdb_store/directories/5
is only one shard… missed that above
b
can I wipe that shard and try again? Is that kosher?
w
what are all of the sizes in there?
b
Mostly 2ish MB except that one
w
wow… ok. so yea, this was a single huge directory entry then probably.
b
I think the dir is like 1.5GB in size
šŸ˜‰
w
um… it might be fine to wipe only one? probably safer to move it aside
@bitter-ability-32190: the directory entries only contain file names and digests… (not even the full path: just the name)
b
ah
Moving the shard and re-running worked
Now that shard has 12M
w
ok. more points against LMDB. these limits can be adjusted, but it's a pain in the ass both that we have to shard, and that we have to set a fixed max size. bumped an aggregator ticket on the topic.
if you want to pursue that path, you can adjust the size limit and/or the shard count. i'm still curious why that directory entry is so large, so if you do end up getting a chance to add debug output, it would be appreciated.
b
At a minimum we have 101k filename+digest combos šŸ¤·ā€ā™‚ļø
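(Aside: a rough back-of-envelope, using assumed numbers, for why ~101k name+digest combos alone shouldn't come anywhere near a 1GB entry; it also lines up with the shard landing at 12M after the re-run.)
Copy code
# Back-of-envelope only; the average name length and per-child overhead are guesses.
num_children = 101_262
avg_name_len = 40   # assumed average basename length, in bytes
digest_len = 32     # a SHA-256 digest is 32 bytes
overhead = 16       # assumed per-child framing/size-field overhead

estimate_mb = num_children * (avg_name_len + digest_len + overhead) / 1_000_000
print(f"~{estimate_mb:.0f} MB")  # on the order of 10 MB, nowhere near 1 GB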
w
probably easier though is to see if you can find the single largest directory (in terms of direct/non-recursive children)
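(Aside: one way to do that check — a small sketch, with `bigfatdir` standing in for the real path.)
Copy code
import os

# Report the directory with the most direct (non-recursive) children.
biggest_dir, biggest_count = "", 0
for dirpath, dirnames, filenames in os.walk("bigfatdir"):
    count = len(dirnames) + len(filenames)
    if count > biggest_count:
        biggest_dir, biggest_count = dirpath, count

print(f"{biggest_count}\t{biggest_dir}")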
b
We're just gonna zip up the relevant files. It's much saner that way šŸ™‚
Well I've learned Pants doesn't like >100k targets šŸ˜›
w
Yet ™ļø
ā¤ļø 1