
bitter-ability-32190

06/22/2022, 2:24 PM
I'm mulling over a challenge we have (might be one of our last ones for migration!). One of our services is packaged using files on disk on dev boxes which are gitignored (a command generates them). Trying to `files()` them causes Pants to treat them as individual targets, and then finding all targets takes forever for those who have the files locally. There are maybe a few ways we can handle this, but my first thought is "what if we had a target type which represented the files as one target?" Like a `directory` target?
This might actually work! It'd be a `FileSourceField` generator (or resource. damn you, distinction!). Honestly it doesn't even have to be a `directory`; it could just as easily be a `lazy_files` or `lazy_resources`
I wonder how long it will take to fingerprint 100k files 🤮

witty-crayon-22786

06/22/2022, 4:17 PM
filesystems suck, heh. you might also see whether the files can be zipped/tar’d and then extracted
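A minimal sketch of that zip-and-extract approach in Python's stdlib `zipfile` (the function names and paths here are illustrative, not from the thread):

```python
import zipfile
from pathlib import Path

def archive_generated_files(src_dir: Path, archive: Path) -> None:
    """Pack every file under src_dir into one zip, so the build tool
    only has to track a single file instead of ~100k individual ones."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src_dir))

def extract_generated_files(archive: Path, dest_dir: Path) -> None:
    """Unpack the archive where the service expects the files on disk."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
```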

bitter-ability-32190

06/22/2022, 4:18 PM
Yeah, that's my backup
```
Error launching process: Os { code: 7, kind: ArgumentListTooLong, message: "Argument list too long" }
```
lol
It's only 100k files, c'mon!
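That `ArgumentListTooLong` is the kernel's `ARG_MAX` cap on the combined size of argv plus the environment. One common workaround (a generic sketch, not what Pants does internally) is to batch the file list across several invocations:

```python
import subprocess
from typing import Sequence

def run_batched(cmd: Sequence[str], paths: Sequence[str], batch_size: int = 1000) -> None:
    """Invoke cmd repeatedly, batch_size paths at a time, to stay under ARG_MAX."""
    for i in range(0, len(paths), batch_size):
        subprocess.run([*cmd, *paths[i : i + batch_size]], check=True)
```

zip's `-@` flag (quoted from the manual just below) sidesteps the limit differently, by reading filenames from stdin instead of argv.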

witty-crayon-22786

06/22/2022, 4:28 PM
heh. iirc, zip allows input files from stdin
```
-@ file lists.  If a file list is specified as -@ [Not on MacOS], zip takes the list of input files from standard input instead of from the command line.  For example,

              zip -@ foo
```

bitter-ability-32190

06/22/2022, 4:29 PM
Do we use that tho?

witty-crayon-22786

06/22/2022, 4:29 PM
oh. sorry, i thought that you were writing a `@rule`. need more error context

bitter-ability-32190

06/22/2022, 4:30 PM
I'd love to provide more, but my terminal buffer is just a list of the files 😂 then maybe 20 lines of the bottom of the error JSON 😛
I'm almost certain this is not the path I want to go down 😛
```
12:01:53.49 [WARN] Error storing process execution result to local cache: Error storing fingerprints ["000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20"]: MDB_MAP_FULL: Environment mapsize limit reached - ignoring and continuing
```
lol

witty-crayon-22786

06/22/2022, 5:04 PM
16GB worth of directory entries

bitter-ability-32190

06/22/2022, 5:05 PM
Yeah, I think archiving is the path forward for us 😵

witty-crayon-22786

06/22/2022, 5:05 PM
which you can increase, but …

bitter-ability-32190

06/22/2022, 5:12 PM
What slice of my work would make this occur? Right now I have a custom target with a codegen rule; in the rule I `PathGlobs("bigfatdir/*")` into a snapshot. I think the dir is like 1.5GB in size
As a zip it'd be 900-ish MB

witty-crayon-22786

06/22/2022, 5:14 PM
`pantsd` periodically garbage collects the store to attempt to keep the content at a fraction of the configured max size (16GB for directory entries, 256GB for files)
directories go in `~/.cache/pants/lmdb_store/directories`
on my machine (for the repositories that i build), the ratio between files and directories is 14G to 38M
the limit is fairly arbitrary. but either: 1) `pantsd` hasn't gotten a chance to garbage collect recently, or 2) you captured some very, very large directories such that you burst all the way to the limit.
is there anything odd about how you are using `pantsd`?
and what is the output of `find bigfatdir | wc`?

bitter-ability-32190

06/22/2022, 5:20 PM
```
101262  101262 4759269
```
Nothing pantsd odd

witty-crayon-22786

06/22/2022, 5:22 PM
and i assume that `du -h ~/.cache/pants/lmdb_store/directories` is roughly 16G?
yea, if the filenames are collectively only 4.7MB it’s pretty unlikely that that would cause you to hit that limit
very weird…

bitter-ability-32190

06/22/2022, 5:30 PM
```
987M    /home/joshuacannon/.cache/pants/lmdb_store/directories
```

witty-crayon-22786

06/22/2022, 5:32 PM
ok, then: super-extra weird. i think that that MAP_FULL error is a lie… some other error masquerading as “map full”. do you think that you could adjust that error message to also include the value length? https://github.com/pantsbuild/pants/blob/6564898b02016af70e2c50215955a40a2d7f62a7/src/rust/engine/sharded_lmdb/src/lib.rs#L424-L431
something like
```diff
diff --git a/src/rust/engine/sharded_lmdb/src/lib.rs b/src/rust/engine/sharded_lmdb/src/lib.rs
index 2a925c33fc..9145d6382d 100644
--- a/src/rust/engine/sharded_lmdb/src/lib.rs
+++ b/src/rust/engine/sharded_lmdb/src/lib.rs
@@ -425,7 +425,7 @@ impl ShardedLmdb {
                 "Error storing fingerprints {:?}: {}",
                 batch
                   .iter()
-                  .map(|(key, _)| key.to_hex())
+                  .map(|(key, value)| format!("{} len: {}", key.to_hex(), value.len()))
                   .collect::<Vec<_>>(),
                 e
               )
```

bitter-ability-32190

06/22/2022, 5:36 PM
Ah, can't use `main`, as then I'd have to upgrade my plugins and that ain't easy 😅 I'm behind on my version upgrades
```
34G     /home/joshuacannon/.cache/pants/lmdb_store/files
```
hmmm
I can wipe the cache and try this one operation and see what comes out the other end

witty-crayon-22786

06/22/2022, 5:40 PM
that should be fine. the relevant method only stores directories, so i’m pretty sure that this is directory entry related.
but yea, feel free. i expect that the error is lying, and that it has to do with the item size. so wiping is unlikely to affect it

bitter-ability-32190

06/22/2022, 5:54 PM
oh yeah, I suppose that makes sense. Won't wipe then

witty-crayon-22786

06/22/2022, 6:01 PM

bitter-ability-32190

06/22/2022, 6:12 PM
, no but I can for testing

witty-crayon-22786

06/22/2022, 6:12 PM
nah, don’t worry about it. the default should mean that the maximum size of a directory is 1GB

bitter-ability-32190

06/22/2022, 6:13 PM
Hmmm, maybe:
```
954M    /home/joshuacannon/.cache/pants/lmdb_store/directories/5
```
is a problem then?

witty-crayon-22786

06/22/2022, 6:13 PM
shouldn’t be: max size for a single directory

bitter-ability-32190

06/22/2022, 6:13 PM
Well if it was going to write more than 50-ish MB, what would happen?
ooooh, i see what you're saying.
```
/home/joshuacannon/.cache/pants/lmdb_store/directories/5
```
is only one shard… missed that above
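The arithmetic behind the "1GB per directory" remark, assuming the store is split into 16 shards keyed on the fingerprint's leading hex digit (which the `.../directories/5` path suggests, but which isn't confirmed in the thread):

```python
# Assumption: the 16 GiB directory-store limit from earlier in the thread
# is divided evenly across 16 shards (one per leading hex digit).
TOTAL_LIMIT_BYTES = 16 * 2**30
NUM_SHARDS = 16

per_shard_limit = TOTAL_LIMIT_BYTES // NUM_SHARDS
print(per_shard_limit // 2**30)  # 1 (GiB per shard)
```

Under that assumption, a 954M shard is right up against its individual cap even though the store as a whole is nowhere near 16GB.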

bitter-ability-32190

06/22/2022, 6:14 PM
can I wipe that shard and try again? Is that kosher?

witty-crayon-22786

06/22/2022, 6:15 PM
what are all of the sizes in there?

bitter-ability-32190

06/22/2022, 6:15 PM
Mostly 2ish MB except that one

witty-crayon-22786

06/22/2022, 6:16 PM
wow… ok. so yea, this was a single huge directory entry then probably.

bitter-ability-32190

06/22/2022, 6:16 PM
I think the dir is like 1.5GB in size
😉

witty-crayon-22786

06/22/2022, 6:16 PM
um… it might be fine to wipe only one? probably safer to move it aside
@bitter-ability-32190: the directory entries only contain file names and digests… (not even the full path: just the name)
b

bitter-ability-32190

06/22/2022, 6:17 PM
ah
moving the shard and re-running worked
Now that shard has 12M

witty-crayon-22786

06/22/2022, 6:23 PM
ok. more points against LMDB. these limits can be adjusted, but it’s a pain in the ass both that we have to shard, and that we have to set a fixed max size. bumped an aggregator ticket on the topic.
if you want to pursue that path, you can adjust the size limit and/or the shard count. i’m still curious why that directory entry is so large, so if you do end up getting a chance to add debug output, it would be appreciated.

bitter-ability-32190

06/22/2022, 6:24 PM
At a minimum we have 101k filename+digest combos 🤷‍♂️
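A back-of-the-envelope check using the `find bigfatdir | wc` numbers from earlier (the 32-byte digest size is an assumption, matching the fingerprint length in the MDB_MAP_FULL error, not something confirmed in the thread):

```python
num_files = 101_262           # line count from `find bigfatdir | wc`
total_name_bytes = 4_759_269  # char count from the same output
digest_bytes = 32             # assumption: one 32-byte fingerprint per file

approx_entry_bytes = total_name_bytes + num_files * digest_bytes
print(f"{approx_entry_bytes / 2**20:.1f} MiB")  # prints "7.6 MiB"
```

So a flat entry for all 101k name+digest pairs should only be ~8MB, which makes the near-1GB shard genuinely puzzling.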

witty-crayon-22786

06/22/2022, 6:24 PM
probably easier though is to see if you can find the single largest directory (in terms of direct/non-recursive children)
b

bitter-ability-32190

06/22/2022, 7:19 PM
We're just gonna zip up the relevant files. It's much saner that way 🙂
Well I've learned Pants doesn't like >100k targets 😛

witty-crayon-22786

06/22/2022, 9:31 PM
Yet ™️
❤️ 1