# general
b
My local `lmdb_store` directory is currently at 46GiB, which is a fairly hefty proportion of my full disk. AIUI, pantsd is meant to be doing some GC on it, but, if so, there's a lot of non-garbage. Can I introspect the GC process? Is there a way to force a (harder) GC?
c
`rm -rf ~/.cache/pants`?
w
`.pants.d/pants.log` should record a GC every 4 hours, iirc
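(For anyone following along: a quick way to pull just those GC lines out of the log. The `.pants.d/pants.log` path is the repo-local pantsd log mentioned above; everything else here is just an illustrative sketch.)

```python
from pathlib import Path

# Peek at the most recent GC events recorded by pantsd.
log = Path(".pants.d/pants.log")
if log.exists():
    gc_lines = [ln for ln in log.read_text().splitlines() if "collecting store" in ln]
    print("\n".join(gc_lines[-4:]))  # last few GC start/finish lines
```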
b
Haha, maybe a slightly less forceful GC than `rm` 😅 I'll have a look at the log, thanks
😁 1
could/should be a `tokio` task at this point
b
I see various lines like
```
08:22:09.39 [INFO] Garbage collecting store. target_size=28,800,000,000
08:22:11.56 [INFO] Done garbage collecting store
```
I'll increase the log level and see what else comes out of it
w
that’s the relevant one
b
Ooh, looks like I could potentially also use the separate `fs_util` tool to force one (and with `ShrinkBehavior::Compact`, which sounds more aggressive than `ShrinkBehavior::Fast` 🤔)
w
yeeep
Compact is not natively supported though… it requires closing/recreating the database
would definitely welcome a patch that figured out how/where to do that safely
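(Editor's note: not Pants code, but the general shape such a patch might take, sketched in Python with plain file copies standing in for LMDB. The compacting copy itself would be LMDB's compacting copy, e.g. `mdb_copy -c`, run while the environment is closed; the function name here is made up.)

```python
import os
import shutil

def compact_and_swap(db_path: str) -> None:
    """Copy the database to a sibling temp path, then atomically swap it in.

    With LMDB, the copy step would be a compacting copy performed while no
    readers or writers hold the environment open; shutil.copyfile stands in
    for it here.
    """
    tmp_path = db_path + ".compacting"
    shutil.copyfile(db_path, tmp_path)  # stand-in for the compacting copy
    os.replace(tmp_path, db_path)       # atomic rename; old handles must be closed first
```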
b
ah, yeah, I see it copying to separate files
anyways, thanks for the tips, should be enough for me to work out if our repo is just that huge, or if there's something else going on
w
LMDB has served its purpose fairly well, but it’s not my favorite. not being async compatible, not having a compacting GC, occasional corruption, needing to shard, etc
b
Don't worry, @broad-processor-92400 soon your large files will just exist on disk 😤
🤞 2
w
true. that will extend the lifetime of LMDB a bit =)
yea, assuming that the reason you haven’t seen things GC’d is in fact the Compact vs Fast distinction
b
Although the "large" files won't be GC'd I think. So maybe worse for you 😅
w
they’ll need to be … haven’t added that feedback to the PR yet =x
b
Ruh roh
f
just run `tmpwatch` on the large file directory, assuming atime is updated
w
yea… should be possible to do it in the exact same loop as the LMDB store… just create timestamps from atimes
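(A minimal sketch of what that atime-driven sweep could look like; illustrative only, not Pants code, and the function name and layout are assumptions.)

```python
import os
import time

def sweep_old_files(root: str, max_age_secs: float) -> list[str]:
    """Delete files whose atime is older than the cutoff; return what was removed."""
    cutoff = time.time() - max_age_secs
    removed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```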
b
I'm more worried about collecting while the file is still symlinked 😳
w
GC handles that
by not collecting things that are reachable from memory
b
Yaaaay... I think
b
Getting very distant from my original question but... Are atimes reliable enough for this purpose? I was under the impression some file systems either don't support them or can be configured to be pretty relaxed about updating them
w
if not, can “touch” to bump the mtime instead
👍 1
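(The "touch" fallback is just a metadata bump; in Python terms, with an illustrative helper name:)

```python
import os

def mark_used(path: str) -> None:
    """Bump both atime and mtime to now, like `touch` on an existing file."""
    os.utime(path)  # no times argument = set both timestamps to the current time
```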
b
Even with `-ltrace`, it seems there's no additional logging beyond the `INFO` above, ah well. The fact that https://github.com/pantsbuild/pants/blob/3304f13aecd534f5581b35104ad77bea41809b5d/src/rust/engine/fs/store/src/lib.rs#L1099-L1106 didn't trigger (i.e. the GC successfully reduced the reported size below 28.8GB) would suggest that indeed this might be a fragmentation issue, since 46GB is a little larger than 28.8GB. I'll try a `ShrinkBehavior::Compact` GC to confirm.
yeah, the same GC via `fs_util gc --target-size-bytes 28800000000` resulted in the directory being the expected size
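(For sanity-checking the before/after numbers yourself, a quick recursive size sum; the cache path in the comment is an assumption about the default location.)

```python
import os

def dir_size(path: str) -> int:
    """Total bytes across regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

# e.g. dir_size(os.path.expanduser("~/.cache/pants/lmdb_store"))
```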