# general
b
My local `lmdb_store` directory is currently at 46GiB, which is a fairly hefty proportion of my full disk. AIUI, pantsd is meant to be doing some GC on it, but, if so, there's a lot of non-garbage. Can I introspect the GC process? Is there a way to force a (harder) GC?
c
`rm -rf ~/.cache/pants`?
w
`.pants.d/pants.log` should record a GC every 4 hours, iirc
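(For anyone following along: a quick way to pull just those GC lines out of the log. The `.pants.d/pants.log` path is the repo-local pantsd log mentioned above; everything else here is just an illustrative sketch.)

```python
from pathlib import Path

# Peek at the most recent GC events recorded by pantsd.
log = Path(".pants.d/pants.log")
if log.exists():
    gc_lines = [ln for ln in log.read_text().splitlines() if "collecting store" in ln]
    print("\n".join(gc_lines[-4:]))  # last few GC start/finish lines
```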
b
Haha, maybe a slightly less forceful GC than `rm` 😅 I'll have a look at the log, thanks
😁 1
could/should be a `tokio` task at this point
b
I see various lines like
```
08:22:09.39 [INFO] Garbage collecting store. target_size=28,800,000,000
08:22:11.56 [INFO] Done garbage collecting store
```
I'll increase the log level and see what else comes out of it
w
that’s the relevant one
b
Ooh, looks like I could potentially also use the separate `fs_util` tool to force one (and with `ShrinkBehavior::Compact`, which sounds more aggressive than `ShrinkBehavior::Fast` 🤔)
w
yeeep
Compact is not natively supported though… it requires closing/recreating the database
would definitely welcome a patch that figured out how/where to do that safely
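(Editor's note: not Pants code, but the general shape such a patch might take, sketched in Python with plain file copies standing in for LMDB. The compacting copy itself would be LMDB's compacting copy, e.g. `mdb_copy -c`, run while the environment is closed; the function name here is made up.)

```python
import os
import shutil

def compact_and_swap(db_path: str) -> None:
    """Copy the database to a sibling temp path, then atomically swap it in.

    With LMDB, the copy step would be a compacting copy performed while no
    readers or writers hold the environment open; shutil.copyfile stands in
    for it here.
    """
    tmp_path = db_path + ".compacting"
    shutil.copyfile(db_path, tmp_path)  # stand-in for the compacting copy
    os.replace(tmp_path, db_path)       # atomic rename; old handles must be closed first
```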
b
ah, yeah, I see it copying to separate files
anyways, thanks for the tips, should be enough for me to work out if our repo is just that huge, or if there's something else going on
w
LMDB has served its purpose fairly well, but it’s not my favorite. not being async compatible, not having a compacting GC, occasional corruption, needing to shard, etc
b
Don't worry, @broad-processor-92400 soon your large files will just exist on disk 😤
🤞 2
w
true. that will extend the lifetime of LMDB a bit =)
yea, assuming that the reason you haven’t seen things GC’d is in fact the Compact vs Fast distinction
b
Although the "large" files won't be GC'd I think. So maybe worse for you 😅
w
they’ll need to be … haven’t added that feedback to the PR yet =x
b
Ruh roh
f
just run `tmpwatch` on the large file directory, assuming atime is updated
w
yea… should be possible to do it in the exact same loop as the LMDB store… just create timestamps from atimes
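(A minimal sketch of what that atime-driven sweep could look like; illustrative only, not Pants code, and the function name and layout are assumptions.)

```python
import os
import time

def sweep_old_files(root: str, max_age_secs: float) -> list[str]:
    """Delete files whose atime is older than the cutoff; return what was removed."""
    cutoff = time.time() - max_age_secs
    removed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```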
b
I'm more worried about collecting while the file is still symlinked 😳
w
GC handles that
by not collecting things that are reachable from memory
b
Yaaaay... I think
b
Getting very distant from my original question but... Are atimes reliable enough for this purpose? I was under the impression some file systems either don't support them or can be configured to be pretty relaxed about updating them
w
if not, can “touch” to bump the mtime instead
👍 1
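(The "touch" fallback is just a metadata bump; in Python terms, with an illustrative helper name:)

```python
import os

def mark_used(path: str) -> None:
    """Bump both atime and mtime to now, like `touch` on an existing file."""
    os.utime(path)  # no times argument = set both timestamps to the current time
```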
b
Even with `-ltrace`, it seems there's no additional logging beyond the `INFO` above, ah well. The fact that https://github.com/pantsbuild/pants/blob/3304f13aecd534f5581b35104ad77bea41809b5d/src/rust/engine/fs/store/src/lib.rs#L1099-L1106 didn't trigger (i.e. the GC successfully reduced the reported size below 28.8GB) would suggest that indeed this might be a fragmentation issue, since 46GB is a little larger than 28.8GB. I'll try a `ShrinkBehavior::Compact` GC to confirm.
yeah, the same GC via `fs_util gc --target-size-bytes 28800000000` resulted in the directory being the expected size
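(For sanity-checking the before/after numbers yourself, a quick recursive size sum; the cache path in the comment is an assumption about the default location.)

```python
import os

def dir_size(path: str) -> int:
    """Total bytes across regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

# e.g. dir_size(os.path.expanduser("~/.cache/pants/lmdb_store"))
```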