# development
a
from a downstream conversation with @average-vr-56795 about lmdb gc with pantsd active (also noting that currently we don't do any gc without pantsd active, which i think i'm fine with)
dwagnerhall [00:53]
FWIW the LMDB garbage collection we do is also kind of crappy because it doesn’t necessarily compact… Not sure of a good strategy for doing some background garbage collection without blocking user action…
dmcclanahan [Today at 00:54]
? is blocking user action necessary for compacting?
dwagnerhall [8 hours ago]
Garbage collecting at all requires an exclusive lock on the store, so pants will be unable to do anything
dwagnerhall [8 hours ago]
Compacting requires copying the entries we want to keep to a new directory, deleting the existing one, and renaming over the top, which then requires resetting the Store in any process which has it open, which basically is a pantsd restart if you’re using pantsd
dmcclanahan [8 hours ago]
can we shard with a partition key of some sort to avoid a global store lock? i do not know what renaming is referring to unless you just mean moving the file that keeps the data and then resetting to point to the new file
dwagnerhall [8 hours ago]
We already shard 16 ways, but fundamentally, garbage collecting 1/16 of the store is still going to stop pants from being able to do anything while that 1/16 is locked
dwagnerhall [8 hours ago]
And yes, the renaming is for the backing file that keeps the data
dwagnerhall [8 hours ago]
The lmdb format doesn’t allow for compacting when you delete, so the way we “compact” is by iterating over the entries to decide which ones we want to keep, then copying those to a new dir, and deleting the old one
dmcclanahan [8 hours ago]
i don't immediately have a solution for that
dmcclanahan [6 minutes ago]
also forgot that we shard exactly like that already; i remember the ShardedLmdb code was so slick
dmcclanahan [1 minute ago]
ok so i’m confused as to how the method you describe requires blocking pants — are we ever deleting anything from the db except for garbage collection? can we do the copying without blocking and only block on the rename? and it seems intrinsically like there should be a way to reset the store without restarting pantsd, not that i know what i’m talking about...
and also @witty-crayon-22786 we could maybe at least send the user a message if pantsd isn't activated after the store reaches a certain size saying "hey, the lmdb store is getting pretty big! pantsd garbage collects the store, so you may want to consider turning that on!"
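For reference, the copy-then-rename "compaction" described in the thread above could be sketched roughly like this. This is only an illustration, not the actual Store code: it uses a directory of plain files as a stand-in for an LMDB environment, and `compact_store` and `is_live` are hypothetical names. It also moves the old directory aside before deleting it, which is a slight variation on "delete, then rename over the top".

```python
import os
import shutil
import tempfile
from typing import Callable


def compact_store(store_dir: str, is_live: Callable[[str], bool]) -> int:
    """Copy the entries we want to keep into a fresh directory, then swap it in.

    Hypothetical sketch: each file in store_dir stands in for one store entry.
    Returns the number of entries kept.
    """
    store_dir = os.path.abspath(store_dir)
    parent = os.path.dirname(store_dir)

    # Step 1: copy live entries to a new sibling directory. The old store is
    # only read here, so in principle this part need not block other readers.
    new_dir = tempfile.mkdtemp(prefix="compact-", dir=parent)
    kept = 0
    for name in sorted(os.listdir(store_dir)):
        if is_live(name):
            shutil.copy2(os.path.join(store_dir, name), os.path.join(new_dir, name))
            kept += 1

    # Step 2: swap the directories. Any process that still holds the old
    # store open now has stale handles and must re-create its Store.
    old_dir = store_dir + ".old"
    os.rename(store_dir, old_dir)
    os.rename(new_dir, store_dir)
    shutil.rmtree(old_dir)
    return kept
```

Note the sketch assumes `is_live` can be answered without coordination; in the real system, deciding what is live is itself the step that needs the graph to hold still.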
a
We probably shouldn’t suggest people turn pantsd on given it’s known-buggy… But it would be trivial to add a garbage-collect-lmdb-store goal…
There are two things that necessitate some kind of synchronisation:
a
(i would be in favor of that and yes not super ideal until we can add a test harness for pantsd and be confident about it)
a
1. The process of working out what to delete requires an exclusive lock, because we need to work out what is and isn’t in use in the graph (and so what we can and can’t delete)
2. Actually doing the delete and rename requires the caller to re-create their Store
w
One time cleanup when pantsd goes on by default would also be easy, and my vote vs compaction
(for the near term)
a
I agree with that, but also, there’s still the “garbage collection doesn’t compact” issue…
a
(1) makes sense, although i have suspicions that can likely be assuaged by looking at the code
(2) also makes sense in light of (1)
a
2 is unrelated to 1
2 is about stale fds
a
ok
why does recreating the store imply an exclusive lock? does the graph hold cursors that are invalidated if the fd is retaken?
a
It’s not quite an exclusive lock
One pants process needs to be able to signal to all others that they need to reset their stores
a
ok
a
And that kind of needs to be a “Notify I’m about to delete it”, “Notify you can recreate your Stores” kind of protocol
Which is effectively a lock 🙂
a
ok then we are on the same page as far as my understanding at this second goes
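A minimal sketch of the "Notify I'm about to delete it" / "Notify you can recreate your Stores" protocol from the discussion above, to show why it ends up behaving like a lock. Everything here is an assumption for illustration: it uses threads within one process rather than separate pants processes, and `StoreGate`, `use_store`, `swap`, and `generation` are hypothetical names, not anything in the pants codebase.

```python
import threading
from contextlib import contextmanager


class StoreGate:
    """Coordinates store users with a single compactor (hypothetical sketch)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active_users = 0
        self._swapping = False
        # Bumped after every swap; a user that sees a new value should
        # re-create its Store, since its old file handles are stale.
        self.generation = 0

    @contextmanager
    def use_store(self):
        with self._cond:
            # Wait out any swap in progress before touching the store.
            while self._swapping:
                self._cond.wait()
            self._active_users += 1
            generation = self.generation
        try:
            yield generation
        finally:
            with self._cond:
                self._active_users -= 1
                self._cond.notify_all()

    @contextmanager
    def swap(self):
        with self._cond:
            while self._swapping:
                self._cond.wait()
            # "I'm about to delete it": no new users may start from here on.
            self._swapping = True
            # Wait for current users to finish with the old store.
            while self._active_users:
                self._cond.wait()
        try:
            yield
        finally:
            with self._cond:
                # "You can recreate your Stores": bump the generation (even if
                # the swap failed, so users conservatively reopen) and wake waiters.
                self.generation += 1
                self._swapping = False
                self._cond.notify_all()
```

A caller would compare the value yielded by `use_store()` with the generation it last opened its Store under, and reopen when they differ. Across real processes the same shape would need a file lock or similar in place of the condition variable.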