# general
r
Hey Pants Team, been seeing OOM errors in CI recently with warnings such as
[2022-11-10T18:49:23Z] 18:49:23.75 [WARN] Error storing process execution result to local cache: Error storing fingerprints ["000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20"]: MDB_CURSOR_FULL: Internal error - cursor stack limit reached - ignoring and continuing
the only major change I can think of is that we now pull the cache in our lint step. Not sure if that somehow had an impact.
w
yikes, sorry for the trouble. are the caches being fetched from a host that is using the same platform?
and if you purge the cache once (i.e. bump to a brand new cache base key), does the problem recur?
r
yeah that is correct. This runs on CI agents using the same AMI; the only variable is the instance type
and if you purge the cache once (i.e. bump to a brand new cache base key), does the problem recur?
let me try this on the lint step.
w
LMDB has been ok, but we have seen this occur in some rare cases, where folks ended up with corrupted caches
r
that could be the case here
w
it’s worth noting that using a remote cache doesn’t suffer from this problem… although obviously we would like to fix it (likely by eventually replacing LMDB).
r
we are going through an upgrade now (from 2.7) and this is the first time I'm seeing this issue. We maintain a single master cache. Is that recommended? For mypy we build a cache per commit and pull the latest based on the base commit
w
We maintain a single master cache. Is that recommended?
recommended would be using remote caching, but a single local cache should be ok.
r
huh removing cache in the lint stage did not help
so no cache same error
w
hm… you’re sure that it is starting with an empty store?
~/.cache/pants/lmdb_store
in particular?
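For anyone wanting to verify the "empty store" question above from inside a CI step, here is a minimal sketch. It only assumes the `~/.cache/pants/lmdb_store` path mentioned in the thread; the helper name `store_is_empty` is made up for illustration and is not part of Pants.

```python
from pathlib import Path


def store_is_empty(store: Path) -> bool:
    """Return True if `store` is not an existing directory or has no entries."""
    # A missing path (or a non-directory) is treated the same as an empty store.
    return not store.is_dir() or not any(store.iterdir())


if __name__ == "__main__":
    # Path taken from the thread; adjust it if your CI relocates the cache.
    store = Path.home() / ".cache" / "pants" / "lmdb_store"
    state = "empty" if store_is_empty(store) else "not empty"
    print(f"{store}: {state}")
```

Running this before the Pants invocation would show whether a restored cache actually populated the store on that agent.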
r
wait sorry
one sec
okay apologies - removing the cache from the lint stage seems to be helping.
w
i would suggest permanently changing your cache key… i.e., bumping a counter value in the key
because it should definitely be safe to use the cache this way: i suspect one-time/rare corruption.
but yea: highly recommend the remote cache. it avoids all of this manual tuning, and is faster.
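A rough sketch of the "bump a counter in the key" idea, for reference. Everything here is an assumption for illustration: the `CACHE_VERSION` constant, the `cache_key` helper, and the key layout are hypothetical, and how the key is handed to the CI system's cache step depends on that system.

```python
# Hypothetical counter: bump it by hand to abandon every previously stored cache.
CACHE_VERSION = 2


def cache_key(prefix: str, suffix: str, version: int = CACHE_VERSION) -> str:
    """Build a cache key that changes whenever the counter changes."""
    # `suffix` is whatever the key already varied on (branch, base commit, ...).
    return f"{prefix}-v{version}-{suffix}"


if __name__ == "__main__":
    print(cache_key("pants-lmdb", "master"))
```

Because the counter is part of the key, incrementing it means no old (possibly corrupted) cache entry can match, so the next run starts from an empty store and writes a fresh one.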
r
yeah remote caching sounds great
thank you for the help on this @witty-crayon-22786 I will report if anything else weird occurs
w
sure thing!
h
Sorry for the trouble on this. It’s rare for lmdb_store to get in a bad state, but not impossible…