# general
Hey Pants Team, been seeing OOM errors in CI recently with warnings such as
[2022-11-10T18:49:23Z] 18:49:23.75 [WARN] Error storing process execution result to local cache: Error storing fingerprints ["000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20"]: MDB_CURSOR_FULL: Internal error - cursor stack limit reached - ignoring and continuing
the only major change I can think of is that we now pull the cache in our lint step. Not sure if that somehow had an impact.
yikes, sorry for the trouble. are the caches being fetched from a host that is using the same platform?
and if you purge the cache once (i.e. bump to a brand new cache base key), does the problem recur?
yeah that is correct. This runs on CI agents using the same AMI; the only variable is the instance type
let me try this on the lint step.
LMDB has been ok, but we have seen this occur in some rare cases, where folks ended up with corrupted caches
that could be the case here
it’s worth noting that using a remote cache doesn’t suffer from this problem… although obviously we would like to fix it (likely by eventually replacing LMDB).
we are going through an upgrade now (from 2.7) and this is the first time I'm seeing this issue. We maintain a single master cache. Is that recommended? For mypy we build a cache per commit and pull the latest one based on the base commit
recommended would be using remote caching, but a single local cache should be ok.
huh removing cache in the lint stage did not help
so with no cache, same error
hm… you’re sure that it is starting with an empty store?
in particular?
wait sorry
one sec
okay apologies - removing the cache from the lint stage seems to be helping.
i would suggest permanently changing your cache key… i.e., bumping a counter value in the key
because it should definitely be safe to use the cache this way: i suspect one-time/rare corruption.
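A minimal sketch of what "bumping a counter value in the key" could look like, assuming the CI cache key is assembled by a small script. The names `CACHE_VERSION` and `cache_key`, the `pants-lmdb` prefix, and the idea of hashing a lockfile into the key are all hypothetical illustrations, not anything Pants or a particular CI system provides:

```python
import hashlib

# Hypothetical counter: bump by one to abandon a suspect cache for good.
CACHE_VERSION = 2


def cache_key(base: str, lockfile_contents: bytes, version: int = CACHE_VERSION) -> str:
    """Build a CI cache key that changes whenever the counter is bumped."""
    # Short content digest so the key also changes when dependencies change.
    digest = hashlib.sha256(lockfile_contents).hexdigest()[:16]
    return f"{base}-v{version}-{digest}"


if __name__ == "__main__":
    # Same inputs, different counter: the keys differ, so the old cache is never restored.
    print(cache_key("pants-lmdb", b"example lockfile", version=1))
    print(cache_key("pants-lmdb", b"example lockfile", version=2))
```

Because the counter is part of the key, agents never restore an entry written under the previous value, which should have the same effect as a one-time purge without having to delete anything by hand.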
but yea: highly recommend the remote cache. it avoids all of this manual tuning, and is faster.
yeah remote caching sounds great
thank you for the help on this @witty-crayon-22786 I will report if anything else weird occurs
sure thing!
Sorry for the trouble on this. It’s rare for lmdb_store to get in a bad state, but not impossible…