Pants #general

rapid-bird-79300

11/10/2022, 6:53 PM

Hey Pants Team, been seeing OOM errors in CI recently with warnings such as

[2022-11-10T18:49:23Z] 18:49:23.75 [WARN] Error storing process execution result to local cache: Error storing fingerprints ["000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20", "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20"]: MDB_CURSOR_FULL: Internal error - cursor stack limit reached - ignoring and continuing

The only major change I can think of is that we now pull the cache in our lint step. Not sure if that somehow had an impact.

witty-crayon-22786

11/10/2022, 6:54 PM

yikes, sorry for the trouble. are the caches being fetched from a host that is using the same platform?

witty-crayon-22786

11/10/2022, 6:55 PM

and if you purge the cache once (i.e. bump to a brand new cache base key), does the problem recur?

rapid-bird-79300

11/10/2022, 6:55 PM

yeah that is correct. This runs on CI agents using the same AMI; the only variable is the instance type

rapid-bird-79300

11/10/2022, 6:56 PM

and if you purge the cache once (i.e. bump to a brand new cache base key), does the problem recur? (edited)

let me try this on the lint step.

witty-crayon-22786

11/10/2022, 6:56 PM

LMDB has been ok, but we have seen this occur in some rare cases, where folks ended up with corrupted caches

rapid-bird-79300

11/10/2022, 6:57 PM

that could be the case here

witty-crayon-22786

11/10/2022, 6:57 PM

it’s worth noting that using a remote cache doesn’t suffer from this problem… although obviously we would like to fix it (likely by eventually replacing LMDB).

rapid-bird-79300

11/10/2022, 7:00 PM

we are going through an upgrade now (from 2.7) and this is the first time I'm seeing this issue. We maintain a single master cache. Is that recommended? For mypy we build a cache per commit and pull the latest one based on the base commit

witty-crayon-22786

11/10/2022, 7:02 PM

We maintain a single master cache. Is that recommended?

recommended would be using remote caching, but a single local cache should be ok.

rapid-bird-79300

11/10/2022, 7:03 PM

huh removing cache in the lint stage did not help

rapid-bird-79300

11/10/2022, 7:03 PM

so no cache same error

witty-crayon-22786

11/10/2022, 7:04 PM

hm… you’re sure that it is starting with an empty store?

~/.cache/pants/lmdb_store

in particular?
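One way to confirm that a CI step really starts with an empty store is a small check like the sketch below. It only looks at whether the directory exists and has any entries; the helper name `store_is_empty` is hypothetical, and the path is the default location mentioned above, which may differ if the store directory has been relocated.

```python
from pathlib import Path


def store_is_empty(store_dir: Path) -> bool:
    """Return True if store_dir is missing, is not a directory, or has no entries."""
    if not store_dir.is_dir():
        return True
    # any() stops at the first entry, so this stays cheap for large stores.
    return not any(store_dir.iterdir())


if __name__ == "__main__":
    # Default location from the thread; assumed not to be overridden.
    default_store = Path.home() / ".cache" / "pants" / "lmdb_store"
    state = "empty" if store_is_empty(default_store) else "has existing data"
    print(f"{default_store}: {state}")
```

Running this at the start of the lint step, before Pants is invoked, would show whether an old store was restored after all.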

rapid-bird-79300

11/10/2022, 7:04 PM

wait sorry

rapid-bird-79300

11/10/2022, 7:05 PM

one sec

rapid-bird-79300

11/10/2022, 7:12 PM

okay apologies - removing the cache from the lint stage seems to be helping.

witty-crayon-22786

11/10/2022, 7:15 PM

i would suggest permanently changing your cache key… i.e., bumping a counter value in the key

witty-crayon-22786

11/10/2022, 7:15 PM

because it should definitely be safe to use the cache this way: i suspect one-time/rare corruption.
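As a rough illustration of the "counter in the key" idea, the sketch below builds a CI cache key from a base name, a manually bumped counter, and a short hash of some input file. The function name `cache_key`, the `pants-lmdb` base name, and the use of a lockfile hash are all assumptions for illustration; none of them come from Pants or from any particular CI system.

```python
import hashlib


def cache_key(base: str, counter: int, lockfile_bytes: bytes) -> str:
    """Build a cache key; bumping `counter` forces a brand new cache."""
    # Short content hash so the key also changes when dependencies change
    # (an assumption for this sketch, not something the thread requires).
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{base}-v{counter}-{digest}"


if __name__ == "__main__":
    lock = b"example lockfile contents"
    print(cache_key("pants-lmdb", 1, lock))
    # After suspected corruption, bump the counter to abandon the old cache.
    print(cache_key("pants-lmdb", 2, lock))
```

Because the counter is part of the key, bumping it once means no agent will restore the possibly corrupted store again, while the rest of the key keeps behaving as before.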

witty-crayon-22786

11/10/2022, 7:15 PM

but yea: highly recommend the remote cache. it avoids all of this manual tuning, and is faster.

👍 1

rapid-bird-79300

11/10/2022, 8:11 PM

yeah remote caching sounds great

rapid-bird-79300

11/10/2022, 8:11 PM

thank you for the help on this @witty-crayon-22786 I will report if anything else weird occurs

witty-crayon-22786

11/10/2022, 9:19 PM

sure thing!

happy-kitchen-89482

11/10/2022, 10:44 PM

Sorry for the trouble on this. It’s rare for lmdb_store to get in a bad state, but not impossible…
