When running: ```pants test --use-coverage --cover...
# general
p
When running:
pants test --use-coverage --coverage-py-report="['xml','json','html','console','raw']" --test-report --stats-log //::
I get the following error:
/github/home/.cargo/git/checkouts/lmdb-rs-369bfd26153a2575/6ae7a55/lmdb-sys/lmdb/libraries/liblmdb/m:2126: Assertion 'rc == 0' failed in mdb_page_dirty()
This happened with both pants 2.22.1 and 2.23.0. I deleted `lmdb_store/` and tried again and it worked just fine. A quick Google search yielded https://github.com/bmatsuo/lmdb-go/issues/131. Is there any solution that doesn't require deleting `lmdb_store`?
c
Sounds like https://github.com/pantsbuild/pants/issues/18726. I'm not familiar with the internals of lmdb, but I'd imagine that a corrupt entry somehow makes it into the DB and then retrievals fail. We don't really have a good external interface for viewing or managing the LMDB store, so purging it remains pretty much the only option.
If you have a reliable way of putting it into that state, we'd be interested. We might be doing something bad.
p
Unfortunately, I don’t have a way to reproduce this issue yet. However, like in the ticket referenced above, we have been running with pantsd disabled, because this was running in CI.
I will add that we run up to 8 concurrent pants invocations without pantsd on a single server. So, it could be some lock failing somewhere.
The local lmdb store is in addition to a remote store. We would like to keep the local store because, well, torch + cuda :-/
c
I wonder if it would be worth isolating the concurrent executions' local stores from each other, using the `local_store_dir` option. The downside is more disk usage and fewer cache hits across executions.
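A rough sketch of what that isolation could look like, assuming each job already knows a small "slot" number. The `store_dir_for_slot` helper, the `STORE_ROOT` variable, and the `lmdb_store_<slot>` naming are made up for illustration; `PANTS_LOCAL_STORE_DIR` is the environment-variable form of the `local_store_dir` option.

```shell
# Hypothetical helper: map a job "slot" number to its own LMDB store directory.
store_dir_for_slot() {
  local slot="$1"
  # Assumed parent directory; override STORE_ROOT to put the stores elsewhere.
  local root="${STORE_ROOT:-$HOME/.cache/pants}"
  printf '%s/lmdb_store_%s\n' "$root" "$slot"
}

# Usage (slot 3 is a placeholder; choosing the slot is left open here):
#   PANTS_LOCAL_STORE_DIR="$(store_dir_for_slot 3)" pants test ::
```

Each slot then fills its own store over time, which is where the extra disk usage comes from.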
p
That could work in my case because we use a remote store in addition to the local store. However, what would be the "key" to separate each concurrent job's store? I saw `CI_CONCURRENT_ID` (https://docs.gitlab.com/ee/ci/variables/predefined_variables.html), but that wouldn't work because it never repeats.
c
It's been a while since I've used GitLab. I'm more used to GitHub Actions or Azure DevOps, where each runner can only execute one task at a time, so parallelism requires multiple runners. That case is easier: you just assign a number to each runner. I'm guessing, though, that you're using the concurrency setting to allow a single GitLab runner (per server) to run multiple jobs simultaneously. Is `CI_RUNNER_ID` unique per "execution slot"? You could also write a quick bash script that tries to acquire a lock on files using `flock`.
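Something along these lines, as an untested sketch: it assumes Linux with the util-linux `flock` command, and the `acquire_slot` function, the `MAX_SLOTS` and `STORE_ROOT` variables, and the `slot_<n>.lock` file names are all invented for the example. The idea is to walk a fixed set of lock files and keep the first one that can be locked without blocking; the lock is held for as long as file descriptor 9 stays open, so it is released when the job's shell exits.

```shell
# Hypothetical helper: claim the first free slot by locking slot_<n>.lock.
# On success, sets SLOT and keeps the lock on fd 9 until the shell exits.
acquire_slot() {
  local max_slots="${MAX_SLOTS:-8}"                # assumed: up to 8 concurrent jobs
  local root="${STORE_ROOT:-$HOME/.cache/pants}"   # assumed parent directory
  local slot
  mkdir -p "$root"
  for ((slot = 0; slot < max_slots; slot++)); do
    # Open (or create) the lock file on fd 9 in the current shell.
    exec 9>"$root/slot_${slot}.lock"
    if flock --nonblock 9; then
      SLOT="$slot"
      return 0
    fi
    # Slot is taken by another job: close the fd and try the next one.
    exec 9>&-
  done
  return 1
}

# Usage:
#   if acquire_slot; then
#     export PANTS_LOCAL_STORE_DIR="${STORE_ROOT:-$HOME/.cache/pants}/lmdb_store_${SLOT}"
#     pants test ::
#   else
#     echo "no free slot" >&2; exit 1
#   fi
```

Because the slot numbers are reused across jobs, each store directory gets warm again on later runs, unlike a key that never repeats.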