# development

red-balloon-89377

12/11/2019, 4:48 PM
Hello! I’ve been constantly running into this issue for the last couple of days, if anyone has any insight. When I run
`./pants test tests/python/pants_test/backend/project_info/tasks:test_export_dep_as_jar`
which has 10-15 unit tests, at some point I start getting
`Exception: Could not initialize store for process cache: "Error making env for store at \"/private/var/folders/rm/zwytnntn5013gslf3xlv31rc0000gp/T/tmp6y9cz45d/processes/7\": No space left on device`
in all the tests, which previously could only be solved by restarting. Until now, that is: I just restarted and the problem persists. Things that I’ve checked:
• I have plenty of memory left, and I’m not swapping.
• The volume has enough space and inodes left.
• There are no deleted files held by processes.
Any ideas?
😢 1
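For reference, those checks can be scripted against the volume that actually backs the path in the error, rather than just `/`. A minimal stdlib sketch (the path below is the parent of the tmpdir from the error message, and is only illustrative):

```python
import os

# Check the volume backing the failing store path, not just "/".
stats = os.statvfs("/private/var/folders")  # illustrative: parent of the error's tmpdir

free_bytes = stats.f_bavail * stats.f_frsize
print(f"free space:  {free_bytes / 1024**3:.1f} GiB")
print(f"free inodes: {stats.f_favail} of {stats.f_files}")
```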

hundreds-father-404

12/11/2019, 4:49 PM
@dry-analyst-73584 has been running into this problem frequently too
Could it be a space leak by the engine?

red-balloon-89377

12/11/2019, 4:50 PM
But I just restarted, and the disk has enough space left
🤔 1

witty-crayon-22786

12/11/2019, 4:50 PM
"space" here being virtual memory.
👍 1
LMDB is very virtual memory heavy.
but unless there are orphaned processes, it shouldn't be held.
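Some context on why LMDB is so virtual-memory heavy: an environment mmaps its entire `map_size` up front, so every open store reserves that much address space before anything is written. A minimal illustration, using the Python `lmdb` bindings purely for demonstration (the engine's store is LMDB driven from Rust):

```python
import os
import lmdb

path = "/tmp/lmdb-demo"
os.makedirs(path, exist_ok=True)

# Opening the environment reserves `map_size` bytes of virtual address space
# immediately; resident memory stays tiny until pages are actually written.
env = lmdb.open(path, map_size=16 * 1024**3)  # 16 GiB reserved up front
with env.begin(write=True) as txn:
    txn.put(b"key", b"value")  # only now do real pages get dirtied
env.close()  # without this, the mapping lives until the process exits
```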

red-balloon-89377

12/11/2019, 4:51 PM
That would be checked by a good ol’ `ps aux | grep pants`, right?

witty-crayon-22786

12/11/2019, 4:51 PM
@red-balloon-89377: are you able to try with 998ffbb7c86ca33a2156b002a9f18eb6f1e425b8 reverted? ... might not be a clean revert, unfortunately.
👍 1

hundreds-father-404

12/11/2019, 4:52 PM
Also you’re using the V1 test runner and getting this? Huh. I think I’ve only gotten it using the V2 test runner (although this may be availability bias - I only ever use V2 now)

witty-crayon-22786

12/11/2019, 4:52 PM
my guess is that within a single run, schedulers are leaking, such that we have N stores open
👍 1
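If that guess is right, the failure mode would look roughly like this sketch. `Scheduler` here is a hypothetical stand-in for the engine's type, not the real API; the point is N un-closed stores, each holding its own mapping and tmpdir:

```python
import tempfile
import lmdb

class Scheduler:
    """Hypothetical stand-in: each instance opens its own process-cache store."""
    def __init__(self):
        store_dir = tempfile.mkdtemp(prefix="processes-")  # leaked on disk, too
        self.env = lmdb.open(store_dir, map_size=4 * 1024**3)

    def close(self):
        self.env.close()

# One scheduler per test, none ever closed: N tests means N live 4 GiB
# mappings and N tmpdirs, until address space or the tmp volume runs out.
leaked = [Scheduler() for _ in range(100)]
```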

red-balloon-89377

12/11/2019, 4:53 PM
@hundreds-father-404 This is the command line, so I think I’m using V1:
`./pants test tests/python/pants_test/backend/project_info/tasks:test_export_dep_as_jar`
@witty-crayon-22786 trying now…

hundreds-father-404

12/11/2019, 4:53 PM
+1 that that PR is likely the culprit

witty-crayon-22786

12/11/2019, 4:56 PM
@hundreds-father-404: did that change end up being necessary...?

hundreds-father-404

12/11/2019, 4:56 PM
Yes, it was. Only way we could land the Pytest upgrade, due to that `ZipError` issue John ran into last year.
We needed the Pytest upgrade so that we can use `pytest-cov` to retry flakes.

witty-crayon-22786

12/11/2019, 4:57 PM
oof. k.

red-balloon-89377

12/11/2019, 4:59 PM
Still breaks.
I’m going to restart my computer and try again.
👍 1

hundreds-father-404

12/11/2019, 5:00 PM
Another thing you can try is using passthrough args via `--pytest-args='-k test_foo'`

red-balloon-89377

12/11/2019, 5:06 PM
But I want to run every test in that file (I’m doing refactorings, so I want to constantly check that I’m not breaking anything).
Okay, reverting 998ffbb7c86ca33a2156b002a9f18eb6f1e425b8 and restarting works for now; will report if things break.
It works even without that commit reverted, so I’m truly baffled now.
😕 1
This has been happening again (without the revert) every 3rd or 4th run of the suite. Going to try with the revert.

hundreds-father-404

12/13/2019, 3:22 PM
Revert locally or revert in CI? Reverting in CI won’t work cleanly because we would have to roll back the Pytest upgrade

red-balloon-89377

12/13/2019, 3:22 PM
Locally
👍 1
It’s a pain to maintain the revert in the branch, but if it unblocks things, it’s worth it 🙂

hundreds-father-404

12/13/2019, 3:24 PM
I wonder what it will take to fix the issues. Stu’s intuition sounds right that we’re not cleaning up the scheduler after every test properly

red-balloon-89377

12/13/2019, 3:25 PM
Yeah, that correlates with the observations. However, I don’t know where we would keep them around between runs, because currently I’m not using pantsd at all, so all the memory should be freed

hundreds-father-404

12/13/2019, 3:27 PM
An important detail is that Pytest runs tests sequentially. Once a single test finishes, it’s supposed to be cleaned up and only then does the next test start. This implies that the issue is cleanup, rather than too many tests setting up schedulers at the same time
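If per-test cleanup is the culprit, the shape of the fix would be guaranteed teardown, something like this sketch (the `Scheduler` stub is hypothetical; the real tests build theirs through the test base class):

```python
import pytest

class Scheduler:
    """Hypothetical stand-in for whatever the test base class constructs."""
    def close(self):
        pass  # would release the LMDB env and any workdirs

@pytest.fixture
def scheduler():
    s = Scheduler()
    try:
        yield s
    finally:
        s.close()  # teardown runs even if the test body raises

def test_uses_fresh_scheduler(scheduler):
    assert scheduler is not None  # each test gets its own, then guaranteed cleanup
```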

red-balloon-89377

12/13/2019, 3:27 PM
But then the behaviour of “once it fails, it fails until restart” is not explained; it should either fail always or not at all.
As in, it seems pretty deterministic.
👍 1

hundreds-father-404

12/13/2019, 3:28 PM
Yes, I think the bad state is being persisted across pants tests, and even pants runs, until you force a clean via restart

red-balloon-89377

12/13/2019, 3:29 PM
Yeah, so we’re back at “where do we keep that state?” I don’t think it’s LMDB, and it’s also probably not a process in the OS. Actually, let me check that last thing
👍 1
Yep, no, we don’t keep any process named “pants” around

hundreds-breakfast-49010

12/13/2019, 10:54 PM
I've been noticing this as well.
What I'm seeing on my Linux machine is that my /tmp dir is getting filled up with pants-related files.
Also, the amount of space allocated to /tmp by default is actually fairly small: 4 GB on my system.
But yeah, I have a bunch of `process-execution-<random>` directories in /tmp.
If I wipe them out the error goes away, but this is the 2nd time today I've done that.
👍 1
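A quick way to measure how much of /tmp those directories are actually holding (stdlib sketch; the glob pattern is assumed from the directory names above):

```python
import os
from pathlib import Path

total = 0
for d in Path("/tmp").glob("process-execution*"):
    for root, _dirs, files in os.walk(d):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
print(f"process-execution dirs are holding {total / 1024**2:.1f} MiB in /tmp")
```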

red-balloon-89377

12/16/2019, 2:07 PM
It just happened again, tried wiping /tmp, but no such luck.

hundreds-father-404

12/16/2019, 2:40 PM
are you using Pantsd?

dry-analyst-73584

12/16/2019, 5:32 PM
I ran into this again yesterday 😞

red-balloon-89377

12/16/2019, 5:33 PM
> are you using Pantsd?
No 😞

happy-kitchen-89482

12/18/2019, 11:22 PM
I've just had this happen when running integration tests in the pants repo: the outer test run is fine, but the underlying runs of pants that those tests invoke get this error, and so the tests fail. But then running some non-integration tests continues to be fine.
Also, running pants in another repo continues to be fine.

hundreds-breakfast-49010

12/27/2019, 10:16 PM
So, I'm looking into this now, and I can't seem to replicate the `No space left on device` error that we were all seeing ~2 weeks ago.
I am seeing my `/tmp` directory fill up with tmpdirs as tests run, to a maximum of 15% or 25% full depending on which tests I'm running (my /tmp file system is 4 GB).
I'm also not seeing those `process-execution-<random>` directories show up, so maybe that specifically was the problem.
And it looks like pants is cleaning up /tmp dir files even if I kill the tests with ctrl-C.
It looks like some nailgun code is invoking a method that creates tmpdirs with `process-execution` as its prefix; maybe the problem was only ever nailgun-specific tests?

hundreds-father-404

12/27/2019, 10:41 PM
I think @dry-analyst-73584 encountered it when running Python tests that didn’t use nailgun. Do we only use nailgun for JVM-related tests?

hundreds-breakfast-49010

12/27/2019, 10:42 PM
That's just a guess on my part. I'm looking at the code that creates tmpdirs called `process-execution-<random>`, which isn't actually just nailgun, my mistake.
(Specifically, in `process_execution/src/lib.rs`, in the function `run_and_capture_workdir`.)
👍 1
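For illustration, the same pattern in Python's stdlib (the real code is Rust, in `process_execution/src/lib.rs`): `mkdtemp`-style directories are never removed automatically, so any error path that skips the cleanup step leaks one `process-execution-<random>` directory per executed process:

```python
import shutil
import tempfile

# mkdtemp only creates the directory; nothing ever removes it for you.
workdir = tempfile.mkdtemp(prefix="process-execution-", dir="/tmp")
try:
    pass  # run the process with `workdir` as its sandbox
finally:
    shutil.rmtree(workdir)  # skip this on an error path and the dir leaks
```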

hundreds-father-404

12/27/2019, 10:45 PM
https://github.com/pantsbuild/pants/pull/8621 is likely very relevant if you haven’t already looked at it. I think it results in setting up far more schedulers than we did before. Unfortunately, we can’t revert this PR because we need it in order to use modern Pytest

hundreds-breakfast-49010

12/27/2019, 10:48 PM
I'll take a look
but either way, I can't seem to replicate the problem on my own machine right now
and I definitely was seeing it a while ago

hundreds-father-404

12/27/2019, 10:49 PM
fwiw, I haven’t encountered it in the past 3 days

hundreds-breakfast-49010

12/27/2019, 10:49 PM
Oh, I remember this commit, yeah.
This was definitely a problem ~2 weeks ago, and if neither of us has seen it in the past 3 days, maybe it got fixed by some recent commit? Of course, it would be good to know which commit that was, if that's the case.