# general
h
Hello! Has anyone seen this error in their repository?
NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned."
We're trying really hard to track it down, and it's been difficult to reproduce.
f
A coworker was stuck on this for a while. He was on macOS Mojave, and I was unable to repro on Linux. Interestingly enough, when he ran it with
--no-pantsd
he got this error instead
Abort trap: 6
It's been a few weeks though, so I don't remember the whole story.
👀 1
I think he got through it by running
rm -rf ~/.cache/pants
h
Huh, thanks! To confirm, you are not using remote caching, right? So far we've only seen it when using that
f
Correct, no remote caching
👍 1
p
I just hit this error. I'm not using remote caching.
~/p/st2sandbox/st2.git > pants ● > ./pants lint ::
19:20:15.50 [INFO] Initialization options changed: reinitializing scheduler...
19:20:15.76 [INFO] Scheduler initialized.
⠈ 11.46s Building black.pex with 2 requirements: black==20.8b1, setuptools
⠈ 15.78s Lint using Pylint
⠈ 11.45s Run Flake8 on 1118 files.
⠈ 
⠈ 
⠈ 
⠈ 
⠈ 
Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned."
Traceback (most recent call last):
  File "/home/cognifloyd/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.0rc2_py37/lib/python3.7/site-packages/pants/bin/pants_loader.py", line 100, in run_default_entrypoint
    exit_code = runner.run(start_time)
  File "/home/cognifloyd/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.0rc2_py37/lib/python3.7/site-packages/pants/bin/pants_runner.py", line 86, in run
    return remote_runner.run()
  File "/home/cognifloyd/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.0rc2_py37/lib/python3.7/site-packages/pants/bin/remote_pants_runner.py", line 100, in run
    return self._connect_and_execute(pantsd_handle)
  File "/home/cognifloyd/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.0rc2_py37/lib/python3.7/site-packages/pants/bin/remote_pants_runner.py", line 133, in _connect_and_execute
    command, args, modified_env
native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned."
👀 1
h
@proud-dentist-22844 can you please DM
.pants.d/pants.log
? (it's gonna be super long)
e
@faint-businessperson-86903 / @proud-dentist-22844 do you know if in either of these cases pantsd was up for a long time? That's handwavy, but one possible cause of this is Pants restarting itself because a memory threshold was reached. Eventually, if pantsd is up long enough, that can happen and it's ~fine. What's more interesting is if this happens after pantsd has been up for only a short time. I'm leaving "short time" undefined here because we're still handwaving a bit.
f
I'm not sure about that, but it was happening consecutively
e
Ok - that's a really useful tidbit. Thank you.
👍 1
h
@enough-analyst-54434 same with being consecutive in my CI run two days ago, 4 failures in a run and then suddenly it started working. I assumed it was from remote cache having issues, given that CI isn't supposed to persist state across distinct jobs and Pants CI doesn't save local cache. But who knows
e
Well, I can get this to happen every time on my machine with
pkill pantsd && ./pants --remote-cache-write test ::
Instrumentation:
match maybe_bytes {
-                Some(bytes) => remote.store_bytes(&bytes).await,
+                Some(bytes) => {
+                  let total = bb.fetch_add(bytes.len(), std::sync::atomic::Ordering::SeqCst);
+                  eprintln!(">>> Buffering {} total bytes presently while attempting to upload {} additional bytes to remote store...", total, bytes.len());
+                  let stored = remote.store_bytes(&bytes).await;
+                  bb.fetch_sub(bytes.len(), std::sync::atomic::Ordering::SeqCst);
+                  stored
+                }
                 None => Err(format!(
                   "Failed to upload digest {:?}: Not found in local store",
Typical log:
>>> Buffering 2357 total bytes presently while attempting to upload 64 additional bytes to remote store...
>>> Buffering 0 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 0 total bytes presently while attempting to upload 2730 additional bytes to remote store...
>>> Buffering 2730 total bytes presently while attempting to upload 142 additional bytes to remote store...
>>> Buffering 0 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 0 total bytes presently while attempting to upload 84 additional bytes to remote store...
>>> Buffering 0 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 0 total bytes presently while attempting to upload 146 additional bytes to remote store...
>>> Buffering 146 total bytes presently while attempting to upload 965 additional bytes to remote store...
>>> Buffering 1111 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 1191 total bytes presently while attempting to upload 1471 additional bytes to remote store...
>>> Buffering 2662 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 2742 total bytes presently while attempting to upload 833 additional bytes to remote store...
>>> Buffering 3575 total bytes presently while attempting to upload 1419 additional bytes to remote store...
>>> Buffering 4994 total bytes presently while attempting to upload 1383 additional bytes to remote store...
>>> Buffering 6377 total bytes presently while attempting to upload 43435480 additional bytes to remote store...
>>> Buffering 43441857 total bytes presently while attempting to upload 180575760 additional bytes to remote store...
ry (above the limit of 1073741824 bytes).')
21:23:21.33 [ERROR] service failure for <pants.pantsd.service.scheduler_service.SchedulerService object at 0x7f2da2136a90>.
21:23:21.34 [INFO] Waiting for ongoing runs to complete before exiting...
012623 total bytes presently while attempting to upload 833 additional bytes to remote store...
>>> Buffering 224013456 total bytes presently while attempting to upload 998 additional bytes to remote store...
>>> Buffering 224014454 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 224014534 total bytes presently while attempting to upload 146 additional bytes to remote store...
>>> Buffering 224014680 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 224014760 total bytes presently while attempting to upload 43435480 additional bytes to remote store...
>>> Buffering 267446720 total bytes presently while attempting to upload 180575760 additional bytes to remote store...
>>> Buffering 448022480 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 448022480 total bytes presently while attempting to upload 146 additional bytes to remote store...
>>> Buffering 448022626 total bytes presently while attempting to upload 1383 additional bytes to remote store...
>>> Buffering 448024009 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 448024089 total bytes presently while attempting to upload 833 additional bytes to remote store...
>>> Buffering 448024922 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 448025002 total bytes presently while attempting to upload 994 additional bytes to remote store...
>>> Buffering 448025996 total bytes presently while attempting to upload 43435480 additional bytes to remote store...
>>> Buffering 491461330 total bytes presently while attempting to upload 180575760 additional bytes to remote store...
>>> Buffering 672033720 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 672033720 total bytes presently while attempting to upload 833 additional bytes to remote store...
>>> Buffering 672034553 total bytes presently while attempting to upload 146 additional bytes to remote store...
>>> Buffering 672034699 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 672034779 total bytes presently while attempting to upload 968 additional bytes to remote store...
>>> Buffering 672035747 total bytes presently while attempting to upload 1383 additional bytes to remote store...
>>> Buffering 672037130 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 672037210 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 672037130 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 672037290 total bytes presently while attempting to upload 967 additional bytes to remote store...
>>> Buffering 672038257 total bytes presently while attempting to upload 833 additional bytes to remote store...
>>> Buffering 672039090 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 672039170 total bytes presently while attempting to upload 1383 additional bytes to remote store...
>>> Buffering 672040553 total bytes presently while attempting to upload 146 additional bytes to remote store...
>>> Buffering 672040699 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 672040699 total bytes presently while attempting to upload 146 additional bytes to remote store...
>>> Buffering 672040845 total bytes presently while attempting to upload 833 additional bytes to remote store...
>>> Buffering 672041678 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 672041758 total bytes presently while attempting to upload 1383 additional bytes to remote store...
>>> Buffering 672043141 total bytes presently while attempting to upload 973 additional bytes to remote store...
>>> Buffering 672044114 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 672044194 total bytes presently while attempting to upload 43435480 additional bytes to remote store...
>>> Buffering 715476331 total bytes presently while attempting to upload 43435480 additional bytes to remote store...
>>> Buffering 758911811 total bytes presently while attempting to upload 42886047 additional bytes to remote store...
>>> Buffering 801797858 total bytes presently while attempting to upload 0 additional bytes to remote store...
>>> Buffering 801797858 total bytes presently while attempting to upload 994 additional bytes to remote store...
>>> Buffering 801798852 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 801798932 total bytes presently while attempting to upload 80 additional bytes to remote store...
>>> Buffering 801799012 total bytes presently while attempting to upload 146 additional bytes to remote store...
>>> Buffering 801799158 total bytes presently while attempting to upload 833 additional bytes to remote store...
>>> Buffering 801799991 total bytes presently while attempting to upload 1383 additional bytes to remote store...
>>> Buffering 801801374 total bytes presently while attempting to upload 43435480 additional bytes to remote store...
>>> Buffering 845236854 total bytes presently while attempting to upload 180575760 additional bytes to remote store...
>>> Buffering 1025810178 total bytes presently while attempting to upload 180575760 additional bytes to remote store...
>>> Buffering 1206383416 total bytes presently while attempting to upload 180575760 additional bytes to remote store...
>>> Buffering 1386959176 total bytes presently while attempting to upload 180575760 additional bytes to remote store...
👀 1
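For anyone who wants to try the same kind of instrumentation outside the Rust code, here is a rough Python analogue of the in-flight byte counter from the diff above. It is only a sketch: `InFlightBytes`, `fake_store_bytes`, and `upload` are made-up names, the "remote store" is a stand-in that just yields control, and because asyncio is single-threaded a plain integer replaces the atomic used in the Rust version.

```python
import asyncio


class InFlightBytes:
    """Tracks how many bytes are currently buffered for upload (hypothetical helper)."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def add(self, n: int) -> int:
        """Add n bytes and return the total from before the add, like fetch_add."""
        before = self.current
        self.current += n
        self.peak = max(self.peak, self.current)
        return before

    def sub(self, n: int) -> None:
        self.current -= n


async def fake_store_bytes(data: bytes) -> None:
    # Stand-in for the remote store call (an assumption); it only yields control.
    await asyncio.sleep(0)


async def upload(data: bytes, counter: InFlightBytes, log: list) -> None:
    before = counter.add(len(data))
    log.append((before, len(data)))
    try:
        await fake_store_bytes(data)
    finally:
        # Release the accounting even if the upload raises.
        counter.sub(len(data))


async def main() -> tuple:
    counter = InFlightBytes()
    log: list = []
    blobs = [b"a" * 64, b"b" * 2730, b"c" * 142]
    # Every upload starts before any of them finishes, so the total accumulates.
    await asyncio.gather(*(upload(blob, counter, log) for blob in blobs))
    return counter, log


counter, log = asyncio.run(main())
for before, size in log:
    print(f">>> Buffering {before} total bytes while uploading {size} additional bytes")
```

The counter falls back to zero only when uploads complete as fast as new ones start. If many uploads are started at once, the "before" value keeps climbing, which is the shape of the log above.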
You can see the total bytes pending a remote write go up and back down to zero, but then at some point it just ramps up until the memory limit is hit and the client gets the error. Every time.
I'll formalize this a bit in the am and file a proper bug with details.
ā¤ļø 1
p
I think pantsd had been up for quite a while.
r
These started occurring in our CI builds recently too.
h
@rapid-bird-79300 the temporary workaround is to disable pantsd in CI
✅ 1
🙏 1
Hey folks, FYI this error happens when the pantsd process is killed mid-run. This can happen for a few reasons, e.g. you literally running
kill -9 <pid>
, but often it seems to be Linux's OOM killer. We rewrote the error message to make this much clearer (see https://github.com/pantsbuild/pants/pull/12107). With remote caching, Pants was pathologically using too much memory for cache writes, so this error happened very frequently; that is now fixed. When not using remote caching, we suspect it could be from
--pantsd-max-memory-usage
being set too high (default 1GiB). You can set it lower, at the cost of less in-memory caching. Or, you can disable
--pantsd
entirely - which is sensible in CI but we don't recommend for desktop builds. We're looking into making the default for
--pantsd-max-memory-usage
be dynamic, and we're also profiling Pants for more memory optimizations.
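As a side note on "killed mid-run": on POSIX, a parent process can tell that a child it launched was killed by a signal, because Python's `subprocess` reports the exit as a negative signal number. The sketch below shows this with a throwaway child; `describe_exit` is a made-up helper. It does not reveal whether the OOM killer or a manual `kill -9` sent the signal, and it only applies to processes you start yourself, not to an already-running pantsd.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Describe a subprocess return code (POSIX semantics)."""
    if returncode < 0:
        # A negative value -N means the child was terminated by signal N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Start a throwaway child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Same effect as running `kill -9 <pid>` by hand.
child.send_signal(signal.SIGKILL)
returncode = child.wait()

print(describe_exit(returncode))
```

The `Abort trap: 6` message mentioned earlier in the thread is the shell reporting signal 6 (SIGABRT), which is a different signal from the SIGKILL (9) sent by `kill -9`.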
e
We are also aware of unexpected aborts. The
Abort trap: 6
with
--no-pantsd
reported by @faint-businessperson-86903 is an instance of that. Another instance of this sort of thing is documented in https://github.com/pantsbuild/pants/issues/11926.
➕ 1
h