hi, getting a coredump from the JVM when running `...
# general
w
hi, getting a coredump from the JVM when running `./pants lint` on our whole repo (which has around 1500 Scala files):
Error: 6.11 [ERROR] Completed: Lint with scalafmt - scalafmt failed (exit code -6).
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb1f0e766f1, pid=5768, tid=5791
#
# JRE version: OpenJDK Runtime Environment (11.0+28) (build 11+28)
# Java VM: OpenJDK 64-Bit Server VM (11+28, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xbf46f1]  Node::add_req(Node*)+0xc1
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /tmp/process-executionIMybnt/core.5768)
#
# An error report file with more information is saved as:
# /tmp/process-executionIMybnt/hs_err_pid5768.log
#
# Compiler replay data is saved as:
# /tmp/process-executionIMybnt/replay_pid5768.log
#
# If you would like to submit a bug report, please visit:
#   <http://bugreport.java.com/bugreport/crash.jsp>
#



✓ Black succeeded.
✓ Docformatter succeeded.
✓ Flake8 succeeded.
✓ Google Java Format succeeded.
✓ Shellcheck succeeded.
✓ gofmt succeeded.
✓ hadolint succeeded.
✓ isort succeeded.
𐄂 scalafmt failed.
✓ shfmt succeeded.
just checking if you are aware of this
it apparently only happens on a fresh install (i.e. CI or locally after removing all caches).
if run a second time after the first crash, it runs successfully
h
Thanks for the report! Does this happen if you run on a smaller subset of those 1500 files? Say even just 1-2 files?
w
good call, it doesn’t seem to happen on smaller sets; if I run it on a single JVM package it runs normally
f
We probably should partition the scalafmt invocations so that no more than N files (for some N) are passed to a single invocation.
fortunately, the scalafmt rules already partition the inputs by config file, so enforcing a maximum partition size should be easy enough to add.
(since it would just involve splitting the computed partitions)
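A minimal sketch of what splitting the computed partitions could look like (the function name and `max_size` parameter are hypothetical, not Pants’ actual API):

```python
from itertools import islice
from typing import Iterator


def split_partition(files: list[str], max_size: int) -> Iterator[list[str]]:
    """Split one config-file partition into chunks of at most max_size files."""
    it = iter(files)
    while chunk := list(islice(it, max_size)):
        yield chunk


# e.g. a single 1500-file partition with max_size=256 becomes
# ceil(1500 / 256) = 6 separate scalafmt invocations instead of one giant one.
```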
w
yeah, that may help. in fact we have one single `.scalafmt.conf` file for the whole repo, which is at the root
so maybe it is not really applying any partitioning, as there is only one config file…?
f
yes, it computes a single partition for that single config file
which is naive but that’s what I wrote for the first version of the scalafmt rules
we can be smarter about it though:
1. split “too large” partitions into smaller partitions
2. partition by other criteria (e.g. target)
(although we would still want to partition by config file)
w
I assume that choosing an arbitrary number for “partition too large” may turn into the typical quest, a not-uncommon issue in Spark: how to choose a partition size 😅
maybe it would make sense to partition by coarsened target while still respecting the partitioning by config file originally in place
not super familiar with the code so not sure how complicated it may be or if I’m just talking nonsense
f
not too complicated, the code currently just uses a `dict` mapping config file to files to compute partitions. https://github.com/pantsbuild/pants/blob/3b1974c6879c82d512aabf1ab20dab76daab3523/src/python/pants/backend/scala/lint/scalafmt/rules.py#L237
> maybe it would make sense to partition by coarsened target while still respecting the partitioning by config file originally in place
that might result in tiny partitions since scalafmt just works on a file-by-file basis. for performance, we might prefer larger partitions (but not large enough to eat all of the memory … 🙂 )
(since small partitions would result in many more partitions)
> I assume that choosing an arbitrary number for “partition too large” may turn into the typical quest, a not-uncommon issue in Spark: how to choose a partition size 😅
yes, exactly! and at that point, we’d have to take file size into account because of the potential for partition size skew 🙂
and then make a startup to use AI/ML to solve formatting partition sizes
😂 1
all joking aside, I’ll try and implement partition by config file and then by target
and then would love to get feedback from you on how it performs on your code base
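A rough sketch of the file-size-aware splitting discussed above, using greedy first-fit-decreasing packing to limit skew (all names and the `max_bytes` threshold are illustrative, not the actual Pants implementation):

```python
import os


def split_by_size(files: list[str], max_bytes: int) -> list[list[str]]:
    """Greedily pack files into partitions whose total byte size stays under max_bytes.

    Sorting largest-first (first-fit-decreasing) reduces size skew between partitions.
    """
    partitions: list[list[str]] = []
    totals: list[int] = []
    for path in sorted(files, key=os.path.getsize, reverse=True):
        size = os.path.getsize(path)
        # put the file into the first partition that still has room
        for i, total in enumerate(totals):
            if total + size <= max_bytes:
                partitions[i].append(path)
                totals[i] += size
                break
        else:
            # no existing partition has room (or this file alone exceeds max_bytes)
            partitions.append([path])
            totals.append(size)
    return partitions
```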
w
@fast-nail-55400: fwiw: https://github.com/pantsbuild/pants/issues/13462 is inbound too
but a core dump? that’s… very, very unexpected. and i don’t know why it would happen due to file count
it seems more like something that changing the JDK would impact
f
I’d be interested in knowing if there were any errors in the kernel log when the SIGSEGV occurred.
can SIGSEGV be a sign of memory pressure / OOM kills?
w
i don’t think so, no. that would be KILL or TERM
and if the JVM had memory pressure, it would OOM rather than segfault
f
so another answer may be to upgrade the JVM
w
yeah, it would feel very odd to me that it segfaulted due to an OOM
sorry, I’ve gotten a bit busy with other stuff, but I could try to get the dumps generated during the JVM crash if you think that would be useful
oh wow, interesting: wiped the Pants caches, added `jdk = "adopt:1.12"` to `pants.toml`, and ran `./pants lint ::`. It runs properly, with no crashes
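For reference, that workaround amounts to something like this in `pants.toml` (the `[jvm]` table is an assumption based on the Pants docs; the exact option location may vary by Pants version):

```toml
[jvm]
# adopt:1.11 (the default) segfaulted on large scalafmt runs; adopt:1.12 did not.
jdk = "adopt:1.12"
```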
h
The JVM should never segfault, so clearly some bug there
Oh, that's good. Is that an acceptable workaround for now?
w
also, fwiw: you should never need to wipe caches, even/especially when changing the `jdk`. all such information is ~guaranteed to be in cache keys.
👍 1
w
as of now, using 1.12 seems an acceptable workaround
the JVM bug in 1.11 is odd though, and I’m wondering if that is a leftover in an old JVM version that is no longer getting updates, since the `adopt` distribution has been moved to Eclipse Temurin
running the scalafmt CLI on the whole repo with Temurin 1.11 doesn’t segfault either
may be worth changing the default JVM in Pants to be `temurin:1.11` instead of `adopt:1.11`?
👀 1
h
> may be worth changing the default JVM in Pants to be `temurin:1.11` instead of `adopt:1.11`?
I don't know JVM well enough to say. Bump @witty-crayon-22786 @fast-nail-55400?
w
so, honestly: i think that our expectation was that almost everyone would set this to a non-default value… i didn’t in the example repo, because we’re also trying to make it clear what is required vs optional
the JVM used should generally match what you are using in production to deploy your apps
h
We probably want to update the docs to mention how to set it: https://www.pantsbuild.org/docs/jvm-overview
👍 1
w
would be interesting to do a twitter poll or something though.
👍 1
@hundreds-father-404: it occurs to me that i don’t know what the default interpreter constraints are for python, but: it’s related. we have a default, but you almost certainly want to set it.
h
`>=3.6,<4`, which is a bad default now that 3.6 is EOL
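Overriding that default would look roughly like this (the `[python]` table name and the constraint value are illustrative; some Pants versions use `[python-setup]` instead):

```toml
[python]
interpreter_constraints = [">=3.7,<4"]  # example: drop EOL 3.6
```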
w
I agree the JDK setting should be explicitly set in the config file
I guess it isn’t easy to find the right balance between easy setup with sensible defaults and being explicit about configuration.