hi, getting a coredump from the JVM when running `...
# general
w
hi, getting a coredump from the JVM when running `./pants lint` on our whole repo (which has around 1500 Scala files):
Error: 6.11 [ERROR] Completed: Lint with scalafmt - scalafmt failed (exit code -6).
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb1f0e766f1, pid=5768, tid=5791
#
# JRE version: OpenJDK Runtime Environment (11.0+28) (build 11+28)
# Java VM: OpenJDK 64-Bit Server VM (11+28, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xbf46f1]  Node::add_req(Node*)+0xc1
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /tmp/process-executionIMybnt/core.5768)
#
# An error report file with more information is saved as:
# /tmp/process-executionIMybnt/hs_err_pid5768.log
#
# Compiler replay data is saved as:
# /tmp/process-executionIMybnt/replay_pid5768.log
#
# If you would like to submit a bug report, please visit:
#   <http://bugreport.java.com/bugreport/crash.jsp>
#



✓ Black succeeded.
✓ Docformatter succeeded.
✓ Flake8 succeeded.
✓ Google Java Format succeeded.
✓ Shellcheck succeeded.
✓ gofmt succeeded.
✓ hadolint succeeded.
✓ isort succeeded.
𐄂 scalafmt failed.
✓ shfmt succeeded.
just checking if you are aware of this
it apparently only happens on a fresh install (i.e. CI or locally after removing all caches).
if run a second time after the first crash, it runs successfully
h
Thanks for the report! Does this happen if you run on a smaller subset of those 1500 files? Say even just 1-2 files?
w
good call, it doesn’t seem to happen on smaller sets; if I run it on a single JVM package it runs normally
f
We probably should partition the scalafmt invocations so that no more than N files (for some N) are passed to a single invocation.
fortunately, the scalafmt rules already partition the inputs by config file, so enforcing a maximum partition size should be easy enough to add.
(since it would just involve splitting the computed partitions)
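A minimal sketch of what splitting the computed partitions could look like (the function name and `max_size` parameter are hypothetical, not Pants’ actual API):

```python
from itertools import islice
from typing import Iterator


def split_partition(files: list[str], max_size: int) -> Iterator[list[str]]:
    """Split one config-file partition into chunks of at most max_size files."""
    it = iter(files)
    while chunk := list(islice(it, max_size)):
        yield chunk


# e.g. a single 1500-file partition with max_size=256 becomes
# ceil(1500 / 256) = 6 separate scalafmt invocations instead of one giant one.
```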
w
yeah, that may help. in fact we have one single `.scalafmt.conf` file for the whole repo, which is at the root
so maybe it is not really applying any partitioning, as there is only one config file…?
f
yes, it computes a single partition for that single config file
which is naive but that’s what I wrote for the first version of the scalafmt rules
we can be smarter about it though:
1. split “too large” partitions into smaller partitions
2. partition by other criteria (e.g. target)
(although we would still want to partition by config file)
w
I assume that choosing an arbitrary number for “partition too large” may turn into the typical quest, a not-uncommon issue in Spark: how to choose a partition size 😅
maybe it would make sense to partition by coarsened target while still respecting the partitioning by config file originally in place
not super familiar with the code so not sure how complicated it may be or if I’m just talking nonsense
f
not too complicated, the code currently just uses a `dict` mapping config file to files to compute partitions. https://github.com/pantsbuild/pants/blob/3b1974c6879c82d512aabf1ab20dab76daab3523/src/python/pants/backend/scala/lint/scalafmt/rules.py#L237
> maybe it would make sense to partition by coarsened target while still respecting the partitioning by config file originally in place
that might result in tiny partitions since scalafmt just works on a file-by-file basis. for performance, we might prefer larger partitions (but not large enough to eat all of the memory … 🙂 )
(since small partitions would result in many more partitions)
> I assume that choosing an arbitrary number for “partition too large” may turn into the typical quest, a not-uncommon issue in Spark: how to choose a partition size 😅
yes, exactly! and at that point, we’d have to take file size into account because of the potential for partition size skew 🙂
and then make a startup to use AI/ML to solve formatting partition sizes
😂 1
all joking aside, I’ll try and implement partition by config file and then by target
and then would love to get feedback from you on how it performs on your code base
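A rough sketch of the file-size-aware splitting discussed above, using greedy first-fit-decreasing packing to limit skew (all names and the `max_bytes` threshold are illustrative, not the actual Pants implementation):

```python
import os


def split_by_size(files: list[str], max_bytes: int) -> list[list[str]]:
    """Greedily pack files into partitions whose total byte size stays under max_bytes.

    Sorting largest-first (first-fit-decreasing) reduces size skew between partitions.
    """
    partitions: list[list[str]] = []
    totals: list[int] = []
    for path in sorted(files, key=os.path.getsize, reverse=True):
        size = os.path.getsize(path)
        # put the file into the first partition that still has room
        for i, total in enumerate(totals):
            if total + size <= max_bytes:
                partitions[i].append(path)
                totals[i] += size
                break
        else:
            # no existing partition has room (or this file alone exceeds max_bytes)
            partitions.append([path])
            totals.append(size)
    return partitions
```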
w
@fast-nail-55400: fwiw: https://github.com/pantsbuild/pants/issues/13462 is inbound too
but a core dump? that’s… very, very unexpected. and i don’t know why it would happen due to file count
it seems more like something that changing the JDK would impact
f
I’d be interested in knowing if there were any errors in the kernel log when the SIGSEGV occurred.
can SIGSEGV be a sign of memory pressure / OOM kills?
w
i don’t think so, no. that would be KILL or TERM
and if the JVM had memory pressure, it would OOM rather than segfault
f
so another answer may be to upgrade the JVM
w
yeah, it would feel very odd to me that it segfaulted due to an OOM
sorry, I’ve gotten a bit busy with other stuff, but I could try to get the dumps generated during the JVM crash if you think that would be useful
oh wow, interesting: wiped the Pants caches, added `jdk = "adopt:1.12"` to `pants.toml`, and ran `./pants lint ::`. It runs properly, with no crashes
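For reference, that workaround amounts to something like this in `pants.toml` (the `[jvm]` table is an assumption based on the Pants docs; the exact option location may vary by Pants version):

```toml
[jvm]
# adopt:1.11 (the default) segfaulted on large scalafmt runs; adopt:1.12 did not.
jdk = "adopt:1.12"
```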
h
The JVM should never segfault, so clearly some bug there
Oh, that's good. Is that an acceptable workaround for now?
w
also, fwiw: you should never need to wipe caches, even/especially when changing the `jdk`. all such information is ~guaranteed to be in cache keys.
👍 1
w
as of now, using 1.12 seems an acceptable workaround
the JVM bug in 1.11 is odd though, and I’m wondering if that is a leftover in an old JVM version that is no longer getting updates, since the `adopt` distribution has been moved to Eclipse Temurin
running the scalafmt CLI on the whole repo with Temurin 1.11 doesn’t segfault either
may be worth changing the default JVM in Pants to be `temurin:1.11` instead of `adopt:1.11`?
👀 1
h
> may be worth changing the default JVM in Pants to be `temurin:1.11` instead of `adopt:1.11`?
I don't know JVM well enough to say. Bump @witty-crayon-22786 @fast-nail-55400?
w
so, honestly: i think that our expectation was that almost everyone would set this to a non-default value… i didn’t in the example repo, because we’re also trying to make it clear what is required vs optional
the JVM used should generally match what you are using in production to deploy your apps
h
We probably want to update the docs to mention how to set it: https://www.pantsbuild.org/docs/jvm-overview
👍 1
w
would be interesting to do a twitter poll or something though.
👍 1
@hundreds-father-404: it occurs to me that i don’t know what the default interpreter constraints are for python, but: it’s related. we have a default, but you almost certainly want to set it.
h
`>=3.6,<4`, which is a bad default now that 3.6 is EOL
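Overriding that default would look roughly like this (the `[python]` table name and the constraint value are illustrative; some Pants versions use `[python-setup]` instead):

```toml
[python]
interpreter_constraints = [">=3.7,<4"]  # example: drop EOL 3.6
```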
w
I agree the JDK setting should be explicitly set in the config file
I guess it isn’t easy to find the right balance between easy setup with sensible defaults and being explicit about configuration.