#development

enough-analyst-54434

09/09/2021, 9:01 PM
Does anyone (still) get these? I get them extremely regularly and reproducibly when running all the Pants tests, or otherwise under heavy machine load. Just got one under light load, though; the actual test being run appears to be irrelevant:
13:57:25.90 [ERROR] Completed: test - src/python/pants/backend/python/util_rules/pex_from_targets_test.py:tests failed (exit code -6).
============================= test session starts ==============================
collected 2 items

src/python/pants/backend/python/util_rules/pex_from_targets_test.py ..   [100%]

========================= 2 passed in 60.95s (0:01:00) =========================

FATAL: exception not rethrown
Fatal Python error: Aborted

Current thread 0x00007f7f72c33640 (most recent call first):

Thread 0x00007f7f76ab0740 (most recent call first):
  File "/home/jsirois/.pyenv/versions/3.7.11/lib/python3.7/logging/__init__.py", line 766 in _removeHandlerRef
That's a SIGABRT, by the way (exit code -6 means the process was killed by signal 6).
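For reference, the -6 follows the common POSIX convention (used, for example, by Python's subprocess.Popen.returncode) of reporting "killed by signal N" as -N, and signal 6 is SIGABRT on Linux and macOS. A minimal Rust sketch of that mapping; returncode and signal_name are hypothetical helpers, not Pants code, and the raw statuses assume the traditional Linux wait(2) encoding:

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

/// Hypothetical helper (not Pants' actual code): collapse an ExitStatus into one
/// integer, reporting "killed by signal N" as -N like Python's Popen.returncode.
fn returncode(status: ExitStatus) -> i32 {
    match (status.code(), status.signal()) {
        (Some(code), _) => code,   // normal exit: the process's own exit code
        (None, Some(sig)) => -sig, // terminated by a signal: negate its number
        (None, None) => i32::MIN,  // neither (e.g. stopped); not expected here
    }
}

/// Names for a few common signals; these numbers match Linux and macOS.
fn signal_name(sig: i32) -> &'static str {
    match sig {
        6 => "SIGABRT",
        9 => "SIGKILL",
        11 => "SIGSEGV",
        15 => "SIGTERM",
        _ => "other",
    }
}

fn main() {
    // Traditional wait(2) encoding: death by signal puts the signal number in
    // the low 7 bits; a normal exit puts the exit code in bits 8..16.
    let aborted = ExitStatus::from_raw(6);
    let exited_with_1 = ExitStatus::from_raw(1 << 8);

    let rc = returncode(aborted);
    println!("exit code {} = {}", rc, signal_name(-rc));
    println!("exit code {}", returncode(exited_with_1));
}
```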

witty-crayon-22786

09/09/2021, 9:03 PM
yikes. no, haven’t seen those locally/remotely in a few months
does anything go to .pants.d/pants.log?
…i guess not in this case. but if you experience it outside of a test?

hundreds-father-404

09/09/2021, 9:06 PM
Yeah, I've never seen that before.

enough-analyst-54434

09/09/2021, 9:06 PM
Oh, I have auto-core-dumps via systemd going back weeks, so plenty of data. I've poked at them in the past while fixing the other one, which was our fault, in the ... I can't remember the exact bit of code I fixed. But I wanted to check whether I was the only one reproducing this. I can sink a bunch of time into this later, but I'll file an issue now.
When did I fix that other one, May? These were happening in May too: exactly this one, in the logging weakref cleanup (_removeHandlerRef) after a test completes.
macOS seems to have a much less random scheduler. I can't remember if I customized mine or am using stock.

witty-crayon-22786

09/09/2021, 9:09 PM
oh: the only thing that has been potentially related recently was https://pantsbuild.slack.com/archives/C046T6T9U/p1628225810088400
we haven’t had any other repros of that though, so i had let it fall by the wayside.

enough-analyst-54434

09/09/2021, 9:13 PM
I'm using stock Linux CFS
Ok, I'll file a bug with a representative core dump to start just to keep track of this being a thing at least.
(gdb) f 56
#56 0x00007f3b28eaf617 in alloc::boxed::{{impl}}::call_once<(),FnOnce<()>,alloc::alloc::Global> () at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/alloc/src/boxed.rs:1575
1575	in /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/alloc/src/boxed.rs
(gdb) x 0x00007f3b28eaf617
0x7f3b28eaf617 <_ZN3std3sys4unix6thread6Thread3new12thread_start17h8c7c4450dba62914E+39>:	0x08738b48
So that's std::sys::unix::thread::Thread::new(...), specifically its inner thread_start function, the entry point of a newly spawned thread.
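The symbol in the x output is a legacy-mangled Rust name: _ZN, then length-prefixed path segments, then E, with a trailing h<16 hex digits> hash segment. A minimal sketch of decoding it by hand, assuming no escape sequences ($LT$ and friends) or generic arguments; demangle_legacy and pretty are hypothetical helpers, and real tools such as rustfilt or the rustc-demangle crate handle far more:

```rust
/// Hypothetical helper: split a legacy-mangled Rust symbol (`_ZN` + length-prefixed
/// segments + `E`) into its path segments. Assumes no `$LT$`-style escapes.
fn demangle_legacy(sym: &str) -> Option<Vec<String>> {
    // gdb prints `<symbol+offset>`; drop any `+offset` suffix first.
    let sym = sym.split('+').next()?;
    let mut rest = sym.strip_prefix("_ZN")?;
    let mut parts = Vec::new();
    while !rest.starts_with('E') {
        // Each segment is a decimal byte length followed by that many bytes.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None; // malformed, or ran off the end without seeing `E`
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        let ident = rest.get(digits..end)?;
        parts.push(ident.to_string());
        rest = &rest[end..];
    }
    Some(parts)
}

/// Join the segments with `::`, dropping the trailing `h<16 hex digits>` hash.
fn pretty(parts: &[String]) -> String {
    let is_hash = |s: &String| {
        s.len() == 17 && s.starts_with('h') && s[1..].bytes().all(|b| b.is_ascii_hexdigit())
    };
    let end = if parts.last().map_or(false, is_hash) { parts.len() - 1 } else { parts.len() };
    parts[..end].join("::")
}

fn main() {
    let sym = "_ZN3std3sys4unix6thread6Thread3new12thread_start17h8c7c4450dba62914E+39";
    match demangle_legacy(sym) {
        Some(parts) => println!("{}", pretty(&parts)),
        None => println!("not a legacy Rust symbol"),
    }
}
```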

average-vr-56795

09/09/2021, 10:58 PM
How about 0x7f3b276f8cb0?
I'm not entirely sure how to read it, but it feels like the answer lies somewhere in:
#8  panic_unwind::__rust_panic_cleanup () at library/panic_unwind/src/lib.rs:96
#9  0x00007f3b286810ad in std::panicking::try::cleanup () at library/std/src/panicking.rs:382
#10 0x00007f3b28765b92 in std::panicking::try::do_catch<std::panic::AssertUnwindSafe<closure-0>,core::task::poll::Poll<()>> (data=<optimized out>, payload=0x7f3b276f8cb0) at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:426
#11 std::panicking::try<core::task::poll::Poll<()>,std::panic::AssertUnwindSafe<closure-0>> (f=...) at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:365
#12 std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure-0>,core::task::poll::Poll<()>> (f=...) at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panic.rs:434

witty-crayon-22786

09/09/2021, 11:01 PM
in the previous case that John investigated, there was a double panic
and removing tokio’s catching caused the actual panic to propagate
(…iirc)
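Frames #10 to #12 of the backtrace above show the pattern under discussion: a closure returning core::task::poll::Poll<()> runs inside std::panic::catch_unwind, so a panic becomes an error value for the caller instead of unwinding further. A minimal sketch of that pattern with a hypothetical run_task runner (not tokio's actual code). One caveat relevant to the double-panic theory: a panic that starts while another panic is already unwinding cannot be caught this way; the Rust runtime aborts the process instead, which also surfaces as SIGABRT.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical runner (not tokio's real code): run one task body under
/// catch_unwind so that a panic becomes an Err for the caller instead of
/// unwinding through the runner's own frames.
fn run_task<F: FnOnce() -> u32>(task: F) -> Result<u32, Box<dyn Any + Send>> {
    // AssertUnwindSafe: the runner promises not to observe state the task may
    // have left half-updated when it panicked.
    panic::catch_unwind(AssertUnwindSafe(task))
}

fn main() {
    // Replace the default hook so the caught panic isn't printed to stderr.
    panic::set_hook(Box::new(|_| {}));

    let ok = run_task(|| 40 + 2);
    let failed = run_task(|| panic!("task blew up"));
    // The runner itself survives and can keep running tasks.
    let after = run_task(|| 7);

    println!("{:?} {} {:?}", ok.ok(), failed.is_err(), after.ok());
}
```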

enough-analyst-54434

09/09/2021, 11:02 PM
yes
And @average-vr-56795, that's likely the NULL:
(gdb) f 10
#10 0x00007f3b28765b92 in std::panicking::try::do_catch<std::panic::AssertUnwindSafe<closure-0>,core::task::poll::Poll<()>> (data=<optimized out>, payload=0x7f3b276f8cb0) at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:426
426	/rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs: No such file or directory.
(gdb) p payload
$1 = (*mut u8) 0x7f3b276f8cb0
(gdb) x 0x7f3b276f8cb0
0x7f3b276f8cb0:	0x00000000
(gdb)
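At the Rust level, a caught panic payload is a Box<dyn Any + Send>, and its message can only be recovered by downcasting to the concrete type the panic carried. A minimal sketch; panic_message is a hypothetical helper, and it operates on the typed payload, not on the raw *mut u8 pointer gdb shows in do_catch:

```rust
use std::any::Any;
use std::panic;

/// Hypothetical helper: recover a message from a caught panic payload.
/// panic!("literal") carries a &'static str, panic!("{}", x) carries a String,
/// and std::panic::panic_any can carry any other type.
fn panic_message(payload: &(dyn Any + Send)) -> Option<String> {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        Some(s.to_string())
    } else if let Some(s) = payload.downcast_ref::<String>() {
        Some(s.clone())
    } else {
        None // opaque payload: no printable message to recover
    }
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // keep stderr quiet for the demo
    let frame = 56;
    let caught = panic::catch_unwind(move || -> u32 { panic!("frame {} went wrong", frame) });
    if let Err(payload) = caught {
        // `&*payload` borrows the boxed value; `&payload` would coerce the Box
        // itself to `dyn Any`, and both downcasts would then fail.
        println!("{:?}", panic_message(&*payload));
    }
}
```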

average-vr-56795

09/09/2021, 11:04 PM
Does p data show anything? <optimized out> is never a helpful thing to see 😄

enough-analyst-54434

09/09/2021, 11:16 PM
Yeah, no. It's optimized out, of course.
As mentioned earlier, these things are a dime a dozen and I can repro all day long, so I'll circle back on that issue when I have a bit of breathing room and dive all the way in.