# development
h
Escalating this: We appear to have a major problem building wheels in CI
1
That is a dummy commit with no changes, on top of recent main
This may only happen in PR builds; the main branch build without the dummy commit works: https://github.com/pantsbuild/pants/actions/runs/2586928473
It happens on Linux x86 and macOS x86, but not on macOS ARM64
The error on Linux x86 is a segfault:
CalledProcessError(-11, ('./pants', '--no-dynamic-ui', '--concurrent', 'package', 'src/python/pants:pants-packaged', 'src/python/pants/testutil:testutil_wheel'))
The error on macOS x86 is whatever exit code -4 designates:
CalledProcessError(-4, ('./pants', '--no-dynamic-ui', '--concurrent', 'package', 'src/python/pants:pants-packaged', 'src/python/pants/testutil:testutil_wheel'))
I'm fairly stumped
f
is it an OOM situation?
h
i'm stumped trying to reason about what has changed in the last 24 hours. Maybe we should try rebuilding jobs that were succeeding 24 hours ago? And why did some of the PRs today not fail?
c
One difference I see is that if it's a PR then we build the rust in debug mode
[[ "${GITHUB_EVENT_NAME}" == "pull_request" ]] && export MODE=debug
h
ohhh that's interesting!! that it's only PR builds failing. Great suggestion!
and that would explain why CI worked for some PRs, they used
[ci skip-build-wheels]
c
Also I think it's unlikely that it's an OOM. We're getting CalledProcessError -11, indicating SIGSEGV. The OOM reaper usually sends signal 9 (SIGKILL), and I don't think it's possible to trap it
also the docker image we use was last updated 2022-26T11:00, 4 days ago
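For reference, the negative numbers in `CalledProcessError` are how Python's `subprocess` reports a child that was killed by a signal: the return code is the negated signal number. A minimal sketch for decoding them, assuming a POSIX platform (Linux or macOS); the helper name `describe_returncode` is made up for illustration:

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code, naming the signal if there was one."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# The two codes seen in CI, plus the SIGKILL an OOM kill would produce.
for code in (-11, -4, -9):
    print(code, describe_returncode(code))
```

On Linux and macOS this names -11 as SIGSEGV, -4 as SIGILL (illegal instruction), and -9 as SIGKILL.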
1
so good news (?), I get a repro in the docker container by just walking through the script. failure is on the step
./build-support/bin/release.sh build-local-pex
h
wow awesome! thank you so much for digging into this
only when using MODE=debug?
c
still looking into it, I'm trying next with version n-1 of the docker container. I'm not really a rustacean but I do know about people changing dependencies and the build breaking
ok, well, good news? I also get a repro on the
quay.io/pypa/manylinux2014_x86_64:2022-06-20-d72b943
image. it has the same filesize as the current one, so that makes some sense. the optimised build runs fine on it too. trying with n-2, which has a different filesize
wild, so,
quay.io/pypa/manylinux2014_x86_64:2022-06-13-c365205
also has the problem where unoptimised builds throw a SIGSEGV. So, I think this might be something in our code itself? I've got to pack it in for the night. repro with docker
docker run --rm --name pants -it quay.io/pypa/manylinux2014_x86_64:latest bash
and then
git clone https://github.com/pantsbuild/pants.git
git config --global safe.directory /pants
pushd /pants
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -v -y --default-toolchain none
export PATH=${PATH}:${HOME}/.cargo/bin
export PATH=${PATH}:/opt/python/cp37-cp37m/bin:/opt/python/cp38-cp38/bin:/opt/python/cp39-cp39/bin
export MODE=debug

./build-support/bin/release.sh build-local-pex
My next step would be to git bisect, but I probably won't have time before Friday.
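The bisect could be automated with `git bisect run`, which treats exit code 0 as good, 1 through 127 (except 125) as bad, and 125 as "skip this commit". A rough sketch of a predicate script, assuming it runs from the repo root inside the container set up above. The function names are hypothetical, and treating every non-zero exit as "bad" is a simplification, since an unrelated build failure would also be blamed:

```python
import os
import subprocess

# Command taken from the repro steps above.
REPRO_CMD = ["./build-support/bin/release.sh", "build-local-pex"]


def bisect_exit_code(returncode: int) -> int:
    """Map a subprocess return code onto the codes `git bisect run` understands."""
    if returncode == 0:
        return 0  # good commit
    # Any failure, including a negative "killed by signal" code, counts as bad.
    # Stay clear of 125 (skip) and of 128 and above (abort the bisect).
    return 1


def main() -> int:
    # Force the unoptimised build, as in the failing PR jobs.
    env = dict(os.environ, MODE="debug")
    result = subprocess.run(REPRO_CMD, env=env)
    return bisect_exit_code(result.returncode)
```

Saved under a name such as `bisect_check.py` (hypothetical) with `sys.exit(main())` added at the bottom, this could be handed to `git bisect run` after marking one known-good and one known-bad commit.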
c
Not sure if this is related or not, but I’ve just started seeing issues with running
./pants check
on more than a few files. Not done a full bisect yet, but
2.12.x
is OK, while
2.13.x
exhibits this issue. The end of a trace run without pantsd looks like:
11:11:25.38 [DEBUG] Running Run MyPy on 13 files. under semaphore with concurrency id: 8, and concurrency: 1
11:11:25.38 [TRACE] The request is not nailgunnable! Short-circuiting to regular process execution
11:11:25.38 [DEBUG] Starting: Run MyPy on 13 files.
11:11:25.38 [DEBUG] Starting: setup_sandbox
Illegal instruction: 4
Command was:
$ ./pants --no-pantsd --no-dynamic-ui -ltrace check src/python/pants/backend/docker/util_rules:
This will be my first
git bisect
session, ever! TIL 😄 what a cool git feature
a
Looks like the original one is maybe blowing out the stack in debug mode?
#0  0x00007f0dd4c47c02 in __rust_probestack () from /Users/dwh/src/github.com/pantsbuild/pants/src/python/pants/engine/internals/native_engine.so
#1  0x00007f0dd3ed1442 in _$LT$alloc..vec..into_iter..IntoIter$LT$T$C$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$::next::hb3306f8227d7214a (self=0x0)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/into_iter.rs:143
#2  0x00007f0dd3ed50fe in core::iter::traits::iterator::Iterator::try_fold::h59fdcac3e0f60533 (self=0x7f0dcfe30e28, init=..., f=...) at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:2185
#3  0x00007f0dd3f98ab6 in _$LT$core..iter..adapters..map..Map$LT$I$C$F$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$::try_fold::h4f928108bf44f512 (self=0x7f0dcfe30e28, init=..., g=...)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/adapters/map.rs:117
#4  0x00007f0dd3f9ebe2 in _$LT$I$u20$as$u20$alloc..vec..in_place_collect..SpecInPlaceCollect$LT$T$C$I$GT$$GT$::collect_in_place::hfdebe04bb5589b35 (self=0x7f0dcfe30e28, dst_buf=0x7f0db1516300, end=0x7f0db151be80)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/in_place_collect.rs:251
#5  0x00007f0dd3edf2c9 in alloc::vec::in_place_collect::_$LT$impl$u20$alloc..vec..spec_from_iter..SpecFromIter$LT$T$C$I$GT$$u20$for$u20$alloc..vec..Vec$LT$T$GT$$GT$::from_iter::hd20db111ec09bed6 (iterator=...)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/in_place_collect.rs:178
#6  0x00007f0dd3ee3141 in _$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..iter..traits..collect..FromIterator$LT$T$GT$$GT$::from_iter::hfe20499127823b81 (iter=...)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/mod.rs:2554
#7  0x00007f0dd3f99b39 in core::iter::traits::iterator::Iterator::collect::h9ee1894b18fe5bd3 (self=...) at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:1784
#8  0x00007f0dd3fa7f87 in _$LT$alloc..boxed..Box$LT$$u5b$I$u5d$$GT$$u20$as$u20$core..iter..traits..collect..FromIterator$LT$I$GT$$GT$::from_iter::h271b6de0f310c330 (iter=...)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1889
#9  0x00007f0dd3f99a32 in core::iter::traits::iterator::Iterator::collect::h53ab4d6af960ad4a (self=...) at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:1784
#10 0x00007f0dd3f0fc9f in futures_util::future::try_join_all::try_join_all::he289352d4da757d1 (i=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.21/src/future/try_join_all.rs:91
#11 0x00007f0dd4006362 in store::Store::materialize_directory_helper::_$u7b$$u7b$closure$u7d$$u7d$::hd6bdc5abddf1b396 () at fs/store/src/lib.rs:1183
#12 0x00007f0dd3efbf3a in _$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h048fdbb04612eed1 (self=..., cx=0x7f0dd00025c8)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91
#13 0x00007f0dd3fbe8af in _$LT$core..pin..Pin$LT$P$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hc671042fa21ab1eb (self=..., cx=0x7f0dd00025c8)
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124
#14 0x00007f0dd400761f in store::Store::materialize_directory_helper::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h86eb3fdb07529c3f () at fs/store/src/lib.rs:1177
c
h
Oh wow, all this great debugging happened while I was asleep
🪄 2
😴 2
Thanks for digging in everyone!
f
I cherry-picked this PR to
2.13.x
as well. Probably need to box that future now (for
load_bytes_with
).
But revert first and ask questions later, then.
Also https://github.com/pantsbuild/pants/pull/15996 will need to be reverted as well as it stacks on top.
h
@fast-nail-55400 can you get out a clean revert PR?
With both?
f
yes
h
Thanks!
f
revert landed on
main
1
revert landed earlier on
2.13.x
as well
h
Thanks!!