# general

quaint-forest-8735

06/24/2022, 2:07 PM
Hi folks, we’re getting started with building a subtree of our monorepo with Pants, and we’re at the point where we’d like to start packaging PEX binaries as part of our CI process. The `./pants package` command works in most of our environments, but is choking in our CI environment (Jenkins-on-K8s) with the following error:

```
Exception: Failed to read link "/home/jenkins/agent/workspace/pipeline-name/bazel-pipeline-name": Absolute symlink: "/root/.cache/bazel/_bazel_root/948ee198afa5cf4acd9dcc1262573709/execroot/alpha"
```
Our CI system uses a Bazel base image, and the workdir appears to be an absolute symlink to `/root/.cache/bazel/...` -- could this be causing strange/unexpected behavior? Has anyone encountered this before, or does anyone know of possible workarounds?

witty-crayon-22786

06/24/2022, 4:22 PM
you should be able to either `.gitignore` or `pants_ignore` that file: https://www.pantsbuild.org/docs/troubleshooting#pants-cannot-find-a-file-in-your-project
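For reference, a minimal sketch of the `pants_ignore` approach in `pants.toml`. The `bazel-*` pattern here is an illustrative guess at the symlink name from the error above; the real pattern is whatever matches the offending path in your workspace:

```toml
# pants.toml
[GLOBAL]
# Patterns use .gitignore syntax. The ".add" suffix appends to Pants'
# default ignore patterns instead of replacing them.
pants_ignore.add = ["bazel-*"]
```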

quaint-forest-8735

06/24/2022, 4:30 PM
Ah interesting - it looks like a dir that didn't match our gitignore pattern was being added at runtime in a previous step of the pipeline; setting the appropriate `pants_ignore` options in `pants.toml` did the trick. Thank you!
👍 1
Hmm… after fixing this, it appears that when executing on Jenkins, `./pants package` has spent 30 minutes stuck on

```
18:18:46.16 [INFO] Initializing scheduler...
18:18:46.30 [INFO] Scheduler initialized.
```

Enabling more verbose logging doesn't seem to have any effect, so I'm scratching my head here. The other odd part is that I can successfully run `./pants package` when I ssh into the build pod (it runs in about 45s).

witty-crayon-22786

06/24/2022, 7:23 PM
there was no additional output with `-ldebug`…?

quaint-forest-8735

06/24/2022, 7:23 PM
Oddly enough, no: the output seemed to get truncated. I'm running it again, though, with `--level=trace`

witty-crayon-22786

06/24/2022, 7:29 PM
`-ldebug` should be more than enough… `trace` will be overkill.
are the `pants[d]` processes actually using CPU, or are they idle?
if they are using CPU, attaching `py-spy` or using linux `perf` would help
if they're not using CPU, then attaching `gdb` would be next.
oh, but before that: do you see anything interesting in `.pants.d/pants.log`?
👀 1

quaint-forest-8735

06/24/2022, 7:33 PM
1. `pantsd` appears to be actually using cpu:
```
3119 root      20   0  541.3g 113108  19364 S   0.3   0.1   0:03.28 pantsd [/home/j
```
2. There didn't seem to be anything of note in `.pants.d/pants.log`
3. When I enabled `--level=trace`, it appears to be hanging on this step:
```
19:30:29.64 [TRACE] Starting: Search for addresses in BUILD files
19:30:29.64 [TRACE] Starting: Snapshotting: BUILD, BUILD.pants
19:30:29.64 [TRACE] Starting: Fingerprinting: BUILD.pants
19:30:29.64 [TRACE] Starting: Fingerprinting: BUILD
```

witty-crayon-22786

06/24/2022, 7:34 PM
> 1. `pantsd` appears to be actually using cpu:

that's showing 0.3%, which is effectively idle. so it seems like potentially a deadlock
if you can attach `gdb` and then run:

```
thread apply all bt
```

… that would get a thread dump that could maybe point to a deadlock

quaint-forest-8735

06/24/2022, 7:43 PM
Here's the output from `thread apply all bt`:

witty-crayon-22786

06/24/2022, 9:01 PM
hm, yea: you hit a deadlock! … which Pants version is this?

quaint-forest-8735

06/24/2022, 9:01 PM
2.11.0
Would upgrading to a newer pants version (one of the 2.12.x/2.13.x beta versions) potentially help here?

witty-crayon-22786

06/24/2022, 9:30 PM
i don’t think so: i need to stare at it a bit longer, but i don’t think we’ve seen this one before. should be able to get a patch out today though.

quaint-forest-8735

06/24/2022, 9:30 PM
KK - let me know if there's anything else I can provide to help, happy to test out any patches in our environment

witty-crayon-22786

06/24/2022, 9:31 PM
it’s actually related to logging, though… which log level were you running at? do you see the hang at `-linfo` (the default), or only at higher levels?

quaint-forest-8735

06/24/2022, 9:31 PM
yes, I saw it at `-linfo`, `-ldebug`, and `-ltrace`

witty-crayon-22786

06/24/2022, 10:02 PM
hm… is stderr blocked somehow? are you wrapping it in a script and not consuming it? or piping it to a limited destination of some sort?
👀 1

quaint-forest-8735

06/24/2022, 10:05 PM
Let me do some digging in our Jenkins config. `./pants package` was being executed in an `sh` step in Jenkins: https://www.jenkins.io/doc/pipeline/steps/workflow-durable-task-step/#sh-shell-script

witty-crayon-22786

06/24/2022, 10:07 PM
this stack looks consistent with all of the threads waiting for one of the threads to finish flushing to stderr
so yea: would look at that.
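As an aside, the failure mode described above (a writer stuck because nothing is draining the other end of its pipe) is easy to demonstrate in a few lines of Python. This is an illustrative sketch, not Pants code: a Linux pipe has a finite kernel buffer (commonly 64 KiB), and once it fills, a blocking write simply hangs until a reader consumes data.

```python
import os

# Create a pipe and never read from the read end, simulating a CI runner
# that stops consuming a process's stderr.
read_fd, write_fd = os.pipe()

# Use non-blocking mode so this demo raises instead of hanging forever,
# which is what a blocking writer (e.g. a logging thread) would do.
os.set_blocking(write_fd, False)

written = 0
try:
    while True:
        written += os.write(write_fd, b"w" * 4096)
except BlockingIOError:
    # The pipe's kernel buffer is full; a blocking writer would now be stuck,
    # and any thread waiting on that writer would be stuck behind it.
    pass

print(written)  # the pipe capacity, commonly 65536 bytes on Linux
```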

quaint-forest-8735

06/24/2022, 10:07 PM
Thanks for digging into this so quickly -- will revert when I have some more info, need to check a couple of things
❤️ 1
To follow up on this: it turns out a separate issue was causing Pants to start spewing a very high volume of warnings. This actually led to deadlocks in both Jenkins and GitHub Actions (I tried GitHub Actions with the identical `./pants package` command to see whether this was specific to our Jenkins environment); fixing those warnings resolves the deadlock in Jenkins as well as in GH Actions. Interestingly enough, when I run the build in a normal shell at the same commit that introduced the issue which led to all the warnings, `./pants package` succeeds. It appears that in both Jenkins and GH Actions, `stderr` is either blocked or buffered in some way that leads to the deadlock. Not sure if investigating the deadlock further is relevant, but I thought that was an interesting data point.
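For what it's worth, the standard way a wrapper avoids this class of deadlock when it captures a child's `stderr` is to drain the pipe concurrently rather than letting it fill. A minimal Python sketch (hypothetical wrapper code, not anything Jenkins or Pants actually does):

```python
import subprocess
import sys
import threading

# Child process that floods stderr, standing in for a build tool emitting
# a huge volume of warnings. 200_000 bytes is well past a pipe's capacity.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stderr.write('w' * 200_000)"],
    stderr=subprocess.PIPE,
)

captured = []

def drain(stream):
    # Read continuously so the child never blocks on a full stderr pipe.
    captured.append(stream.read())

reader = threading.Thread(target=drain, args=(child.stderr,))
reader.start()
child.wait()
reader.join()

print(len(captured[0]))  # 200000
```

Without the reader thread, `child.wait()` would deadlock: the child blocks writing to a full pipe while the parent waits for it to exit.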