# development
w
Getting a weird issue in Jenkins with black formatting when moving from 1.26.0 => 1.27.0. In 1.26.0 the command ./pants fmt2 src:: works as expected. In 1.27.0 I switched the command to ./pants fmt src::, and it works locally; however, in Jenkins the output logs as if it completed, then hangs for up to a day (!), at which point Jenkins reports the following error. Are there any changes to the output stream or process environment that might cause pants not to 'register' the end of an operation?
w
hm, sorry for the trouble!
is it reproducible?
if so, while it’s sitting there, output from py-spy dump or the native backtraces of threads would be very helpful (via https://lldb.llvm.org/use/map.html#examining-thread-state “show the backtraces for all threads”)
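For context on what a Python-level stack dump contains, here is a minimal in-process sketch that collects the call stack of every live thread with the standard library's sys._current_frames(). It is only an illustration: py-spy inspects a different process from the outside, and the function name here is hypothetical. It also shows Python frames only. Pants' engine runs native (Rust) code, so a thread blocked inside the engine would only be visible in a native gdb/lldb backtrace.

```python
import sys
import threading
import traceback


def format_all_thread_stacks() -> str:
    """Format the Python call stack of every live thread in this process (hypothetical helper)."""
    # Map thread idents to their human-readable names.
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"Thread {names.get(ident, '<unknown>')} (ident={ident}):\n")
        # format_stack walks from the outermost frame down to `frame`.
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)
```

Calling it from a thread that is still responsive prints one block per thread, which is roughly the Python half of what the thread asks for.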
w
just saw this, will try.
It is reproducible in the CI env
been able to reproduce locally within our build container. cannot get py-spy to work (getting permission denied, even with sudo!), however the command's been running for a lot longer than 29 seconds and it shows:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4504   804 pts/0    Ss   19:47   0:00 /bin/sh
root         6  2.7  3.7 1849066516 150360 pts/0 Sl+ 19:47   0:29 /root/.cache/pants/setup/bootstrap-Linux-x86_64/1.27.0_py37/bin/python /root/.cache/pants/setup/bootstrap-
root      1091  0.0  0.0   4504  1612 pts/1    Ss   19:55   0:00 /bin/sh
root      1130  0.0  0.0  34424  2840 pts/1    R+   20:05   0:00 ps aux
w
i suspect that something is deadlocked rather than busy waiting.
do you think that you could give the gdb/lldb thing a try?
w
the virtual set size is enormous
i'll have to build it on the container
w
yea, that’s expected. we use LMDB, which mmaps things aggressively
gdb should be available from your package manager…
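The large VSZ in the ps output above is address space reserved by memory mappings, not memory in use (RSS is about 150 MB). A minimal Linux-only sketch of that effect, assuming /proc/self/status is readable: it maps a 1 GiB sparse file and reports how much VmSize and VmRSS grew. The size is arbitrary and the helper names are hypothetical; this does not reproduce how LMDB chooses its map size.

```python
import mmap
import tempfile
from typing import Tuple


def status_kb(field: str) -> int:
    """Read a kB-valued field such as VmSize or VmRSS from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def map_sparse_file(size: int) -> Tuple[int, int]:
    """Map a sparse file of `size` bytes; return (VmSize growth, VmRSS growth) in kB."""
    vsz_before, rss_before = status_kb("VmSize"), status_kb("VmRSS")
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)  # sparse file: no data blocks are written
        with mmap.mmap(backing.fileno(), size):
            # The whole range now counts toward VSZ, but no page has been
            # touched, so resident memory (RSS) barely moves.
            return (status_kb("VmSize") - vsz_before,
                    status_kb("VmRSS") - rss_before)
```

On a typical Linux box, map_sparse_file(1 << 30) reports roughly a million kB of VSZ growth and only a few hundred kB of RSS growth.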
w
ah
super strange:
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
i am root!
groot
happening with other processes too
w
hm. that is unexpected! i don’t know whether the message it logged about ptrace is helpful… it might be? i haven’t experienced a case where gdb can’t attach
w
yeah, setting that to the most permissive mode still causes issues
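If ptrace_scope is already 0 and root still gets "Operation not permitted", the Yama setting is not the blocker. One common cause inside Docker containers (an assumption about this environment; the thread does not confirm it) is that the container lacks the CAP_SYS_PTRACE capability, which is usually granted with docker run --cap-add=SYS_PTRACE. A minimal diagnostic sketch, with hypothetical helper names, that interprets the Yama setting and checks bit 19 (CAP_SYS_PTRACE) of the CapEff mask in /proc/<pid>/status:

```python
from typing import Optional

# Meanings of /proc/sys/kernel/yama/ptrace_scope per the Yama LSM documentation.
PTRACE_SCOPE = {
    0: "classic: same-uid processes may attach",
    1: "restricted: only an ancestor (or declared tracer) may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled",
}
CAP_SYS_PTRACE = 19  # capability bit number from <linux/capability.h>


def describe_ptrace_scope(path: str = "/proc/sys/kernel/yama/ptrace_scope") -> str:
    """Describe the Yama ptrace_scope setting, if Yama is present."""
    try:
        with open(path) as f:
            return PTRACE_SCOPE.get(int(f.read().strip()), "unknown value")
    except FileNotFoundError:
        return "Yama not enabled"


def has_cap_sys_ptrace(status_text: str) -> Optional[bool]:
    """Given /proc/<pid>/status text, report whether CapEff includes CAP_SYS_PTRACE."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool((mask >> CAP_SYS_PTRACE) & 1)
    return None  # no CapEff line found
```

For example, has_cap_sys_ptrace(open("/proc/self/status").read()) run as root inside the container would show whether the capability is actually present.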
w
@wonderful-iron-54019: we have one other facility that might work. you can try sending SIGUSR2 to the process with kill -s SIGUSR2 $pid. it should render backtraces to the process's stdout… so, jenkins
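The thread doesn't say how Pants implements its SIGUSR2 handler. For a plain Python process, the standard library's faulthandler module offers a comparable hook; the sketch below registers it for SIGUSR2 with all_threads=True and then signals itself, which has the same effect as running kill -s SIGUSR2 against its own PID. The helper names are hypothetical, and faulthandler only prints Python frames, so it cannot stand in for a native gdb trace.

```python
import faulthandler
import signal
import tempfile


def install_stack_dump_handler(stream) -> None:
    """On SIGUSR2, write every thread's Python traceback to `stream` (must have a real fd)."""
    faulthandler.register(signal.SIGUSR2, file=stream, all_threads=True)


def capture_stack_dump() -> str:
    """Deliver SIGUSR2 to this process and return what the handler wrote."""
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8", errors="replace") as sink:
        install_stack_dump_handler(sink)
        try:
            # Same effect as `kill -s SIGUSR2 <own pid>`.
            signal.raise_signal(signal.SIGUSR2)
        finally:
            faulthandler.unregister(signal.SIGUSR2)
        sink.seek(0)  # faulthandler wrote straight to the fd; rewind and read it back
        return sink.read()
```

In a long-running process you would call install_stack_dump_handler(sys.stderr) once at startup and send the signal from another shell when it appears stuck.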
w
will do, trying to see if it's only the fmt goal or others exhibiting this behavior at the moment
w
thank you.
w
here's the output
Not sure if it matters, but it looks like while black is running it spins up multiple processes against the same files?
w
that’s probably independent, but a new thread about that would be appreciated
ok, regarding the output from SIGUSR2… that unfortunately confirms that we would need a gdb trace to get more information…
w
i'll see what i can do with that
w
also, is it an option to run with -ldebug in this environment? it will cause a lot more logging to be rendered.
sorry for the trouble… very interested in tracking this down.
w
yeah sure thing
yeah me too
w
@wonderful-iron-54019: and to confirm: you folks are not using pantsd, correct?
w
not at the moment, unless it's turned on by default in 1.27
w
it is not. ok.
w
ok well this is certainly odd
looks like it starts to construct the build graph again after successful completion
w
mm… are you using both v1 and v2?
w
not in the format step?
but yes
it looks like this is where it hangs
still running the engine scheduler after ~4.5 min
the original step took 2.5min
w
yea… so re-constructing the build graph might be expected… hanging is most definitely not!
w
(usually much quicker too; i think running a container on my local machine is slowing it down)
i can keep this running and see if it ever gets past that log
it's near EOD for me so i'm going to go afk, but i'll come back, check on it, and report
w
i don’t expect it to finish: but thanks for reporting this. i’ll see if i can investigate this based on what you’ve reported.
have a good evening!
w
👍🏼
w
It would be worth seeing whether --no-v1 works around this.
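Separately from the --no-v1 workaround, a CI wrapper can keep a hang like this from tying up a Jenkins executor for a day. The sketch below is a hypothetical helper, not a Pants feature: it runs a command, and if the command overruns its budget it first sends SIGUSR2 (the backtrace facility mentioned earlier in this thread), waits a grace period, then kills it. The timeout values are placeholders.

```python
import signal
import subprocess
from typing import Optional, Sequence


def run_with_watchdog(cmd: Sequence[str], timeout_s: float,
                      grace_s: float = 30.0) -> Optional[int]:
    """Run `cmd`; return its exit code, or None if it overran `timeout_s` and was stopped."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Ask the stuck process to print its stacks before stopping it.
        proc.send_signal(signal.SIGUSR2)
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # Still alive after the grace period: kill it so the job fails fast.
            proc.kill()
            proc.wait()
        return None
```

For example, run_with_watchdog(["./pants", "--no-v1", "fmt", "src::"], timeout_s=30 * 60) would give a step that normally takes a few minutes a generous budget; a None result should then fail the build.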
w
just an FYI: --no-v1 did indeed work!
this is an acceptable workaround for now, since we don't have any v2 formatting targets.
thanks for your help @witty-crayon-22786
h
I thought you’re using V2 to run Black, though? If that’s the case, you could split it up into a distinct V1 run and V2 run, which is clunky but at least works around it
w
@hundreds-father-404: it sounds like that's what he did here, because they're not using any v1 formatters
w
yeah that's correct