# general
r
Hey Pants team! - I'm testing pants in a docker container for portability. I'm currently investigating an issue where pants takes significantly longer in the docker container than when running on host. Here are some stats I've observed: • host without cache (
2 minutes and 5 seconds
) • container without cache (
>12 minutes
) • host with cache (
1 minute and 11 seconds
) • container with cache (
2 minutes and 17 seconds
) ◦ mounted via
-v $HOME/.cache/pants:/build/.cache/pants
It feels like it's related to IO performance of the container but not 100% sure. Any ideas why running fresh in a container with no cache takes so long? Or do others have experience they can share when running pants with docker?
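For reference, the kind of invocation being described would look roughly like this (the image name and goal are placeholders; the cache path mirrors the mount above):
Copy code
docker run --rm \
  -v "$HOME/.cache/pants:/build/.cache/pants" \
  my-pants-image test ::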
w
hey Nick!
👋 1
is this Linux on Linux, or Linux on macOS ?
r
Hey Stu! I'm testing locally on macOS. Also have been testing on linux via jenkins CI and experienced the same issues.
w
mm. yea, Docker running Linux atop macOS is dramatically slower in my experience, although i haven’t tracked down why.
👍 1
using a VM via vagrant or virtualbox doesn’t experience the difference
f
do you have debug logs available? (via
-ldebug
)
I’m curious to know if the Pants run is rebuilding its venv or something else that would take a while without the cache.
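For reference, a minimal way to capture those debug logs would be something like this (the goal is a placeholder):
Copy code
./pants -ldebug test :: 2>&1 | tee pants-debug.log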
r
@fast-nail-55400 hmm let me try that and see what it produces
f
also I’d try mounting the
pip
cache into the container as well
👍 1
w
@fast-nail-55400: by pip cache, do you mean
~/.cache/pants/named_caches
?
f
(
$HOME/Library/Caches/pip/wheels
on macOS)
@witty-crayon-22786: No the pip cache for when
./pants
bootstraps its venv
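For reference, mounting the pip cache as suggested would look roughly like this (the host path is the macOS default mentioned above; the container path assumes pip's Linux default for root, and the image name is a placeholder):
Copy code
docker run --rm \
  -v "$HOME/Library/Caches/pip:/root/.cache/pip" \
  my-pants-image --version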
w
hm. that should only affect startup time, right?
f
well if the container has to download every wheel to bootstrap, wouldn’t that take forever with a slow network connection?
that’s also why I’d like to see a log. it would help narrow down whether the slowness is in Pants bootstrap or whether it occurs once Pants is running
👍 2
r
running with debug mode in the container now and it's taking some time on this line:
Copy code
00:42:26.75 [DEBUG] Starting: local_cache_read
00:42:26.75 [DEBUG] Starting: local_cache_read
00:42:26.75 [DEBUG] Starting: (Waiting) Determine Python imports for <redacted>
I haven't seen debug logs on my host yet so not sure if this is the start of the slowness or not.
f
also try disabling the dynamic UI with
--no-dynamic-ui
, this should enable the “straggler logging” which logs tasks that take longer than 60 seconds (checking every 30 seconds).
👍 2
r
I have
dynamic_ui = false
set in my pants.toml. That's the same right?
f
yes, that is the same
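For reference, a sketch of the equivalent ways to set this (the goal is a placeholder):
Copy code
./pants --no-dynamic-ui test ::          # command-line flag
PANTS_DYNAMIC_UI=false ./pants test ::   # environment variable
# or in pants.toml, under [GLOBAL]: dynamic_ui = false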
r
cool and it hasn't produced any logs after that last line I sent above
f
going by the log you pasted above, the Python imports step is just blocked. The local_cache_read tasks should not be taking that long.
essentially they are just reads from the filesystem (using the LMDB library)
have you tried making an empty docker volume and mounting that as the cache? that would be “without cache” but would change which filesystem Pants was reading/writing into
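For reference, using a named Docker volume for the cache instead of a bind mount would look roughly like this (the image name and goal are placeholders):
Copy code
docker volume create pants-cache
docker run --rm \
  -v pants-cache:/build/.cache/pants \
  my-pants-image test ::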
r
yeah still hanging on that python import step. Here's what the container usage looks like
hmm haven't tried that yet. Let me give it a shot
f
the Python import step said “Waiting” which means it was blocked doing nothing until other tasks completed
just searching the web for “macOS docker storage slowness” turns up a bunch of blogs on the subject
seems to be a macOS/Docker issue
r
yeah, just to note, I experience the same issue on our jenkins CI pipeline
so perhaps a docker issue
f
do you get similar logs from the Jenkins CI pipeline?
👀 1
also what version of Pants are you using?
r
I'm using the latest version
2.3.0
mounting an empty volume (or none at all) has no impact. Still takes > 12 mins. Checking on jenkins now.
so if it isn't using a bind mount and still takes that long, I'm not sure it's related to the IO.
f
I’m running on macOS myself. I can try replicating the result.
👍 1
r
cool let me know how that goes. Here's my simple dockerfile I'm testing with
Copy code
FROM python:3.7.10-slim

WORKDIR /build

RUN apt-get update -y && apt-get install curl build-essential git unzip -y

ADD app app
COPY pants.ci.toml pants.toml
COPY pants .

RUN ./pants --version

ENTRYPOINT ["./pants"]
noticing long runs on jenkins CI:
Copy code
01:12:46.80 [DEBUG] Running Determine Python imports for <redacted> under semaphore with concurrency id: 2
01:12:46.80 [DEBUG] Starting: Determine Python imports for <redacted>
so looks to be the same
f
are you able to share more of the log?
also, trying to replicate locally, I hit some permissions problems in the container I'm using, so Pants won't bootstrap its venv; I have some stuff to figure out there
👍 1
r
here's more of the jenkins log (let me know if you need additional info)
Copy code
01:12:45.88 [DEBUG] Completed: Find interpreter for constraints: CPython>=3.6
01:12:45.88 [DEBUG] Completed: (Waiting) Find interpreter for constraints: CPython>=3.6
01:12:45.88 [DEBUG] Completed: local_cache_read
01:12:45.88 [DEBUG] Completed: Find Python interpreter for constraints - Selected /usr/local/bin/python3.7 to run PEXes with.
01:12:46.18 [DEBUG] Starting: local_cache_read
01:12:46.76 [DEBUG] Starting: local_cache_read
01:12:46.76 [DEBUG] Starting: local_cache_read
01:12:46.76 [DEBUG] Starting: local_cache_read
01:12:46.76 [DEBUG] Starting: local_cache_read
01:12:46.76 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.77 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.78 [DEBUG] Starting: (Waiting) Determine Python imports for <redacted>
01:12:46.78 [DEBUG] Running Determine Python imports for <redacted> under semaphore with concurrency id: 1
01:12:46.78 [DEBUG] Starting: Determine Python imports for <redacted>
01:12:46.78 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.79 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: local_cache_read
01:12:46.80 [DEBUG] Starting: (Waiting) Determine Python imports for <redacted>
01:12:46.80 [DEBUG] Running Determine Python imports for <redacted> under semaphore with concurrency id: 2
01:12:46.80 [DEBUG] Starting: Determine Python imports for <redacted>
in this case I'm not mounting any volumes to the container
f
do you have access to the kernel logs for the Jenkins worker machine? specifically, any I/O errors?
w
just to confirm: when you say “long runs”, do you mean that it hangs there? or does it eventually make progress?
r
here are some more logs that may be useful
@witty-crayon-22786 it seems to just hang there. I know from previous runs it will complete the goal as expected
@fast-nail-55400 I don't believe I have access to the kernel logs atm but can escalate later if needed.
f
do your containers have CPU limits?
I’m basically just coming up with possible ideas to check, no specific theory formulated
r
I don't believe so and was thinking the same. I have allocated 8 CPUs for my local machine and I don't have any limits set.
f
the last log line suggests it is resolving python constraints
the cache reads all complete so, at least for Jenkins on Linux, slow storage is probably not the issue
r
yeah those are logs output before it gets stuck at
01:12:46.80 [DEBUG] Starting: Determine Python imports for <redacted>
f
try
--pex-verbosity=9
(with
-ldebug
as well)
let that run, and let’s see what it says
👍 1
that will increase the logging for the pex run doing the resolution
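For reference, the suggested invocation would be roughly this (the goal is a placeholder):
Copy code
./pants -ldebug --pex-verbosity=9 test :: 2>&1 | tee pants-pex-debug.log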
w
to be clear, if it seems to just hang for a while on a
Determine Python imports for
line, then it might … actually be hung, unfortunately.
that will invoke a process that usually takes on the order of 30 ms to run.
r
yeah in this case it is 🙁 been running for over 20mins now
w
i expect that
-ldebug
might be triggering a race condition of some sort in your environment. if you have time to investigate that i can help look at it tomorrow… but with regard to your benchmarking, dropping back to just
-linfo
(the default) should still get you a pretty good amount of log info.
👍 1
r
yeah that would make sense because it's the only difference from the previous runs, I'll stick to that. I definitely have time tomorrow to continue the investigation with you. Thank you for all your help today!
w
sorry for the trouble with that flag. getting test coverage for logging is … challenging
r
@fast-nail-55400 here's log output with the flags you suggested (
--pex-verbosity=9
and
-ldebug
):
now it seems to be hanging like before. Most likely will have to use
-linfo
as @witty-crayon-22786 mentioned
also testing the same command locally I see:
Copy code
...
  188.41s 	Inferring Python dependencies by analyzing imports
  188.41s 	Inferring Python dependencies by analyzing imports
  188.49s 	Inferring Python dependencies by analyzing imports
  188.51s 	Inferring Python dependencies by analyzing imports
  188.55s 	Inferring Python dependencies by analyzing imports
  188.55s 	Inferring Python dependencies by analyzing imports
  188.59s 	Inferring Python dependencies by analyzing imports
  188.66s 	Inferring Python dependencies by analyzing imports
  188.66s 	Inferring Python dependencies by analyzing imports
  188.69s 	Inferring Python dependencies by analyzing imports
  188.69s 	Inferring Python dependencies by analyzing imports
  188.74s 	Inferring Python dependencies by analyzing imports
  188.76s 	Inferring Python dependencies by analyzing imports
  188.79s 	Inferring Python dependencies by analyzing imports
  188.80s 	Inferring Python dependencies by analyzing imports
  188.83s 	Inferring Python dependencies by analyzing imports
  188.88s 	Inferring Python dependencies by analyzing imports
  188.89s 	Inferring Python dependencies by analyzing imports
  188.98s 	Inferring Python dependencies by analyzing imports
  189.03s 	Inferring Python dependencies by analyzing imports
  189.03s 	Inferring Python dependencies by analyzing imports
  189.64s 	Inferring Python dependencies by analyzing imports
  189.94s 	Inferring Python dependencies by analyzing imports
  189.97s 	Inferring Python dependencies by analyzing imports
  189.98s 	Inferring Python dependencies by analyzing imports
  190.01s 	Inferring Python dependencies by analyzing imports
  190.03s 	Inferring Python dependencies by analyzing imports
  190.03s 	Inferring Python dependencies by analyzing imports
  190.08s 	Inferring Python dependencies by analyzing imports
  190.08s 	Inferring Python dependencies by analyzing imports
  190.11s 	Inferring Python dependencies by analyzing imports
  190.11s 	Inferring Python dependencies by analyzing imports
  197.40s 	Inferring Python dependencies by analyzing imports
  203.41s 	Inferring Python dependencies by analyzing imports
  221.78s 	Inferring Python dependencies by analyzing imports
  221.86s 	Inferring Python dependencies by analyzing imports
  221.94s 	Inferring Python dependencies by analyzing imports
  224.65s 	Inferring Python dependencies by analyzing imports
  229.38s 	Inferring Python dependencies by analyzing imports
  230.39s 	Inferring Python dependencies by analyzing imports
  230.47s 	Inferring Python dependencies by analyzing imports
  231.37s 	Inferring Python dependencies by analyzing imports
  231.40s 	Inferring Python dependencies by analyzing imports
  232.53s 	Inferring Python dependencies by analyzing imports
  232.55s 	Inferring Python dependencies by analyzing imports
  232.55s 	Inferring Python dependencies by analyzing imports
  232.56s 	Inferring Python dependencies by analyzing imports
  232.57s 	Inferring Python dependencies by analyzing imports
  232.59s 	Inferring Python dependencies by analyzing imports
  232.62s 	Inferring Python dependencies by analyzing imports
  232.62s 	Inferring Python dependencies by analyzing imports
  232.70s 	Inferring Python dependencies by analyzing imports
👀 2
w
er
oh.
h
I think @acceptable-guitar-79854 has seen this on Windows WSL once?
w
what does
Copy code
./pants help-advanced global | grep -A6 'local-parallelism='
report?
r
Copy code
$ ./pants help-advanced global | grep -A6 'local-parallelism='
18:48:37.63 [INFO] initializing pantsd...
18:48:37.94 [INFO] pantsd initialized.
  --process-execution-local-parallelism=<int>
  PANTS_PROCESS_EXECUTION_LOCAL_PARALLELISM
  process_execution_local_parallelism
      default: 12
      current value: 12 (from pants.ci.toml)
      Number of concurrent processes that may be executed locally.
I have set
process_execution_local_parallelism
to the default value when testing.
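For reference, pinning that option explicitly would look roughly like this (12 matches the default shown above; the goal is a placeholder):
Copy code
./pants --process-execution-local-parallelism=12 test ::
# or in pants.ci.toml, under [GLOBAL]: process_execution_local_parallelism = 12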
f
the log has this in it:
No entry point specified, dropping into interpreter
cc @enough-analyst-54434 ^^
w
…!
e
Yeah - that's expected. If you notice in those logs the Pex command includes -c '...', just like a Python interpreter:
Copy code
$ pex -v -- -c 'print("no entrypoint being used here")'
pex: Building pex: 0.8ms                                                    
pex:   Resolving distributions ([]): 0.2ms
pex:     Resolving requirements.: 0.1ms
Running PEX file at /tmp/tmptk7v2hk4 with args ['-c', 'print("no entrypoint being used here")']
pex: PEX.run invoking /home/jsirois/.venv/pex/bin/python3 /tmp/tmptk7v2hk4 -c print("no entrypoint being used here")
pex: Re-executing: cmdline='/usr/bin/python3.9 /tmp/tmptk7v2hk4 -c print("no entrypoint being used here")', sys.executable='/home/jsirois/.venv/pex/bin/python3', PEX_PYTHON=None, PEX_PYTHON_PATH=None, interpreter_constraints=[]
pex: Found site-library: /usr/lib/python3.9/site-packages
pex: Tainted path element: /usr/lib/python3.9/site-packages
pex: Scrubbing from user site: /home/jsirois/.local/lib/python3.9/site-packages
pex: Scrubbing from site-packages: /usr/lib/python3.9/site-packages
pex: New sys.path: ['/tmp/tmptk7v2hk4/.bootstrap', '/tmp/tmptk7v2hk4', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload']
pex: Activating PEX virtual environment from /tmp/tmptk7v2hk4: 0.1ms
pex: No entry point specified, dropping into interpreter
pex: Bootstrap complete, performing final sys.path modifications...
pex: PYTHONPATH contains:
pex:     /tmp/tmptk7v2hk4
pex:   * /usr/lib/python39.zip
pex:     /usr/lib/python3.9
pex:     /usr/lib/python3.9/lib-dynload
pex:     /tmp/tmptk7v2hk4/.bootstrap
pex:   * - paths that do not exist or will be imported via zipimport
no entrypoint being used here
In particular the log is:
Copy code
pex: PEX.run invoking /usr/local/bin/python3.7 /tmp/process-executionA0Ms6e/.tmp/tmpo6joc6ld -c import hashlib, os, sys

python = os.path.realpath(sys.executable)
print(python)

hasher = hashlib.sha256()
with open(python, "rb") as fp:
  for chunk in iter(lambda: fp.read(8192), b""):
      hasher.update(chunk)
print(hasher.hexdigest())
IE: We're doing interpreter selection: https://github.com/pantsbuild/pants/blob/48f9ed14d5e5e487e007225aa4ef0b4ba985387b/src/python/pants/backend/python/util_rules/pex.py#L440-L478
There are a few other items on my debug stack and lots of folks are already paged into this and looking at it, so I'll step back unless pulled in.
👍 2
w
same here. Mondays… sheesh
a
@hundreds-father-404 that's right, on a fresh installation of pants on WSL it took a v long time (8 minutes) to
Determine Python imports ...
but it did complete and then it worked just as fast as usual
👀 1
r
@acceptable-guitar-79854 interesting to see similar results.
then it worked just as fast as usual
I'm assuming this is when the cache kicked in
a
yes
👍 1
r
Okay finally have some resolution here! It turns out it came down to the OS. Changing my base image from
FROM python:3.7
->
FROM ubuntu:latest
(
20.04
) resulted in wayyy faster runs. To the point where I think it's reasonable to run in CI. Here's a thread where I was debugging different Ubuntu versions: https://pantsbuild.slack.com/archives/C0105PY6BM5/p1616551044003200 Thank you to everyone who helped out with this issue! Looking forward to wearing my new favorite pants in production!
👀 1
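For reference, a sketch of the revised Dockerfile along those lines (the Python package names are assumptions; Ubuntu 20.04's default python3.8 satisfies the CPython>=3.6 constraint seen in the logs above):
Copy code
FROM ubuntu:20.04

WORKDIR /build

RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
      curl build-essential git unzip \
      python3.8 python3.8-dev python3.8-venv python3-distutils

ADD app app
COPY pants.ci.toml pants.toml
COPY pants .

RUN ./pants --version

ENTRYPOINT ["./pants"]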
h
Thanks for the follow up Nick!
👍 1