We're seeing very slow docker builds with pants, a...
# general
w
We're seeing very slow docker builds with pants, and was wondering if we're doing something obviously bad:
Copy code
19:41:23.94 [INFO] Long running tasks:
  643.75s	Building docker image <http://amramedical.com/client-session-tagger:latest|amramedical.com/client-session-tagger:latest> +2 additional tags.
These are small services, with 4 requirements (largest being boto3). The time spent is mostly focused on building "requirements" pexes, meaning
pex_binary
targets in the container with only requirements included. We use a lockfile, with_tools=True and layout="packed". We use only wheels.
Copy code
#13 [source 4/4] RUN PEX_TOOLS=1 /usr/local/bin/python client-session-tagger-handler.pex venv --compile /app
#13 sha256:3dcefd9a9645feb44367da0c45b3370685d8cc30427c1a07e8d99fe9111ab559
#13 DONE 3.7s
#9 [dependencies 4/4] RUN PEX_TOOLS=1 /usr/local/bin/python client-session-tagger-requirements.pex venv     --scope=deps     --compile /app
#9 sha256:ef99e535e560ee6d3decd7805cb3c81c0d17b05123d0f2b0505b157e6efb1697
#9 DONE 637.8s
h
are you building the pex in the container or just copying it in?
w
copying, I think?
e
Copy code
#9 [dependencies 4/4] RUN PEX_TOOLS=1 /usr/local/bin/python client-session-tagger-requirements.pex venv     --scope=deps     --compile /app
#9 sha256:ef99e535e560ee6d3decd7805cb3c81c0d17b05123d0f2b0505b157e6efb1697
#9 DONE 637.8s
That is ridiculously slow for that step. To be justified the client-session-tagger-requirements.pex would need to be ~TB big or the docker environment handicapped with too low crgoup limits or a strange / slow FS driver.
Can you run that same step by hand on a "normal" linux box and compare times?
w
its a GHA runner
e
What parallelism do you have Pants configured to use out in GHA land? Another not super likely possibility is its set or defaulting too high and you're getting thrash; i.e. 16 cores when you really have 1.
w
It's the default
e
Ok, if you can run that step by hand (no docker involved) on some other Linux box and provide the timing I'd e grateful.
Basically:
Copy code
PEX_TOOLS=1 time /usr/local/bin/python client-session-tagger-requirements.pex venv     --scope=deps --compile /app
👍 1
~600s is 2 orders of magnitude higher than expected for even very large (100s of MB) PEX files.
w
hmm
Copy code
./pants package cloud/deployment/client_session_tagger/handler.py
22:00:26.58 [INFO] Completed: Building 4 requirements for cloud.deployment.client_session_tagger/client-session-tagger-requirements.pex from the pants-lockfile.txt resolve: awslambdaric==2.0.1, boto3-stubs[batch,codeartifact,codeb..
. (147 characters truncated)
22:00:26.58 [ERROR] 1 Exception encountered:

  Exception: Error opening file /home/tobias/projects/profiler/dist/cloud.deployment.client_session_tagger/client-session-tagger-handler.pex/.bootstrap for writing: Os { code: 20, kind: NotADirectory, message: "Not a directory" }
Sorry, sidetrack, that's not the target (although that error is strange?)
e
That error is a bit strange. What are the options on the
pex_binary
target owning
cloud/deployment/client_session_tagger/handler.py
? It looks like you must be using
layout="packed"
or
layout="loose"
?
w
layout="packed", granted I haven't done any indepth analysis, just followed the blog post on pex + docker w.r.t pex_binary configuration
I can def say I can reproduce outside of docker btw, it's still running 3 minutes later!
e
In a packed layout `/.bootstrap`is a zip file and not a directory. So this feels like you exported once with layout packed and then again with layout loose (where `/.bootstrap/`is a directory and not a zip).
w
python  venv --scope=deps --compile ~/test-app  230.44s user 0.40s system 99% cpu 3:50.89 total
e
OK. A few more questions: + Why packed - just the blog post or were you forced to do this because your dependencies are too large and a normal PEX is just too big? + How big is the packed PEX / can you run a
du -sh dist/cloud.deployment.client_session_tagger/client-session-tagger-handler.pex/
?
Actually - The Spanish Inquisition! - question 3: + Can you re-run with
PEX_VERBOSE=9
in the environment and provide the full output? That will provide sub-step timings.
👍 1
w
Just the blog post.
Copy code
du -sh dist/cloud.deployment.client_session_tagger/client-session-tagger-requirements.pex/
12M     dist/cloud.deployment.client_session_tagger/client-session-tagger-requirements.pex/
e
Can you also try a timing without the
layout="packed"
? That will be a good comparison.
w
With layout packed, verbosity=9
layout-packed-verbose-9.txt
omitting layout doesn't seem to have an effect
e
Ok. One last thing and I think I can locally repro and dig - can you provide the
PEX-INFO
file from the root of that packed PEX directory? That should give me enough to repro.
The mypy-boto3 ~500 optional dependcies things seems like it must be triggering some yet un-noticed quadratic behavior analyzing deps.
w
Copy code
{"bootstrap_hash": "7ed7efd7da2fe43a9694a92b6cc32d60dfaba9c0", "build_properties": {"pex_version": "2.1.102"}, "code_hash": "0e2152a488f400afe95682d5bf5c183d2ac7d092", "distributions": {"awslambdaric-2.0.1-cp310-cp310-manylinux_2_17
_x86_64.manylinux2014_x86_64.whl": "c050437bbbf68fc7f3c1062e0dae4cb42e2d509e1e1266f4a4ae9e2e5081e348", "boto3-1.24.51-py3-none-any.whl": "df2977f2c1a74c1290a1b88847e017f6985e43eee065b4efaa37db7e116f1998", "boto3_stubs-1.24.78-py3-no
ne-any.whl": "43b32c328d8a8a0c14cd6492d5b724fc7a43f931c0b83947381f05000915db0d", "botocore-1.27.51-py3-none-any.whl": "e0d72c329fdfe2a26f86449b19232171f3d7cef1edc78b261c76f906708a107b", "botocore_stubs-1.24.10.post1-py3-none-any.whl
": "77020820fbb2c44343532b6fa1f8b69302a6ce5fa68e3b32b5f93a6301c5b626", "injector-0.20.1-py2.py3-none-any.whl": "08cb9e4669f29b6cbcae9bc287f3be1277471427fbcdfe18001d9e46241be42a", "jmespath-0.10.0-py2.py3-none-any.whl": "3243b80351ae
e183daf14f678676f1296ef5c6817aa397bf829fa3d4c9abe6a8", "mypy_boto3_batch-1.24.36.post1-py3-none-any.whl": "2dabc6ab8581f8beff18db77dc67de1decca97c6143b061929525df7f20ec0cd", "mypy_boto3_codeartifact-1.24.36.post1-py3-none-any.whl":
"3e38e80728bfef7699010cd50d495336c8b43dbfa095de26d95431a582798c0a", "mypy_boto3_codebuild-1.24.36.post1-py3-none-any.whl": "2427bc3f7d64328e77795969113f72edf99529d5f24fffde9bbaa9eadcfb84a1", "mypy_boto3_codepipeline-1.24.36.post1-py
3-none-any.whl": "d57161a323a939856d0a1502ae2c0d5fabb40225ec2fc4f711ddee7df05b11c0", "mypy_boto3_cognito_idp-1.24.66-py3-none-any.whl": "cc3fd035763d101c8a314cbc4e601ebc4da2d6656af878171651d1ed7a6cf026", "mypy_boto3_dynamodb-1.24.74
-py3-none-any.whl": "3d8dfbb285ca8ae94bced6ca19755ecda50336ba1d1f2403f85af2f79610764a", "mypy_boto3_kms-1.24.36.post1-py3-none-any.whl": "037c47ea0105053f69f512f8cac6b5186930711b1371e9c6eca558fcdc147fe4", "mypy_boto3_logs-1.24.36.po
st1-py3-none-any.whl": "405f14bf922e6ebb0a1c6687eb5cf0ca5f8a97b71c33c9b989a14fc6383987af", "mypy_boto3_ram-1.24.36.post1-py3-none-any.whl": "ad79d703b8296cbf5a58e1a1d28db00b41fee3b189782712d4e73cf28d46741e", "mypy_boto3_rds-1.24.61-
py3-none-any.whl": "e1225b2447ca2767a06391ce100c15eb1f7ab17a6ac985e5010517afd245d8b5", "mypy_boto3_route53-1.24.69-py3-none-any.whl": "5faffd4824fc5a60ee9d3be8f9d431f182f1f7c6e1e388502fb82dd25ac1c4c0", "mypy_boto3_s3-1.24.76-py3-non
e-any.whl": "3195a19a905ab2a1e1b3b3a0bf41454647b3d89a6f24e9904d8786faf3336850", "mypy_boto3_secretsmanager-1.24.0-py3-none-any.whl": "ad6620757442061f1b769067bb30c9ba8dda52a476fdc53207f173f8fa91c9c2", "mypy_boto3_ssm-1.24.69-py3-non
e-any.whl": "afbcd8bfee68be6650897eddd3b0da0ee6a3702ce2702b741969ce0e0eaa6476", "mypy_boto3_stepfunctions-1.24.36.post1-py3-none-any.whl": "8a33cbc633796114d09de1f5d71af0bb663e908aea8cf4dd4ae668916571d4be", "mypy_boto3_sts-1.24.36.p
ost1-py3-none-any.whl": "4349d4591c6527395597f25aef3fdca22269f110cbf180599b3d3a6b524eb6b2", "python_dateutil-2.8.1-py2.py3-none-any.whl": "76bf9515aceadeed0873f05d0223e3b554cf7e1452461b1f62fd87599ea28a38", "s3transfer-0.6.0-py3-none
-any.whl": "be056641510d7effc911aad62944b4daf8b26281d992a395feb7e1046ed023b3", "simplejson-3.17.2-cp310-cp310-linux_x86_64.whl": "e639773066ad14cef73d27b6f27a964b0e4fc4b1f35aab1e204286383c00cb11", "six-1.16.0-py2.py3-none-any.whl":
"3e1c439c88d2e7681372427bab751b3fc99969891e95a714fed9604bf7710213", "types_s3transfer-0.6.0.post4-py3-none-any.whl": "93ca02d6d6614915f2dd5d0e0eeca467f0f9548cbc8ecc2c1621d7bc275168b3", "typing_extensions-4.1.1-py3-none-any.whl": "f3
3d3f74e8b0ba573b34679fa134a041545eefbb6302ba5a7bead438c67ab2da", "urllib3-1.26.8-py2.py3-none-any.whl": "4d91ca5966ab5eca04245708da5fb86b4afd54c2d6a7977cc34da7f08828770b"}, "emit_warnings": false, "entry_point": "<http://cloud.deployment.cl|cloud.deployment.cl>
ient_session_tagger.handler", "ignore_errors": false, "includes_tools": true, "inherit_path": "false", "interpreter_constraints": [], "pex_hash": "bc3d7c41725c1413b7e65b5b1de2e49147edc5e2", "pex_path": null, "requirements": ["awslam
bdaric==2.0.1", "boto3-stubs[batch,codeartifact,codebuild,codepipeline,cognito-idp,dynamodb,kms,logs,ram,rds,route53,s3,secretsmanager,ssm,stepfunctions,sts]==1.24.78", "boto3==1.24.51", "injector==0.20.1"], "strip_pex_env": true, "
venv": false, "venv_bin_path": "false", "venv_copies": false, "venv_site_packages_copies": false}%
e
Thank you! I'll need a bit here to come up with a repro and pinpoint the slowness, but I'll report back with - I suspect - a Pex issue link.
👍 1
w
Is there some way I can just tell pants to not pack these in the pex_binaries? I mean, they're just stubs, by definition we can live without them at runtime 🙂
transitive excludes?
e
Perhaps, yeah. That could tide you over until a bug fix in Pex is forthcoming.
✔️ 1
I have reproed by the way, the issue is in fact some sort of quadratic behavior analysing dependencies due to `boto3-stubs[batch,codeartifact,codebuild,codepipeline,cognito-idp,dynamodb,kms,logs,ram,rds,route53,s3,secretsmanager,ssm,stepfunctions,sts]==1.24.78`changing that to
boto3-stubs==1.24.78
yields a zip of the same size but the analysis drops from ~70s on my machine to ~300ms.
w
I dont think
boto3-stubs
yields the same stubs that the lhs expression does
Ah, you didn't mean that either, sorry, it's late
e
Right. I mean this: https://github.com/pantsbuild/pex/blob/ecf9b95544c53970628e982b7132a345bbc931cd/pex/environment.py#L195-L208 That code, in your case, takes in 16 extras which leads to a powerset size of 65k - that alone is not enough to explain this, but I think it's at the core.
✔️ 1
This actually turned out to just be an expensive hash code calculation on an already immutable object. Caching the hash drops 70s to 700ms and there is the 2 orders of magnitude from earlier in the day ~exactly.
🙌 2
A few technical difficulties slowed this down but I now have a working branch. I'll get out a Pex fix tomorrow: https://github.com/pantsbuild/pex/issues/1928
1
Alright Pex 2.1.108 is released with the fix and the Pants upgrade is here: https://github.com/pantsbuild/pants/pull/17112 That will be in the 2.15.x series @worried-painter-31382. Presumably you're on older / more stable Pants. I'll arrange a cherry-pick to 2.14.x but speak up if you're not on 2.14.x or not in a position to upgrade to it to get the fix. You can change your installations Pex version by setting some options in this subsystem that I can explain more if needed: https://www.pantsbuild.org/docs/reference-pex-cli
Actually, this is also getting picked to 2.13.x.
w
Awesome. Yea we're on "stable" for compliance reasons, but configuring subsystems I am familiar with, we run pants "offline" :)