# general
w
There have been some folks building dockerized Python+FastAPI projects with Pants here and sharing tips. I've been following some of these tips to try to accomplish this myself but have hit a block when it comes to packaging the Docker image with Pants. It's hard to share all the configurations here in Slack, so I created a minimal example project that shows my work and demonstrates the problem I'm having: https://github.com/davidbeers/PantsFastApiExample. I'd like to get this project fixed so I can have success with Pants, and hopefully the public repo will then be of use to others as well. The specific problem is with building the Docker image with `pants package src/docker/Dockerfile`. The full output, including the error, looks like this:
11:44:08.76 [INFO] Initializing scheduler...
11:44:11.16 [INFO] Scheduler initialized.
11:44:13.90 [INFO] Completed: Building src.python/api-binary-srcs.pex
11:44:14.71 [INFO] Completed: Building docker image hello-api:latest
11:44:14.71 [ERROR] 1 Exception encountered:

Engine traceback:
  in `package` goal

ProcessExecutionFailure: Process 'Building docker image hello-api:latest' failed with exit code 1.
stdout:

stderr:
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 390B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/python:3.11-slim-bookworm
#2 DONE 0.0s

#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [internal] load build context
#4 transferring context: 1.19MB done
#4 DONE 0.0s

#5 [srcs 1/3] FROM docker.io/library/python:3.11-slim-bookworm
#5 CACHED

#6 [srcs 2/3] COPY src.python/api-binary-srcs.pex /
#6 DONE 0.0s

#7 [srcs 3/3] RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs
#7 0.287 /usr/local/bin/python3.11: can't open file '//api-binary-srcs.pex': [Errno 2] No such file or directory
#7 ERROR: process "/bin/sh -c PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs" did not complete successfully: exit code: 2
------
 > [srcs 3/3] RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs:
0.287 /usr/local/bin/python3.11: can't open file '//api-binary-srcs.pex': [Errno 2] No such file or directory
------
Dockerfile:7
--------------------
   5 |     FROM python:3.11-slim-bookworm as srcs
   6 |     COPY src.python/api-binary-srcs.pex /
   7 | >>> RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs
   8 |     
   9 |     ENTRYPOINT ["/app/pex"]
--------------------
ERROR: failed to solve: process "/bin/sh -c PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs" did not complete successfully: exit code: 2
The "can't open file" error could be related to the fact that `api-binary-srcs.pex` is a directory in /dist/src.python rather than a file. The /dist dir looks like the attached screenshot.
Initially I was confused because I thought the error was occurring after the 3rd party dependencies layer build had succeeded, which happens by running the same PEX_TOOLS command as the failing one. But looking closely at the full output I can see that's not the case: for some reason the srcs are getting built first even though they come later in the Dockerfile. Perhaps because of the multiple FROM directives? In any case, I'm probably making a relatively simple Docker error and doubt this is a problem with Pants. I appreciate the input of @happy-kitchen-89482, @wide-midnight-78598 and @better-van-82973 in another thread and hope the public repo makes it very easy for anyone to point out my mistake.
My goals:
• To have a Docker image with layers for the infrequently changing 3rd party dependencies and the frequently changing 1st party source, to improve image build performance.
• I would like not to compile the pex into .pyc, since the small performance benefit in my case doesn't seem worth the cost of not having a deterministic, reproducible build. But if it's easy to set the project up so it's trivial to switch between compiled and uncompiled pex, that's great and I'll gladly document it.
• The container is to run on AWS Fargate with a load balancer handling scaling at the container level, so I'd like the container to run plain uvicorn without gunicorn. I think this would be similar for folks deploying FastAPI on Kubernetes.
w
Don't have much time to look at this right now, but what's going on here?
# 3rd party deps
pex_binary(
    name="api-binary-deps",
    environment="docker_linux",
    layout="packed",
    execution_mode="venv",
    include_sources=False,
    include_tools=True,
)

# 1st party srcs
pex_binary(
    name="api-binary-srcs",
    entry_point="api.main",
    environment="docker_linux",
    layout="packed",
    execution_mode="venv",
    include_requirements=False,
    include_tools=True,
)
w
I'm building two pex binaries: one for the 3rd party dependencies that excludes the first party source and one for the source that excludes the 3rd party dependencies. This way I hope to have a multi-layered Docker image. Appreciate even the quick look, @wide-midnight-78598, and don't expect folks to drop everything to solve this for me!
w
np - I'm in a boring meeting, which ends soon - then I get back to real work 🙂
So - let me copy/paste the simplest configuration I have, which works great
🙏 1
The two pexes is weird
https://github.com/pantsbuild/pex/pull/1634 I think this is what you were going for
It's built-in
b
The two pexes is a well-worn approach if you’re deploying using Docker: https://blog.pantsbuild.org/optimizing-python-docker-deploys-using-pants/
w
Gotcha, trying to optimize using the end of Josh's blog, but I think first it's to get it reliably running 🙂
Problem is, in the dockerfile - they're not recombinator'd anywhere?
Dockerfile: Note the 3 `FROM` statements - first two are cached, third is the one we run where everything was pulled together
# Following instructions here for reduced startup time and smaller footprint
# https://pex.readthedocs.io/en/latest/recipes.html?ref=blog.pantsbuild.org#pex-app-in-a-container
# There are still some more optimizations that could be done, see below:
# https://blog.pantsbuild.org/optimizing-python-docker-deploys-using-pants/

FROM python:3.11-slim as deps
COPY backend.admin/adminapi-pex.pex /api.pex
RUN PEX_TOOLS=1 /usr/local/bin/python3.11 /api.pex venv --scope=deps --compile /bin/app

FROM python:3.11-slim as srcs
COPY backend.admin/adminapi-pex.pex /api.pex
RUN PEX_TOOLS=1 /usr/local/bin/python3.11 /api.pex venv --scope=srcs --compile /bin/app

FROM python:3.11-slim
COPY --from=deps /bin/app /bin/app
COPY --from=srcs /bin/app /bin/app

# Using the same entrypoint as the Azure App Service variant
# Look into the number of cores
EXPOSE 8000
ENTRYPOINT ["/bin/app/bin/gunicorn", "admin.main:app", "--bind=0.0.0.0", "--timeout", "600", "--forwarded-allow-ips=*", "-k", "uvicorn.workers.UvicornWorker", "-w", "1"]
BUILD
python_sources(
    name="libadminapi",
    sources=["**/*.py", "!*_test.py"],
    dependencies=[
        "//:reqs#aiohttp",
        "//:reqs#psycopg2-binary",
        "//:reqs#python-multipart",
    ],
)

pex_binary(
    name="adminapi-pex",
    dependencies=[
        ":libadminapi",
        "//:reqs#uvicorn",
        "//:reqs#gunicorn",
    ],
    include_tools=True,
    platforms=[
        "linux-x86_64-cp-311-cp311",
        "macosx-13.3-arm64-cp-311-cp311",
        "macosx-13.3-x86_64-cp-311-cp311",
    ],
)

docker_image(
    name="dockerized_admin",
    image_tags=["latest"],
    registries=["redacted.azurecr.io"],
    repository="adminapi",
    skip_push=True,
)
@wide-journalist-72152 I hope I didn't leak anything from any of my clients, but essentially, that's my not-fully-optimized version, which runs well, stress free.
I have a couple things just for me in there - and I renamed a couple items, so hopefully they sync up. I have `skip_push` because I'm pushing via some other mechanism that isn't worth discussing.
Key point is: This is the part of the dockerfile that I actually run from, the previous two are cached layers
FROM python:3.11-slim
COPY --from=deps /bin/app /bin/app
COPY --from=srcs /bin/app /bin/app
Then I use gunicorn and uvicorn to run on Azure's crappy infra
My "actual" deployments use `scie` files, and custom interpreters and whatever - but I think the above is a good 80/20.
Also, I could have sworn we have something like that in our docs somewhere, rather than needing to jump between websites. This gets asked a decent amount.
Aside: Very quick glance:
RUN PEX_TOOLS=1 /usr/local/bin/python3.11 /api-binary-deps.pex venv --scope=deps

FROM python:3.11-slim-bookworm as srcs
COPY src.python/api-binary-srcs.pex /
RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs
Any reason why the deps has a slash prefix, and the srcs doesn't? Intentional?
Anyways, before digging into the 3 goals you have, let's get the most basic incarnation working, and then expand out into the more highly optimized cache, removing gunicorn, etc
w
Thanks. I saw that example you posted earlier and frankly didn't understand it, since it looks like the two layers would be identical, including both 3rd party and 1st party code. Presumably my lack of understanding of the `as deps` and `as srcs`. Are those like keywords that somehow filter the build context to include only 3rd party in the first case and only 1st party in the second? I can't find documentation on this aside from the sample code here. (But hey, I'll try it like a blind man being nudged away from the curb!) I was borrowing from @better-van-82973’s example there where he didn't seem to need that explicit recombination: https://pantsbuild.slack.com/archives/C046T6T9U/p1707342010071579?thread_ts=1690550728.554309&cid=C046T6T9U
SJ wrote:
> Any reason why the deps has a slash prefix, and the srcs doesn't? Intentional?
Oops... well, that was from some experimentation to see if it fixed the specific error on that line. It's not correct as far as I can tell. Again: I don't have a doc for that command that I can find.
w
Krishnan's is the smarter version of mine 🙂 Just more optimized - mine's a bit simple, until everything works
b
So I managed to get your Dockerfile to work with a couple minor changes - I think what is actually breaking this is copying your PEX files to `/` instead of to a dedicated path:
-COPY src.python/api-binary-deps.pex /
-RUN PEX_TOOLS=1 /usr/local/bin/python3.11 /api-binary-deps.pex venv --scope=deps
+
+COPY src.python/api-binary-deps.pex /api-binary-deps.pex
+RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-deps.pex venv --scope=deps --compile /app

 FROM python:3.11-slim-bookworm as srcs
-COPY src.python/api-binary-srcs.pex /
-RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs
+COPY src.python/api-binary-srcs.pex /api-binary-srcs.pex
+RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs --compile /app
(You can ignore the --compile restored at the end, that was just for me to verify that that worked too)
👀 1
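Since switching between compiled and uncompiled venvs was one of the stated goals, one possible way to make `--compile` easy to flip is a Docker build argument. This is only a sketch and not something proposed in the thread: `PEX_VENV_FLAGS` is a hypothetical name, while the paths and the `/app` target are taken from the diff above.

```dockerfile
FROM python:3.11-slim-bookworm AS deps

# Hypothetical build argument: empty by default, so no .pyc files are compiled.
# Building with --build-arg PEX_VENV_FLAGS=--compile turns compilation on.
ARG PEX_VENV_FLAGS=""

COPY src.python/api-binary-deps.pex /api-binary-deps.pex

# Left unquoted on purpose: an empty value expands to no argument at all.
RUN PEX_TOOLS=1 /usr/local/bin/python3.11 /api-binary-deps.pex venv --scope=deps ${PEX_VENV_FLAGS} /app
```

A build argument declared after a `FROM` is scoped to that stage, so the `ARG` line would have to be repeated in the `srcs` stage. Pants' `docker_image` target has an `extra_build_args` field that should be able to supply the value, though that is untested here.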
w
> Presumably my lack of understanding of the `as deps` and `as srcs`. Are those like keywords that somehow filter the build context to include only 3rd party in the first case and only 1st party in the second?
Yeah, this is getting a bit cargo cult-y. Two separate things are at work here.
Docker's multi-stage builds, with named stages: https://docs.docker.com/build/building/multi-stage/#name-your-build-stages
And then pex's use of `--scope` to split out deps and application code, for better layers: https://pex.readthedocs.io/en/latest/recipes.html#pex-app-in-a-container
👍 1
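Put together, the two mechanisms might look like the sketch below for the two-PEX setup in this thread. `AS deps` and `AS srcs` only name the build stages, and the filtering is done by `--scope`. The PEX paths, the `/app` venv location and the `/app/pex` entrypoint are the ones used in the original Dockerfile. Giving each PEX an explicit destination path is an assumption carried over from the fix suggested earlier.

```dockerfile
# Stage 1: third-party dependencies. These change rarely, so the layer stays cached.
FROM python:3.11-slim-bookworm AS deps
COPY src.python/api-binary-deps.pex /api-binary-deps.pex
RUN PEX_TOOLS=1 /usr/local/bin/python3.11 /api-binary-deps.pex venv --scope=deps /app

# Stage 2: first-party sources. These change often but are quick to rebuild.
FROM python:3.11-slim-bookworm AS srcs
COPY src.python/api-binary-srcs.pex /api-binary-srcs.pex
RUN PEX_TOOLS=1 /usr/local/bin/python3.11 /api-binary-srcs.pex venv --scope=srcs /app

# Final stage: the image that actually runs, assembled from the two stages above.
FROM python:3.11-slim-bookworm
COPY --from=deps /app /app
COPY --from=srcs /app /app
ENTRYPOINT ["/app/pex"]
```

Because each `COPY --from` produces its own layer, a source-only change should leave the dependency layer untouched.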
> I managed to get your Dockerfile to work with a couple minor changes - I think what is actually breaking this is copying your PEX files to `/` instead of to a dedicated path:
I was wondering about this too - I hadn't had a chance to compile, but I noticed that it wasn't pointed anywhere in particular.
b
The COPY instruction in Docker has some really confusing behaviors: https://docs.docker.com/engine/reference/builder/#copy I try to avoid the edge cases wherever possible 🙂
👍 1
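The edge case most likely at play here is Docker's documented rule that when the source of a `COPY` is a directory, only its contents are copied and not the directory itself. With `layout="packed"` the PEX is a directory, as noted at the top of the thread. A minimal sketch using the paths from this thread:

```dockerfile
FROM python:3.11-slim-bookworm

# The packed PEX is a directory. Copying a directory copies its contents,
# so this form would scatter the PEX's files into / and leave no
# /api-binary-srcs.pex behind:
#   COPY src.python/api-binary-srcs.pex /

# Naming the destination recreates the directory at a known path:
COPY src.python/api-binary-srcs.pex /api-binary-srcs.pex

# Listing the path makes the result visible in the build output.
RUN ls -la /api-binary-srcs.pex
```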
w
> I think what is actually breaking this is copying your PEX files to `/` instead of to a dedicated path:
Ah thanks for that! It takes care of the error mentioned above. But it doesn't fix the docker build for me for some reason, just moves me to the next error:
> [srcs 3/3] RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs:
0.993                                               [--non-hermetic-scripts]
0.993                                               [--rm {pex,all}]
0.993                                               [--emit-warnings]
0.993                                               [--pex-root PEX_ROOT]
0.993                                               [--disable-cache]
0.993                                               [--cache-dir CACHE_DIR]
0.993                                               [--tmpdir TMPDIR]
0.993                                               [--rcfile RC_FILE]
0.993                                               PATH
0.993 PEX_TOOLS=1 ./api-binary-srcs.pex venv: error: the following arguments are required: PATH
w
`RUN PEX_TOOLS=1 /usr/local/bin/python3.11 api-binary-srcs.pex venv --scope=srcs /app`? @wide-journalist-72152 Did you add the `/app` at the end?
👍 2
w
Missed it, thanks... sigh... copy pasta. That fixes the build. But it looks like the 3rd party dependencies are still not getting built into the image. When I run in a container using `pants run src/docker/Dockerfile` I get this on the first import:
File "/app/lib/python3.11/site-packages/api/main.py", line 1, in <module>
    from fastapi import FastAPI
ModuleNotFoundError: No module named 'fastapi'
w
Did you copy them into the same container?
b
That’s an easier issue - your deps PEX needs to say what exactly you are getting the dependencies for:
pex_binary(
    name="api-binary-deps",
    # You need an entry_point or something here so Pants knows what dependencies you need
    environment="docker_linux",
    layout="packed",
    execution_mode="venv",
    include_sources=False,
    include_tools=True,
)
w
I use something like this:
pex_binary(
    name="adminapi-pex",
    dependencies=[
        ":libadminapi",
        "//:reqs#uvicorn",
        "//:reqs#gunicorn",
    ],
But I guess it's different with the dual split pexes?
b
Yeah, ^ is pretty much what I use as well:
pex_binary(
    name="binary-deps",
    entry_point="main.py",
    dependencies=[
        ":lib",
        "api/src/py/api/scripts:lib",
    ],
    layout="packed",
    environment="linux_docker",
    execution_mode="venv",
    include_sources=False,
    include_tools=True,
)
w
What are these pointing to in your project, @better-van-82973?
":lib",
        "api/src/py/api/scripts:lib",
b
Those are `python_sources` targets which are used for dependency inference - the dependencies that get packaged into the deps PEX are the dependencies of source files in those directories. `:lib` refers to sources in the same directory as the BUILD file.
w
The `python_sources` target I want the `api-binary-deps` to use for dependency inference is in the same directory as the BUILD file, so I gave the name "lib" to the python_sources in that BUILD file and added that as a dependency to the api-binary-deps pex_binary target like so:
python_sources(
    name="lib"
)

# 3rd party deps
pex_binary(
    name="api-binary-deps",
    environment="docker_linux",
    dependencies=[
        ":lib",
    ],
    layout="packed",
    execution_mode="venv",
    include_sources=False,
    include_tools=True,
)

# 1st party srcs
pex_binary(
    name="api-binary-srcs",
    entry_point="api.main",
    environment="docker_linux",
    layout="packed",
    execution_mode="venv",
    include_requirements=False,
    include_tools=True,
)
That didn't seem to do anything: still ModuleNotFoundError on the first import. I shouldn't need to add dependencies to the `python_sources` in this case since Pants can infer that from the imports, right? What am I missing?
I continue to be immensely thankful for your attention on this. So generous! I'm pushing these changes we discuss to the github repo as I go, by the way, if it helps to look at my current code.
b
The issue you have now is that the `:lib` target refers to Python files in the same directory as the BUILD file - and that’s only an empty `__init__.py` file: https://github.com/davidbeers/PantsFastApiExample/tree/main/src/python You’ll have to change that dependency target to point at your app code - will leave that as an exercise for the reader 🙂
😅 1
w
Ah! 🫠 Noted and fixed. Also leveled up on multistage images and assembled the two cached layers into one final one following @wide-midnight-78598. Also went ahead and gunicorned the project for now to match @wide-midnight-78598's and reduce the number of differences from a working repo. Pants builds the image, runs the container, and shows the server starting up all without error, but the server is still inaccessible from the browser. Seems like I'm pretty close now. I'm reading docs to troubleshoot, but if anyone has a suggestion about this issue, let me know. Latest code: https://github.com/davidbeers/PantsFastApiExample
w
Are you running on any particular ports?
Without looking into it - if everything runs as expected, you might be in a code/docker situation, rather than a pants situation
try 8080 or 8000 if you haven’t picked anything
I don't think Docker can use the < 1024 ports without some admin/root shenanigans
w
You may be right that it's a code/docker situation now, but I've never encountered this problem when building Docker images for FastAPI without Pants. If I let uvicorn and gunicorn stay on the default 8000 port and EXPOSE the same port from the Dockerfile it's still inaccessible. I could be missing some gunicorn config now that I've added that. I'll poke around in @happy-kitchen-89482’s Django example for clues.
w
Is there anything in the logs? Any errors? Not really much to action. I cloned the repo, built the pexes, and then ran `./bin/gunicorn api.main:app --bind=0.0.0.0 -k uvicorn.workers.UvicornWorker -w 1` after extracting the pexes, and it works fine. I don't have docker on this machine, so can't test that part.
b
EXPOSE is a setting on the image, you still have to port-map it onto the local network to make it accessible: https://forums.docker.com/t/expose-in-dockerfile-vs-docker-run-p/79568 How are you running the Docker container?
🎉 1
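A small sketch of the distinction, written as a final-stage fragment. It assumes the venv has already been assembled into `/app` as discussed earlier, and that `uvicorn` is among the packaged dependencies so that `/app/bin/uvicorn` exists. The plain-uvicorn entrypoint reflects the stated Fargate goal and is not something tested in this thread.

```dockerfile
FROM python:3.11-slim-bookworm
# ... venv assembled into /app by the earlier build stages ...

# EXPOSE only records the port as image metadata. It publishes nothing.
EXPOSE 8000

# Bind to 0.0.0.0 so the server listens on the container's external
# interface rather than only on the container's own loopback.
ENTRYPOINT ["/app/bin/uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]

# The port is reachable from the host only once it is published at run time,
# for example host port 8000 mapped to container port 8000:
#   docker run -p 8000:8000 hello-api
```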
w
Ah thank you. That was the last bit! I'd left off the port mapping. If I run `docker run -p 80:8000 hello-api` it works!
w
👍
w
You guys are heroes! I've learned a lot.
w
If you have anywhere where you'd like to draft up and post your learnings, I'm sure the community would be grateful
👍 1
w
I will. For one thing I'll document the code in that repo. I don't have a blog right now, but I could use the GitHub wiki attached to the repo.
w
👍 I've just tabled an item for our upcoming meeting to determine where the best place to point people for common "recipes" would be. Full blown example repos I think have too much overhead, for what they often are - unless we have big monorepos of common workflows
I post what I can in my personal blog, but that's not first-party