# general
p
Hi all - we're evaluating pants as part of a move from a polyrepo solution to a monorepo for a number of Python services. We currently use GitHub Actions with MegaLinter, then build Docker images with Cloud Build. I don't think we're ready to make the jump to `pex` yet, especially for services where we have multiple scripts in the same project and just want a Docker image similar to what we already have for each project. Are there any docs about migrating our existing Dockerfiles to work with pants, specifically around copying paths and setting up dependencies?
An example Dockerfile for a current project would be
```dockerfile
FROM python:3.12-slim

ENV PYTHONUNBUFFERED=1
ENV PYTHONPATH=/app:/app/libraries

# Install Postgres dependencies
RUN apt-get update && apt-get install -y --no-install-recommends libpq5 && rm -rf /var/lib/apt/lists/*

# Install pip requirements
WORKDIR /app
COPY requirements.txt .
RUN python -m pip install --no-cache-dir -r requirements.txt && rm requirements.txt

# Install and compile app components
COPY bin/* /app/bin/
COPY libraries/ /app/libraries/
COPY templates/ /app/templates/
COPY usage_instructions.py /app/

RUN adduser -u 10000 --disabled-password --gecos "" appuser && chown -R appuser /app
USER appuser

CMD ["python3", "usage_instructions.py"]
```
e
Honestly, I would consider making the jump to `pex`. It doesn't look like you have a lot going on in your Dockerfile that would give you too much trouble.
1. Put everything that's not "copy files" or "pip install" into a Dockerfile as a base image.
2. Write lots of `pex_binary`/`docker_image` pairs like:
```python
pex_binary(name="module_pex", entry_point="usage_instructions.py")
docker_image(name="module_image", instructions=[
    "ARG BASE_IMAGE=/path/to/base:image",
    "FROM $BASE_IMAGE",
    "COPY src.module.module_pex /app/app.pex",  # I probably got the generated name of the pex wrong here, but you get the idea
])
```
3. Wrap this up in a macro so your build files can just look like `python_docker_image(name="<abc>", entry_point="/path/to/usage_instructions.py")`
4. Follow https://www.pantsbuild.org/blog/2022/08/02/optimizing-python-docker-deploys-using-pants#multi-stage-build-leveraging-2-pexs if you want cache-optimized images
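To make step 3 concrete, a macro along those lines might look roughly like this - a sketch, not a tested implementation. `python_docker_image`, the base image, and the COPY path are all assumptions you'd need to adapt; Pants macros live in a file registered via `[GLOBAL].build_file_prelude_globs` in `pants.toml`:

```python
# macros.py - hypothetical macro file, registered in pants.toml via
# [GLOBAL].build_file_prelude_globs = ["macros.py"].
# pex_binary and docker_image are injected by Pants when this file is
# evaluated as a BUILD-file prelude, so no imports are needed.
def python_docker_image(name, entry_point, base_image="path/to/base:image"):
    pex_binary(name=f"{name}_pex", entry_point=entry_point)
    docker_image(
        name=name,
        instructions=[
            f"FROM {base_image}",
            # The source path of the built pex is an assumption; check the
            # actual path Pants generates for your target.
            f"COPY {name}_pex.pex /app/app.pex",
            'ENTRYPOINT ["/app/app.pex"]',
        ],
    )
```

Then a BUILD file reduces to one `python_docker_image(...)` call per service.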
p
This particular example has about 20 scripts in the bin directory right now - so would we need one `pex_binary` for each of those?
e
How does your image usually get run? If the CMD is to run "usage_instructions.py", making a pex with that file as entry_point should (by dependency inference) get you everything you need.
p
This particular image is used in multiple k8s deployments, each with a `command` override to specify the script to run for that deployment, e.g.
```yaml
command:
  - /usr/local/bin/python3
  - bin/this_thing.py
```
some are k8s cronjobs
e
is `/bin` just a collection of various python script entrypoints? or do you have non-python starters in there too?
p
this is python scripts only
e
Just a guess here (since I've noticed some of the other teams at my workplace have a situation like this), but do you have a pattern of "COPY everything needed by anything in the project into one super-image, then use the same image for every k8s deployment, just with a different CMD"?
p
for this project, yes
Other projects that we'll move into the monorepo are more traditional applications with a single entrypoint
e
alright. The other traditional ones would probably convert fairly smoothly with the steps I mentioned above. For this one, with lots of entrypoints, you would have to be a little more specific, but creating a `pex_binary` for each entrypoint and then copying each of them into the docker image would work (functionality-wise), though you'll probably end up with a number of common files copied into each pex, and a bit of image size bloat.
p
yeah - the bloat is something I'm keen to avoid
also build time bloat 🙂
e
Honestly though, I'd consider this a decent time to change your docker image into a number of more specific images (i.e. per entrypoint, since this matches the usage pattern). As long as you don't have:
1. dynamic imports that read the filesystem and import things
2. python files shelling out to other tools (including other python scripts)
you should be able to get away with making a docker image out of each entry point (with appropriate changes to your k8s layer). If you do have that kind of dependency, you'll need to set those dependencies manually per-file in the appropriate `python_sources` targets.
I know my team's situation was basically "figuring out and mapping the subset of the repo needed by each entrypoint is too difficult", and that's why we had the "super-image" pattern. But pants' dependency inference makes that all go away.
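For the manual-dependency case, a rough sketch of what that could look like in a BUILD file - the file names and target address here are made up, and `overrides` is the usual way to set per-file fields on a `python_sources` target:

```python
# BUILD - hypothetical example of declaring a dependency Pants can't infer
python_sources(
    name="lib",
    overrides={
        # runner.py shells out to bin/this_thing.py at runtime, which
        # dependency inference can't see, so declare it explicitly.
        "runner.py": {"dependencies": ["bin/this_thing.py"]},
    },
)
```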
In our case, we ended up reducing the image size by ~30% just based on pants being able to copy only the files needed by a particular entry point (including 3rd party libs... no more putting numpy in every container)
b
As far as I understand, you can also build with pex, then expand it back into a venv during the image build. See: https://www.pantsbuild.org/blog/2022/08/02/optimizing-python-docker-deploys-using-pants I'm not sure though whether this would make multiple entrypoints available to you, or whether it's aligned with current best practice.
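For reference, that pattern looks roughly like this - a sketch with hypothetical target names and COPY paths, not a drop-in config. The `pex_binary` needs `include_tools=True` so the `PEX_TOOLS=1` tooling is available, and the pex `venv` tool writes a launcher script at `<venv>/pex`:

```python
pex_binary(name="app_pex", entry_point="usage_instructions.py", include_tools=True)

docker_image(
    name="app_image",
    instructions=[
        # Stage 1: unpack the PEX into a normal venv.
        "FROM python:3.12-slim AS builder",
        "COPY path.to/app_pex.pex /app.pex",  # generated pex path is an assumption
        "RUN PEX_TOOLS=1 python /app.pex venv --compile /app/venv",
        # Stage 2: copy only the venv into the final image.
        "FROM python:3.12-slim",
        "COPY --from=builder /app/venv /app/venv",
        # The venv tool emits a `pex` script that runs the entry point.
        'ENTRYPOINT ["/app/venv/pex"]',
    ],
)
```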
e
I think this will do what you are looking for, but the difficulty will be in making sure there is a single dependency tree that connects everything
p
The other option I'm looking at is moving the scripts to modules and having a single script that dispatches to the right module at runtime
e
like you mean making each script importable so it can be run via python instead of via shell? If you do that and then set your pex entry point to a file like
```python
import os

from package.of.all.entry_points import A, B, C

entry = os.environ.get("ENTRYPOINT")

match entry:
    case "A": A.main()
    case "B": B.main()
    case "C": C.main()
    case _: raise ValueError(f"Unrecognized entrypoint: '{entry}'")
```
Pants will be able to infer all the dependencies because they are all imported in python
p
roughly that, yeah
with some mangling of argparse to do the right thing for the right script
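The argparse mangling can actually stay per-script if each module's `main` parses only its own argv and the dispatcher just passes the remaining args through. A minimal sketch, assuming the module names and CLI flags below (they're made up stand-ins for your `bin/` scripts):

```python
import argparse
import os
import sys

# Stand-ins for the real per-script mains; in the repo these would be
# imported, e.g. `from bin import this_thing`.
def sync_main(argv):
    parser = argparse.ArgumentParser(prog="sync")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    return "sync (dry run)" if args.dry_run else "sync"

def report_main(argv):
    parser = argparse.ArgumentParser(prog="report")
    parser.add_argument("--format", default="text")
    args = parser.parse_args(argv)
    return f"report as {args.format}"

ENTRY_POINTS = {"sync": sync_main, "report": report_main}

def dispatch(argv=None):
    # The k8s command override becomes an ENTRYPOINT env var; the rest of
    # argv is handed untouched to the selected script, so each script's
    # own argparse setup keeps working.
    entry = os.environ.get("ENTRYPOINT")
    if entry not in ENTRY_POINTS:
        raise ValueError(f"Unrecognized entrypoint: {entry!r}")
    return ENTRY_POINTS[entry](sys.argv[1:] if argv is None else argv)
```

Each k8s deployment then sets `ENTRYPOINT` instead of overriding `command`.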
e
yeah, that seems pretty sound