# general
p
Hi all - we're evaluating pants as part of a move from a polyrepo solution to a monorepo for a number of Python services. We currently use GitHub Actions with MegaLinter, then build Docker images with Cloud Build. I don't think we're ready to make the jump to `pex` yet, especially for services where we have multiple scripts in the same project and just want a Docker image similar to what we already have for each project. Are there any docs about migrating our existing Dockerfiles to work with pants, specifically around copying paths and setting up dependencies?
An example Dockerfile for a current project would be
```dockerfile
FROM python:3.12-slim

ENV PYTHONUNBUFFERED=1
ENV PYTHONPATH=/app:/app/libraries

# Install Postgres dependencies
RUN apt-get update && apt-get install -y --no-install-recommends libpq5 && rm -rf /var/lib/apt/lists/*

# Install pip requirements
WORKDIR /app
COPY requirements.txt .
RUN python -m pip install --no-cache-dir -r requirements.txt && rm requirements.txt

# Install and compile app components
COPY bin/* /app/bin/
COPY libraries/ /app/libraries/
COPY templates/ /app/templates/
COPY usage_instructions.py /app/

RUN adduser -u 10000 --disabled-password --gecos "" appuser && chown -R appuser /app
USER appuser

CMD ["python3", "usage_instructions.py"]
```
e
Honestly, I would consider making the jump to `pex`. It doesn't look like you have a lot going on in your Dockerfile that would give you too much trouble.
1. Put everything that's not "copy files" or "pip install" into a Dockerfile as a base image.
2. Write lots of `pex_binary`/`docker_image` pairs like:
```python
pex_binary(name="module_pex", entry_point="usage_instructions.py")
docker_image(name="module_image", instructions=[
    "ARG BASE_IMAGE=/path/to/base:image",
    "FROM $BASE_IMAGE",
    "COPY src.module.module_pex /app/app.pex",  # I probably got the generated name of the pex wrong here, but you get the idea
])
```
3. Wrap this up in a macro so your build files can just look like `python_docker_image(name="<abc>", entry_point="/path/to/usage_instructions.py")`
4. Follow https://www.pantsbuild.org/blog/2022/08/02/optimizing-python-docker-deploys-using-pants#multi-stage-build-leveraging-2-pexs if you want cache-optimized images
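To make step 3 concrete, a macro along those lines might look roughly like this - a sketch, not a tested implementation. `python_docker_image`, the base image, and the COPY path are all assumptions you'd need to adapt; Pants macros live in a file registered via `[GLOBAL].build_file_prelude_globs` in `pants.toml`:

```python
# macros.py - hypothetical macro file, registered in pants.toml via
# [GLOBAL].build_file_prelude_globs = ["macros.py"].
# pex_binary and docker_image are injected by Pants when this file is
# evaluated as a BUILD-file prelude, so no imports are needed.
def python_docker_image(name, entry_point, base_image="path/to/base:image"):
    pex_binary(name=f"{name}_pex", entry_point=entry_point)
    docker_image(
        name=name,
        instructions=[
            f"FROM {base_image}",
            # The source path of the built pex is an assumption; check the
            # actual path Pants generates for your target.
            f"COPY {name}_pex.pex /app/app.pex",
            'ENTRYPOINT ["/app/app.pex"]',
        ],
    )
```

Then a BUILD file reduces to one `python_docker_image(...)` call per service.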
p
This particular example has about 20 scripts in the bin directory right now - so would we need one `pex_binary` for each of those?
e
How does your image usually get run? If the CMD is to run "usage_instructions.py", making a pex with that file as entry_point should (by dependency inference) get you everything you need.
p
This particular image is used in multiple k8s deployments, each with a `command` override to specify the script to run for that deployment, e.g.
```yaml
command:
  - /usr/local/bin/python3
  - bin/this_thing.py
```
some are k8s cronjobs
e
is `/bin` just a collection of various python script entrypoints? or do you have non-python starters in there too?
p
this is python scripts only
e
Just a guess here (since I've noticed some of the other teams at my workplace have a situation like this), but do you have a pattern of "COPY everything needed by anything in the project into one super-image, then use the same image for every k8s deployment, just with a different CMD"?
p
for this project, yes
Other projects that we'll move into the monorepo are more traditional applications with a single entrypoint
e
alright. The other traditional ones would probably convert fairly smoothly with the steps I mentioned above. For this one, with lots of entrypoints, you would have to be a little more specific, but creating a `pex_binary` for each entrypoint and then copying each of them into the docker image would work (functionality-wise), though you'll probably end up with a number of common files copied into each pex, and a bit of image size bloat.
p
yeah - the bloat is something I'm keen to avoid
also build time bloat 🙂
e
Honestly though, I'd consider this a decent time to change your docker image into a number of more specific images (i.e. per entrypoint, since this matches the usage pattern). As long as you don't have:
1. dynamic imports that read the filesystem and import things
2. python files shelling out to other tools (including other python scripts)
you should be able to get away with making a docker image out of each entry point (with appropriate changes to your k8s layer). If you do have that kind of dependency, you'll need to set those dependencies manually per-file in the appropriate `python_sources` targets.
I know my team's situation was basically "figuring out and mapping the subset of the repo needed by each entrypoint is too difficult", and that's why we had the "super-image" pattern. But pants' dependency inference makes that all go away.
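For the manual-dependency case, a rough sketch of what that could look like in a BUILD file - the file names and target address here are made up, and `overrides` is the usual way to set per-file fields on a `python_sources` target:

```python
# BUILD - hypothetical example of declaring a dependency Pants can't infer
python_sources(
    name="lib",
    overrides={
        # runner.py shells out to bin/this_thing.py at runtime, which
        # dependency inference can't see, so declare it explicitly.
        "runner.py": {"dependencies": ["bin/this_thing.py"]},
    },
)
```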
In our case, we ended up reducing the image size by ~30% just based on pants being able to copy only the files needed by a particular entry point (including 3rd party libs... no more putting numpy in every container)
b
As far as I understand, you can also build with pex, then expand it back into a venv during the image build. See: https://www.pantsbuild.org/blog/2022/08/02/optimizing-python-docker-deploys-using-pants I'm not sure though whether this would make multiple entrypoints available to you, or whether it's aligned with current best practice.
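For reference, that pattern looks roughly like this - a sketch with hypothetical target names and COPY paths, not a drop-in config. The `pex_binary` needs `include_tools=True` so the `PEX_TOOLS=1` tooling is available, and the pex `venv` tool writes a launcher script at `<venv>/pex`:

```python
pex_binary(name="app_pex", entry_point="usage_instructions.py", include_tools=True)

docker_image(
    name="app_image",
    instructions=[
        # Stage 1: unpack the PEX into a normal venv.
        "FROM python:3.12-slim AS builder",
        "COPY path.to/app_pex.pex /app.pex",  # generated pex path is an assumption
        "RUN PEX_TOOLS=1 python /app.pex venv --compile /app/venv",
        # Stage 2: copy only the venv into the final image.
        "FROM python:3.12-slim",
        "COPY --from=builder /app/venv /app/venv",
        # The venv tool emits a `pex` script that runs the entry point.
        'ENTRYPOINT ["/app/venv/pex"]',
    ],
)
```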
e
I think this will do what you are looking for, but the difficulty will be in making sure there is a single dependency tree that connects everything
p
The other option I'm looking at is moving the scripts to modules and having a single script that dispatches to the right module at runtime
e
like you mean making each script importable so it can be run via python instead of via shell? If you do that and then set your pex entry point to a file like
```python
import os

from package.of.all.entry_points import A, B, C

entry = os.environ.get("ENTRYPOINT")

match entry:
    case "A": A.main()
    case "B": B.main()
    case "C": C.main()
    case _: raise ValueError(f"Unrecognized entrypoint: '{entry}'")
```
Pants will be able to infer all the dependencies because they are all imported in python
p
roughly that, yeah
with some mangling of argparse to do the right thing for the right script
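The argparse mangling can actually stay per-script if each module's `main` parses only its own argv and the dispatcher just passes the remaining args through. A minimal sketch, assuming the module names and CLI flags below (they're made up stand-ins for your `bin/` scripts):

```python
import argparse
import os
import sys

# Stand-ins for the real per-script mains; in the repo these would be
# imported, e.g. `from bin import this_thing`.
def sync_main(argv):
    parser = argparse.ArgumentParser(prog="sync")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    return "sync (dry run)" if args.dry_run else "sync"

def report_main(argv):
    parser = argparse.ArgumentParser(prog="report")
    parser.add_argument("--format", default="text")
    args = parser.parse_args(argv)
    return f"report as {args.format}"

ENTRY_POINTS = {"sync": sync_main, "report": report_main}

def dispatch(argv=None):
    # The k8s command override becomes an ENTRYPOINT env var; the rest of
    # argv is handed untouched to the selected script, so each script's
    # own argparse setup keeps working.
    entry = os.environ.get("ENTRYPOINT")
    if entry not in ENTRY_POINTS:
        raise ValueError(f"Unrecognized entrypoint: {entry!r}")
    return ENTRY_POINTS[entry](sys.argv[1:] if argv is None else argv)
```

Each k8s deployment then sets `ENTRYPOINT` instead of overriding `command`.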
e
yeah, that seems pretty sound