Hi :slightly_smiling_face: I am trying to run a Dj...
# general
t
Hi 🙂 I am trying to run a Django management command from Airflow. Locally this works without any problems, using this target:
python_source(
    name="entrypoint",
    source="entrypoint.py",
    restartable=True,
    dependencies=[
        ":dags",
        "django/django_core/settings.py",
        "django/cms/utils/utils.py",
    ],
)
with a sandbox that looks like this:
vscode ➜ /tmp/pants-sandbox-pFVy1u $ ls -la
total 240
drwxr-xr-x   8 vscode vscode   4096 Jul  4 12:16 .
drwxrwxrwt 757 root   root   167936 Jul  4 12:17 ..
drwxr-xr-x   2 vscode vscode   4096 Jul  4 12:16 .cache
lrwxrwxrwx   1 vscode vscode    103 Jul  4 12:16 .python-build-standalone -> /tmp/immutable_inputsoBPqvS/.tmpYNb0Ug/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
-rwxr-xr-x   1 vscode vscode  13167 Jul  4 12:16 __run.sh
drwxr-xr-x   5 vscode vscode   4096 Jul  4 12:16 airflow
drwxr-xr-x  18 vscode vscode   4096 Jul  4 12:16 django
drwxr-xr-x   4 vscode vscode   4096 Jul  4 12:16 entrypoint.pex
-rwxr-xr-x   1 vscode vscode   8770 Jul  4 12:16 entrypoint.pex_bin_python_shim.sh
-rwxr-xr-x   1 vscode vscode   8763 Jul  4 12:16 entrypoint.pex_pex_shim.sh
drwxr-xr-x   4 vscode vscode   4096 Jul  4 12:16 jobs
drwxr-xr-x   4 vscode vscode   4096 Jul  4 12:16 libs
Now, I deploy this as a PEX within a docker image that looks like this:
pex_binary(
    name="airflow",
    entry_point="entrypoint.py",
    restartable=True,
    layout="packed",
    execution_mode="venv",
    include_tools=True,
    dependencies=[":entrypoint"],
)
and the Dockerfile:
ARG AIRFLOW_VERSION="2.5.3"
ARG PYTHON_MAJOR_VERSION

# <https://blog.pantsbuild.org/optimizing-python-docker-deploys-using-pants/>
FROM apache/airflow:slim-${AIRFLOW_VERSION}-python${PYTHON_MAJOR_VERSION} as deps
ARG PYTHON_MAJOR_VERSION
COPY airflow/airflow.pex /airflow.pex
USER root
# If `--collisions-ok` is not set, some packages write to the same test folders and it will fail
RUN PEX_TOOLS=1 /usr/local/bin/python${PYTHON_MAJOR_VERSION} /airflow.pex venv --scope=deps --collisions-ok --compile /bin/app

FROM apache/airflow:slim-${AIRFLOW_VERSION}-python${PYTHON_MAJOR_VERSION} as srcs
ARG PYTHON_MAJOR_VERSION
COPY airflow/airflow.pex /airflow.pex
USER root
RUN PEX_TOOLS=1 /usr/local/bin/python${PYTHON_MAJOR_VERSION} /airflow.pex venv --scope=srcs --compile /bin/app

FROM apache/airflow:slim-${AIRFLOW_VERSION}-python${PYTHON_MAJOR_VERSION}
ARG PYTHON_MAJOR_VERSION

COPY --from=deps /bin/app /bin/app
COPY --from=srcs /bin/app /bin/app

# The dags code path is stable as we are using venv mode for the PEX binary
COPY --from=srcs /bin/app/lib/python${PYTHON_MAJOR_VERSION}/site-packages/dags /opt/airflow/dags

ENTRYPOINT ["/bin/app/pex"]
CMD ["standalone"]
So the PEX runs in a different mode, and I guess that is the cause of my issue: my code just hangs when calling
django.setup()
and currently produces no further logs. Any idea why this runs locally without any issues but is a problem when deployed with Docker/PEX?
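One way to get some output from a silent hang is the standard library's faulthandler module, which can dump every thread's stack if a block runs longer than a timeout. A minimal sketch, assuming the hang really is inside django.setup(); the helper name dump_stacks_if_stuck and the 30-second default are made up for illustration:

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_stacks_if_stuck(timeout_seconds=30.0, file=None):
    """Dump all thread stacks to `file` if the block runs longer than the timeout.

    Hypothetical helper, not part of Django, Pex or Pants.
    `file` must be backed by a real file descriptor; it defaults to the
    interpreter's original stderr.
    """
    target = file if file is not None else sys.__stderr__
    faulthandler.dump_traceback_later(timeout_seconds, repeat=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog once the block has finished (or raised).
        faulthandler.cancel_dump_traceback_later()
```

In entrypoint.py the call could then be wrapped as `with dump_stacks_if_stuck(): django.setup()`. If the timeout fires, the dumped traceback should show which import or call the process is sitting in.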
I guess that the root cause is the following:
• django.setup() initializes the apps defined in my settings.py, which lists them as import strings, including some of my own Django applications.
• When running locally, it can find those apps in my django folder, since that folder is included in the sandbox directory:
drwxr-xr-x  18 vscode vscode   4096 Jul  4 12:16 django
• In the Docker image, however, this source code (the Django apps) is located in
/bin/app/lib/python3.8/site-packages
. I guess this is not where it is expected to be. Is there a way to tell PEX that, in venv mode, the source code should be placed in /bin/app rather than in site-packages? A workaround seems to be setting PYTHONPATH=
/bin/app/lib/python3.8/site-packages
, which seems weird to be honest.
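To check that guess, it may help to ask the interpreter where it would import each app from, both in the Pants sandbox and in the image. A rough sketch using importlib.util.find_spec; the function name locate_modules is a placeholder, and the names to pass in are whatever dotted paths your settings.py actually lists:

```python
import importlib.util
import sys


def locate_modules(names):
    """Map each dotted module name to where it would be imported from.

    Values are a file path, "<namespace package>" for a package without
    an __init__.py, or None if the module is not importable at all.
    Note: resolving a dotted name imports its parent packages.
    """
    found = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is missing or is malformed.
            spec = None
        if spec is None:
            found[name] = None
        else:
            found[name] = spec.origin or "<namespace package>"
    return found


if __name__ == "__main__":
    for name, origin in locate_modules(sys.argv[1:]).items():
        print(f"{name}: {origin or 'NOT FOUND on sys.path'}")
```

Running this with the same app names under both interpreters (the sandbox one and the venv's one in the image) would show whether the apps resolve at all and from which directory.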
@enough-analyst-54434 Idea?
e
Well, clearly Airflow doesn't work with venvs. So maybe stop trying to use those. Modern Pex (https://github.com/pantsbuild/pex/releases/tag/v2.1.135 or newer) has a
pex3 venv
tool for creating venvs. Unlike
PEX_TOOLS=1
this tool is not embedded or embeddable in your PEX file; so you would need to temporarily install Pex in a venv in the docker container to get access to this tool - you could then remove it after use to slim the image. In particular, you can `pex3 venv create --layout flat`:
--layout {venv,flat,flat-zipped}
                        The layout to create. By default, this is a standard venv layout including activation scripts and a hermetic `sys.path`. The flat and flat-zipped layouts can be selected when just the `sys.path` entries are
                        desired. This effectively exports what would otherwise be the venv `site-packages` directory as a flat directory that can be joined to the `sys.path` of a compatible interpreter. These layouts are useful for
                        runtimes that supply an isolated Python runtime already like AWS Lambda. As a technical detail, these flat layouts emulate the result of `pip install --target` and include non `site-packages` installation
                        artifacts at the top level. The common example being a top-level `bin/` dir containing console scripts.
That effectively gives you site-packages at the location you specify, which could be
/opt/airflow/dags
.
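As an illustration of what "joined to the `sys.path`" means in the help text above, here is a small sketch that puts a flat directory at the front of the import path. The function name join_flat_layout is hypothetical, and whether Airflow itself needs this (rather than just scanning its dags folder) is not something this thread establishes:

```python
import sys
from pathlib import Path


def join_flat_layout(flat_dir):
    """Put a flat layout directory at the front of sys.path (idempotent).

    Hypothetical helper: `flat_dir` stands for the directory a flat
    layout was exported to.
    """
    resolved = str(Path(flat_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved
```

Setting PYTHONPATH to the same directory before the interpreter starts has a similar effect, which is roughly what the workaround mentioned earlier in the thread does for site-packages.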
I'm just going along with your investigative claims, but if those are correct, this inability to work in a venv is odd unless Airflow calls this out in its docs and makes clear you must do things in a very special way.
So, I have no clue about Airflow; as such, if I were in your shoes, I would work backwards from the working thing. In other words, instead of just listing the structure of your working sandbox, I'd inspect the __run.sh script for the command and env vars Pants sets up. Clearly both the structure and the command + env vars are critical.
Then, once you have the critical bits figured out - make all that happen in the Docker image.
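To compare the two environments from the inside as well, something like the following could be run once in the working sandbox and once in the container, and the two outputs diffed. It is only a sketch: the function name and the choice of environment variable prefixes are assumptions, so adjust them to whatever __run.sh actually sets.

```python
import json
import os
import sys


def runtime_snapshot(env_prefixes=("PEX", "PYTHON", "DJANGO", "AIRFLOW")):
    """Collect the interpreter, cwd, sys.path and selected env vars.

    The prefixes are a guess at what matters here, not a complete list.
    """
    env = {
        key: value
        for key, value in sorted(os.environ.items())
        if key.startswith(env_prefixes)
    }
    return {
        "executable": sys.executable,
        "cwd": os.getcwd(),
        "sys_path": list(sys.path),
        "env": env,
    }


if __name__ == "__main__":
    print(json.dumps(runtime_snapshot(), indent=2))
```

Differences in sys_path or cwd between the two runs would point at the "structure" part, and differences in env at the "command + env vars" part.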
The
pex3 venv create
I introduced above may or may not help with that Docker image setup.