# general
t
Hi all, I am having trouble running my Django application in a container using PEX. My manage.py looks like this:
Copy code
import os
import sys


def run_manage():
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "django_core.settings")
    args = sys.argv
    from django.core.management import execute_from_command_line

    execute_from_command_line(args)


if __name__ == "__main__":
    run_manage()
The PEX target looks like this:
Copy code
pex_binary(
    name="manage",
    entry_point="manage.py",
    restartable=True,
)
There is no issue running
./pants run django/manage.py -- runserver
. But when running the PEX file in this docker container:
Copy code
ARG PYTHON_VERSION
ARG VARIANT

FROM python:${PYTHON_VERSION}-${VARIANT}

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

COPY django/manage.pex /bin

EXPOSE 8000
CMD [ "/bin/manage.pex", "--", "runserver"]
I get this error:
Unknown command: '--' Type 'manage.py help' for usage.
It should be noted that the app does seem to get initialized, since it throws an error when an env var used in
settings.py
is missing.
When I remove "--" it just throws:
Copy code
Traceback (most recent call last):
  File "/root/.pex/unzipped_pexes/4385bc9a1b5d88169d1c8cca8cfb039ab5e16e21/manage.py", line 12, in <module>
    run_manage()
  File "/root/.pex/unzipped_pexes/4385bc9a1b5d88169d1c8cca8cfb039ab5e16e21/manage.py", line 7, in run_manage
    from django.core.management import execute_from_command_line
ModuleNotFoundError: No module named 'django'
When I check the pex I can see django in
.deps
:
Copy code
root@b11c2d409264:/bin/manage/.deps# ls -la | grep Django
drwxr-xr-x   5 root root  4096 Jan  1  1980 Django-3.2-py3-none-any.whl
What am I doing wrong?
m
Hi Chris, When I run
./pants package ::
my pex binaries end up under
dist/
, however are you then using pants to also build the
Dockerfile
? I took a slightly different approach as I wanted to also do migrations, etc. when starting the docker image, so I ended up with a wrapper script
start-server.sh
pointing to the PEX, which sets
PEX_MODULE
environment variable to
manage
(for the python import path to
manage.py
within the pex itself) and uses that to then pass arguments such as
runserver
in without the
--
.
NOTE:
$PEX_FILE
is the path to the pex within the docker image, so
/bin/manage.pex
in your case 😉
t
Yes I am using Pants to build the Docker image as well and it automatically infers the PEX.
m
Okay good stuff - how does the docker container behave after setting
ENV PEX_MODULE=manage
and removing the
--
in your Dockerfile
CMD
on the next line?
t
I will give it a try
What does your Dockerfile look like?
Interesting, thank you! So you are always running migrate, collectstatic, etc. before doing runserver?
m
Yep, mainly because the Django app I have needs to do a little setup (such as pulling assets from s3 depending on deployment environment) before it'll work. I can't remember why I do
makemigrations
, but I think it's probably to ensure that any external databases for the given environment are up to date.
Django Dockerfile.dockerfile
The above is a slightly cut-down version, but hopefully it gives you the idea
Note the odd
src.python.apps.django_backend/gunicorn_run.pex
dot path syntax - this was what worked as it is relative to the
dist/
directory of Pants
(There was a little bit of trial and error to get things right IIRC!)
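That startup sequence (setup commands before runserver) could also be sketched in Python inside the manage entry point itself, rather than in a shell wrapper. This is a hypothetical variant of the manage.py shown earlier; the exact command list is illustrative:

```python
import os
import sys

# Hypothetical startup wrapper: run setup commands, then hand over to the
# requested command (e.g. runserver). The command list is illustrative.
SETUP_COMMANDS = [["migrate", "--noinput"], ["collectstatic", "--noinput"]]


def build_argv_list(final_args):
    # Each invocation gets its own argv; Django expects argv[0] to be present.
    return [["manage.py", *cmd] for cmd in SETUP_COMMANDS] + [
        ["manage.py", *final_args]
    ]


def run():
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "django_core.settings")
    # Imported lazily so Django only needs to be importable when actually run.
    from django.core.management import execute_from_command_line

    for argv in build_argv_list(sys.argv[1:]):
        execute_from_command_line(argv)


# In the real manage.py this would be wired up with:
#   if __name__ == "__main__":
#       run()
```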
t
Copy code
ENV PEX_MODULE=manage
CMD [ "/bin/manage.pex", "runserver"]
Did not help, still
ModuleNotFoundError: No module named 'django'
The funny thing is
[ "/bin/manage.pex", "help"]
and
[ "/bin/manage.pex", "migrate"]
actually return something. Could this be related to the fact that my django directory is called 'django'?
e
@thousands-plumber-33255 that's a good guess, but I don't think so. Your OP shows:
Copy code
Traceback (most recent call last):
  File "/root/.pex/unzipped_pexes/4385bc9a1b5d88169d1c8cca8cfb039ab5e16e21/manage.py", line 12, in <module>
    run_manage()
That indicates the
django/
directory housing
django/manage.py
is stripped, since
manage.py
is at the root of the unzipped PEX (unzipped PEXes are namespaced by their hash). @thousands-plumber-33255 can you provide the output of the following?:
Copy code
unzip -qc dist/...your.pex PEX-INFO | jq .
If you don't have jq you can leave the trailing pipe clause off.
t
Here we go
e
@thousands-plumber-33255 thanks. I think you just need to add
execution_mode="venv"
to your
pex_binary
target. I'm trying to confirm on the side, but what is most likely happening is that one of the `django*` distributions in the PEX does not use namespace packages properly. Having PEX install itself in a venv fixes these sorts of issues. As an aside, it's also better for a host of reasons. Have you seen either of these?: + https://pex.readthedocs.io/en/v2.1.121/recipes.html#pex-app-in-a-container + https://blog.pantsbuild.org/optimizing-python-docker-deploys-using-pants/
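The suggested change to the pex_binary target from the top of the thread would look like this. Pants BUILD files use Python syntax; pex_binary is a builtin Pants provides at BUILD-evaluation time, stubbed here only so the sketch is self-contained:

```python
# Sketch of the suggested BUILD change. In a real BUILD file, pex_binary is
# provided by Pants; it is stubbed here only to keep the snippet runnable.
def pex_binary(**kwargs):
    return kwargs


manage = pex_binary(
    name="manage",
    entry_point="manage.py",
    execution_mode="venv",  # install the PEX into a venv on first run
    restartable=True,
)
```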
t
It looks like this works!
e
I managed to build a PEX with your requirements list and there are no namespace package issues in the 3rdparty requirements:
Copy code
jsirois@Gill-Windows:~/support/pex/ChrisStetter $ zipinfo -1 test.pex | grep /django/ | head
.deps/Django-3.2-py3-none-any.whl/django/
.deps/Django-3.2-py3-none-any.whl/django/__init__.py
.deps/Django-3.2-py3-none-any.whl/django/__main__.py
.deps/Django-3.2-py3-none-any.whl/django/shortcuts.py
.deps/Django-3.2-py3-none-any.whl/django/urls/
.deps/Django-3.2-py3-none-any.whl/django/urls/__init__.py
.deps/Django-3.2-py3-none-any.whl/django/urls/base.py
.deps/Django-3.2-py3-none-any.whl/django/urls/conf.py
.deps/Django-3.2-py3-none-any.whl/django/urls/converters.py
.deps/Django-3.2-py3-none-any.whl/django/urls/exceptions.py
jsirois@Gill-Windows:~/support/pex/ChrisStetter $ zipinfo -1 test.pex | grep /django/ | tail
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/__init__.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/base.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/console.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/dummy.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/filebased.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/locmem.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/smtp.py
.deps/Django-3.2-py3-none-any.whl/django/bin/
.deps/Django-3.2-py3-none-any.whl/django/bin/django-admin.py
Your hunch may have been right. Can you do a similar grep on your PEX and confirm or deny that `Django-3.2-py3-none-any.whl` is the only provider of the
django
directory?
t
Thank you so much. Yes, I have seen that information, but wanted to postpone it to a later stage
e
If you can run the command above, it would be good to confirm the provenance of
django
in your PEX.
t
Should I use execution_mode="venv" only for the pex deployed with docker or is it safe to use with ./pants run as well?
Copy code
vscode ➜ /repo $ zipinfo -1 dist/django/manage.pex | grep /django/ | head
.deps/Django-3.2-py3-none-any.whl/django/
.deps/Django-3.2-py3-none-any.whl/django/__init__.py
.deps/Django-3.2-py3-none-any.whl/django/__main__.py
.deps/Django-3.2-py3-none-any.whl/django/shortcuts.py
.deps/Django-3.2-py3-none-any.whl/django/urls/
.deps/Django-3.2-py3-none-any.whl/django/urls/__init__.py
.deps/Django-3.2-py3-none-any.whl/django/urls/base.py
.deps/Django-3.2-py3-none-any.whl/django/urls/conf.py
.deps/Django-3.2-py3-none-any.whl/django/urls/converters.py
.deps/Django-3.2-py3-none-any.whl/django/urls/exceptions.py
e
It's safe to use everywhere. The only reason it's not the default in Pants is slower 1st-run startup time - it takes extra time to build the venv on the 1st run - it's faster on every other run and more compatible in general.
t
And I can use debug-adapter with it as well?
e
Just a sec, can you run the command again with
tail
instead of
head
?
t
Copy code
vscode ➜ /repo $ zipinfo -1 dist/django/manage.pex | grep /django/ | tail
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/__init__.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/base.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/console.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/dummy.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/filebased.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/locmem.py
.deps/Django-3.2-py3-none-any.whl/django/core/mail/backends/smtp.py
.deps/Django-3.2-py3-none-any.whl/django/bin/
.deps/Django-3.2-py3-none-any.whl/django/bin/django-admin.py
e
Hrm, ok - so not a namespace package issue at all. You should be good to go then.
Yes - you should be able to use debug adapter.
t
Alright, thank you
e
In summary - PEXes (Python zipapps) are a bit odd; venvs, on the other hand, are totally normal. All tools work with venvs.
t
You seem familiar with PEX - can I also set a third-party package as an entry point in the pex_binary target instead of a Python file? I have airflow installed and would like to directly run https://github.com/apache/airflow/blob/main/airflow/__main__.py
e
Yes. But the best way is to use a console script if there is one. For your case, there is. It's named
airflow
: https://github.com/apache/airflow/blob/main/setup.cfg#L166-L168 + Using pex that's
pex -c airflow ...
where `-c` sets a console script entry point. + Using Pants that's: https://www.pantsbuild.org/docs/reference-pex_binary#codescriptcode + If you already have the built PEX file, you can also ad hoc use
PEX_SCRIPT=airflow my.pex ...
(see
pex --help-variables
or https://pex.readthedocs.io/en/v2.1.121/api/vars.html for more runtime env var control knobs)
t
Ahh, learned something again, thanks! But:
Copy code
pex_binary(
    name="airflow",
    script="airflow",
    restartable=True,
)
results in
Copy code
15:45:41.84 [ERROR] 1 Exception encountered:

Engine traceback:
  in `run` goal - environment:linux_devcontainer

ProcessExecutionFailure: Process 'Building airflow/airflow.pex' failed with exit code 1.
stdout:

stderr:
Could not find script 'airflow' in any distribution  within PEX!
Ohh wait
The dependency must be missing
e
And you probably want execution_mode="venv" - make that a habit unless you have a very good reason not to.
You might want to set this default in a BUILD at the ~top of your repo in fact. See: https://www.pantsbuild.org/docs/targets#field-default-values
👀 1
t
So dependency inference does not work here, which is why airflow is not included in the PEX. I have a folder called /dags where all the airflow source code lives (BUILD files with python_sources are present). How can I add this as a dependency to the pex_binary target?
e
You use the
dependencies
(list of strings) field to list all dependencies that are not inferrable - that's for both 1st- and 3rd-party dependencies. If the PEX is just 3rd-party - say, airflow - something like:
Copy code
pex_binary(
  ...
  script="airflow",
  dependencies=["3rdparty/python:reqs#airflow"]
)
If 1st party, something like:
Copy code
python_sources()  # This will "own" my_entry_point.py
pex_binary(
  ...
  entry_point="./my_entry_point.py"
)
Either way - you should just need to include a single explicit dependency in the
pex_binary
target and that dependency should include the entry_point / console script. Pants will infer the transitive closure of dependencies from that single entry point / console script dependency.
t
Ah nice, got it working with
dependencies=["airflow/dags"]
. So now those dependencies are included in the pex. But airflow is expecting the dags dir to be in the AIRFLOW_HOME dir. By default this is
~/airflow
and I can see that files are being created there when I run
./pants run airflow:airflow
. But of course the dags are not present in that home folder but in the pex. Later on I will deploy this in a Dockerfile, but I think the issue is the same: How can I handle this case? From the django example I have seen that the pex is executed in some unstable dir like
/root/.pex/unzipped_pexes/hash
so hardcoding that path is not stable.
e
So, in the Docker case you should be using the documentation you've been deferring. Both the blog and the Pex recipe direct you to install the PEX file in a venv as part of the Dockerfile COPY / RUN steps. This means the build is optimized, pycs precompiled, etc. - but it also means you control the venv directory path. Since you control env vars too in a Dockerfile, you can set all of this up just so.
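A minimal sketch of that Dockerfile pattern, following the Pex "PEX app in a container" recipe linked earlier. The Python version and paths are illustrative assumptions, and it assumes include_tools=True was set on the pex_binary target:

```
FROM python:3.10-slim-bullseye

COPY manage.pex /manage.pex
# Install the PEX into a fixed venv at image build time, then drop the PEX.
# PEX_TOOLS=1 exposes the "venv" tool; --compile pre-compiles .pyc files.
RUN PEX_TOOLS=1 python3.10 /manage.pex venv --compile /app \
    && rm /manage.pex

EXPOSE 8000
# /app/pex is a script the venv tool creates that runs the PEX's entry point.
CMD ["/app/pex", "runserver", "0.0.0.0:8000"]
```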
For the run case ... that's trickier. I think you'd need a small bit of wrapper code in your main that sets AIRFLOW_HOME before calling into airflow, and it would do so by using
__file__
and calculating relative to that. Using
__file__
in this way will require venv mode so that all code - 1st and 3rdparty, shares the same (site-packages) root.
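A hypothetical sketch of that wrapper idea. The airflow import and entry-point name are assumptions (not confirmed in the thread); only the path logic is meant as directly reusable:

```python
import os
import pathlib


def airflow_home_for(entry_file: str) -> str:
    # Take AIRFLOW_HOME to be the directory containing the entry file, so the
    # dags/ folder packaged alongside it is found at runtime. This relies on
    # venv execution mode so 1st- and 3rd-party code share one site-packages
    # root, as described above.
    return str(pathlib.Path(entry_file).parent.resolve())


def main() -> None:
    # AIRFLOW_HOME must be set before airflow is imported; airflow reads the
    # env var at import time.
    os.environ.setdefault("AIRFLOW_HOME", airflow_home_for(__file__))
    # Hypothetical hand-off to airflow's CLI entry point.
    from airflow.__main__ import main as airflow_main

    airflow_main()
```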
t
Both makes a lot of sense, thank you! Will give it a try 🙂
@enough-analyst-54434 Airflow is running in a venv and my
dags
dependencies (as specified in the BUILD file) are now in
/bin/app/lib/python3.8/site-packages/dags
. Is there any way to tell PEX that this dependency should be placed somewhere else? I could not find anything in the PEX or Pants docs.
e
There is not. You know they are at a fixed path though; so you could set
AIRFLOW_X
, couldn't you?
t
Just trying that 🙂
Is there a better way to just include all python files as most of them cannot be inferred? All of them have a python_sources target
Copy code
pex_binary(
    name="airflow",
    # <https://github.com/apache/airflow/blob/main/setup.cfg#L166-L168>
    script="airflow",
    restartable=True,
    layout = "packed",
    execution_mode="venv",
    include_tools = True,
    dependencies=[":dags"],
)

docker_image(name="docker")

resources(
    name="dags",
    sources=[
        "./dags/**/*.py",
    ],
)
This does not seem like a good idea, as the now-included files do not get their 3rd-party dependencies inferred. So is there any way to just add all python_sources targets in the dags folder as a dependency of the pex_binary target?
e
Why
resources
and not
python_sources
?
The latter gets you dep inference.
t
Makes sense. But it's not possible to use target types as a dependency, only target addresses? Because what I get now is a lot of warnings, as two python_sources targets (my explicit one and the auto-generated ones) own the same module:
Copy code
07:00:23.77 [WARN] The target airflow/dags/spatial/planninRegionDataInitial/src/planning_regions_initializing.py:../../../../dags imports `dags.base.dagOperatorSessionContext.session_scope`, but Pants cannot safely infer a dependency because more than one target owns this module, so it is ambiguous which to use: ['airflow/dags/base/dagOperatorSessionContext.py', 'airflow/dags/base/dagOperatorSessionContext.py:../../dags'].
Of course I could add all python target addresses manually, but that would be quite error-prone when adding new files.
Another issue I have is that Pants infers dependencies from another target in the monorepo which has the same directory name. When I run
./pants dependencies --transitive airflow:airflow
I can see
django/spatial/**
files being included and I suspect that it comes from the fact that I also have a
airflow/dags/spatial
directory. Because nothing in the dags directory is importing from the django code. How can this be resolved?
e
my explicit one and the auto-generated ones
Can you turn off the auto-generated ones? I assume you mean
pants tailor
here? It has knobs to ignore targets and subtrees: See the 2 ignore options starting here: https://www.pantsbuild.org/docs/reference-tailor#ignore_paths
Because nothing in the dags directory is importing from the django code. How can this be resolved?
I think this work https://github.com/pantsbuild/pants/pull/17931 has a good chance of solving this. That's available in 2.16.0.dev5+: https://pypi.org/project/pantsbuild.pants/2.16.0.dev6/
t
Yes I mean
pants tailor
. Ignoring the
python_sources
targets here would result in a long list that would be hard to maintain. The pants option seems to scale better, but that in turn would never generate other targets like
python_tests
right?
Thanks for pointing to this PR. So there is no option in 2.15? I think I could ignore django in the dependencies field, but I assume I would need to add all target addresses, as it does not allow a wildcard, right?
e
Ignoring the
python_sources
targets here would result in a long list that would be hard to maintain. The pants option seems to scale better, but that in turn would never generate other targets like
python_tests
right?
I'm not following you.
Copy code
[tailor]
ignore_paths = ["airflow/dags/**"]
In combination with a single
python_sources
target in
airflow/dags/BUILD
that globs all python source files under
airflow/dags
should allow you to: 1. Have a single 0-maintenance
airflow/dags
target for other things to depend on. 2. Let dependency inference (1st- and 3rd-party) just work. 3. Allow you to use
./pants tailor
on all other portions of the repo as per normal.
t
I can totally follow. What I was trying to say is: if I follow this approach, my tests live side by side with the source code, e.g. in
airflow/dags/dir1/example_test.py
. With this structure I cannot utilize
pants tailor
for generating
python_tests
targets in the future, since I cannot ignore generating only specific target types, right?
I tried
2.16.0.dev6
, but with
ambiguity_resolution = "by_source_root"
I am still seeing my django files being included 👀 Any idea? Does that really apply here as it is
django/spatial/**
with
airflow/dags/spatial
and not
django/dags/spatial/**
. @happy-kitchen-89482 Any thoughts here?
e
for generating
python_tests
targets in the future since I cannot ignore generating only specific targets right?
You can do the same for
python_tests
- use 1 python_tests target that recursively globs in the
airflow/dags
tree. So, again, tailor still works for all other parts of your repo. It does not work in
airflow/dags
and you make that maintainable by using the strategy of declaring solitary targets in `airflow/dags/BUILD` that use recursive globs. Does that make sense, or am I missing why that does not work?
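The layout being described might look roughly like this in airflow/dags/BUILD. BUILD files use Python syntax; python_sources and python_tests are Pants builtins, stubbed here only so the sketch is self-contained, and the glob patterns are illustrative:

```python
# Sketch of a solitary-targets airflow/dags/BUILD. python_sources and
# python_tests are Pants builtins; stubbed so the snippet runs standalone.
def python_sources(**kwargs):
    return kwargs


def python_tests(**kwargs):
    return kwargs


dags = python_sources(
    name="dags",
    sources=["**/*.py", "!**/*_test.py"],  # everything except the tests
)

tests = python_tests(
    name="tests",
    sources=["**/*_test.py"],  # recursively glob the tests
)
```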
t
Thank you!
Is this incorrect?
Copy code
python_sources(
    name="dags",
    sources=[
        "dags/**/*.py",
        "dags/**/*.pyi",
        "!dags/**/test_*.py",
        "!dags/**/*_test.py",
        "!dags/**/tests.py",
        "!dags/**/conftest.py",
        "!dags/**/test_*.pyi",
        "!dags/**/*_test.pyi",
        "!dags/**/tests.pyi",
    ],
)
It warns as follows:
./pants --no-pantsd dependencies --transitive airflow:airflow
10:56:25.16 [WARN] Unmatched glob from airflow:dags's
sources
field: "airflow/dags/**/*.pyi", excludes: ["airflow/dags/**/*_test.py", "airflow/dags/**/*_test.pyi", "airflow/dags/**/conftest.py", "airflow/dags/**/test_*.py", "airflow/dags/**/test_*.pyi", "airflow/dags/**/tests.py", "airflow/dags/**/tests.pyi"]
e
Unmatched globs means what it says, so remove the globs you don't have representative files for yet (let's say it's .pyi). Once you add those, add the glob. Yes you will find this annoying since you want to set and forget now, but in a few days time when the subtree has filled out with code, you'll arrive back at this full glob set and then be done.
The best you can do here is crank the nag up to an error; you cannot turn it off entirely: https://www.pantsbuild.org/docs/reference-global#unmatched_build_file_globs
t
For the run case ... that's trickier. I think you'd need a small bit of wrapper code in your main that sets AIRFLOW_HOME before calling into airflow, and it would do so by using
__file__
and calculating relative to that. Using
__file__
in this way will require venv mode so that all code - 1st and 3rdparty, shares the same (site-packages) root.
What are the advantages/disadvantages for this case between the three options I see here? 1. Run the Pex file 2. Run the main file directly 3. Run the pex file in the same docker container I use to deploy Airflow using
docker_environment
A few thoughts on where I am currently having issues: 1. Even though the environment has been added to the pex_binary target, I am not seeing the envs set through
subprocess_environment_env_vars
. Is this not intended? 2. I do that already with the Django app, and it uses the defined envs from the environment. The main file calls https://github.com/apache/airflow/blob/main/airflow/cli/commands/standalone_command.py#L287 and that cannot be found. I can resolve this by adding this to main. Can this be set by Pants?
Copy code
AIRFLOW_BIN_PATH = f"{os.environ['VIRTUAL_ENV']}/bin"
os.environ['PATH'] += ':' + AIRFLOW_BIN_PATH
3. Seems cumbersome, for example due to debug-adapter issues.
And what do you mean by setting AIRFLOW_HOME "relative" to
__file__
? Because something like this does not load my DAGS:
Copy code
AIRFLOW_HOME = str(pathlib.Path(__file__).parent.resolve())
os.environ.setdefault("AIRFLOW_HOME", AIRFLOW_HOME)
That results in something like
/tmp/pants-sandbox-nJNNNS/airflow
in which the
dag
folder is present.
e
That's what I mean.
So you'd need to debug. Seems like a strategy that should work, but there are clearly issues.
I really have no clue about Airflow, and knowledge of how it works in some detail seems required at this point.
t
Just wanted to know if I am on the right path here
I am just really curious why one would run the Python source directly vs the PEX?
e
This is such a big thread I've lost track. I'm on my phone though; it may be easier to re-page all this in when I get back to a keyboard.
👍 1
So, for 2 above,
execution_mode="venv"
on a
pex_binary
enables Pex's
--venv prepend
runtime execution mode, which says: 1. Create a venv first if not already done and re-exec into that. 2. Prepend the venv bin dir to the PATH.
So, that should work already in modern versions of
./pants run
- you may need 2.15.x though; run has morphed its impl a bit in the past several months.
I am just really curious on why one would run the python source directly vs PEX?
This is all a bit eye-of-the-beholder convenience, if I understand your question. It may be considered convenient to be able to
./pants run tab/complete/to/file.py
vs remembering a target name of a
pex_binary
. The downside is that you're not obviously testing production, which presumably is built from the PEX. Depending on how you feel about testing proxies for production vs ~actual production, you may have different feelings.
t
Thanks for the explanation! Is it intended that the pex binary does not pick up the envs of the specified environment?
e
That is one of your sub-questions I'm not qualified to answer. It sounds like a bug to me, but I did not work on the environments project either.
This thread is a bit gnarly; you might want to break out fresh threads for any remaining questions you have, with nice summaries scoped to those threads so new people can grok the context all at once locally.
t
Will do 🙂