# general

f
Hey there, I have a (maybe naive) question about caching. I have a project with quite a few external dependencies. When I do a `pants run` on the target runnable pex, every time I touch any code in my repo (which is itself quite light), Pants starts rebuilding all the requirements like this:
```
Building 26 requirements for my_app.pex from the 3rdparty/python/default.lock resolve: Jinja2<4.0,>=3.0.0, PyYAML>=6.0.1
```
which (even with `pantsd` enabled) takes well over 30 seconds every time, even for very minor code changes on my side. I've checked the common issues and global options, but I feel I'm missing something here.
The `pants.toml` (slightly cleaned up; there are also some `black` and `flake8` options in there, plus the sources config) is basically:
```toml
[GLOBAL]
pants_version = "2.18.3"
backend_packages = [
  "pants.backend.python",
]
local_cache = true
pantsd = true

[source]
...

[python]
enable_resolves = true
default_resolve = "default"
interpreter_constraints = ["==3.11.*"]

[python.resolves]
default = "3rdparty/python/default.lock"
tools-external = "3rdparty/python/tools-external.lock"

[python-repos]
indexes = ["https://some.internal.repo/repository/pip-cache/simple"]

[setuptools]
install_from_resolve = "tools-external"
requirements = ["//tools/external:requirements"]
```
w
Few questions:
• Is there anything in the logs in the `.pants.d` folder that might point at a culprit?
• Does this happen when you just run `pants package` on the pex?
• Does `-ldebug` give any potential insight as to why it's packaging requirements?
• Are you using or modifying any of the ignore items? https://www.pantsbuild.org/2.20/reference/global-options#pants_ignore https://www.pantsbuild.org/2.20/reference/global-options#pants_ignore_use_gitignore
f
> any of the ignore items
🤦 using gitignore, set to ignore everything like `.*/`, which includes `.pants.d`
thanks for the quick answer 😅
So, updates:
• Does this happen when you just run `pants package` on the pex? -> Yeah, same behaviour
• It actually still happens even if I set `pants_ignore_use_gitignore = false`. Every time I add a new character, it goes on "building 26 requirements".
• Added `pants -ldebug` but I don't get much other info 😕
• The logs in `.pants.d` show me the same stuff:
```
16:17:59.15 [WARN] /home/.../specifiers.py:255: DeprecationWarning: Creating a LegacyVersion has been deprecated and will be removed in the next major release
  warnings.warn(

16:18:21.90 [INFO] Completed: Building 26 requirements for some_app.pex from the 3rdparty/python/default.lock resolve: Jinja2<4.0,>=3.0.0, PyYAML>=6.0.1, beautifulsoup4<5.0,>=4.12.2... (498 characters truncated)
16:18:21.98 [INFO] Wrote dist/some_app.pex
```
w
This is strange; is there a minimal reproducible example you could post? Also, does this happen on Pants 2.20 (for example)?
I may have misread the original post. Does this happen on consecutive runs of `pants package`? Or is it ONLY after some code has been manipulated?
On the machine I first replied on, the "I touch any code" was cut off, leading me to think this happened every time you ran the same command, regardless of whether you touched code 🤦‍♂️
So, what you're currently seeing isn't unusual (at least, not to me): modifying the targets that are packaged up causes a re-build. I have that happen too on some of my projects. One way I've mitigated this when it annoys me is splitting up my dependencies to be a bit finer-grained and easier to cache independently.
edit: Struggling to find the project where I did it, but I recall doing something like this: https://www.pantsbuild.org/blog/2022/08/02/optimizing-python-docker-deploys-using-pants#multi-stage-build-leveraging-2-pexs
I don't know if that will work for you, but it just so happened to work in my circumstance because of how everything else needed to work. Essentially, I built a requirements pex and an application pex, and merged the two in some way to build the binary I needed.
d
Is it the same when you run `pants run path/to/your_main_file.py`?
f
@wide-midnight-78598
> Or is it ONLY after some code has been manipulated?
Yeah, exactly, only after manipulating code. I see that multistage build trick, but it seems very targeted at building Dockerfiles; I'd have to see if I can adapt it.
@dry-architect-80370
> Is it the same when you run `pants run path/to/your_main_file.py`?
Actually no! Then it's fine. However, the PEX uses a custom entrypoint to launch a FastAPI server via gunicorn. Not sure there'd be an easy way to reproduce this with `pants run <file>`.
d
Interesting... What do you put in the `dependencies` field of the pex target?
f
```python
pex_binary(
    name="some_api_pex",
    script="gunicorn",
    args=[
        "some_api.app:create()",
        "--worker-class",
        "uvicorn.workers.UvicornWorker",
    ],
    dependencies=[
        ":some_api_src",
        "//:requirements#gunicorn",
        "//:requirements#uvicorn",
    ],
    output_path="some_api.pex",
)
```
where `:some_api_src` is just the Python sources
w
Okay, so you're using this something like I am, with the pex holding uvicorn and gunicorn. I literally just discarded the multi-step pex example last night. Let me try to do this with one of my simpler repos this morning. While the example works for Docker specifically, that's more about how the pex files are reconciled in the end. I did something similar with `scie` packages.
d
You could try changing the `:some_api_src` dep to just `some_api.app` (not sure what's the correct syntax to point it at a single file, but it's definitely doable); otherwise I have no other ideas.
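For what it's worth, Pants does support file-level addresses, so a hedged sketch of that suggestion might look like the BUILD fragment below. The path `some_api/app.py` and the exact address form are my assumptions, not something confirmed in this thread; `pants peek` or the targets docs would give the real address.

```python
# BUILD — hypothetical sketch, not the confirmed fix: depend on one
# generated file-level target instead of the whole python_sources target,
# so that fewer files feed into the pex's inputs. Paths are assumed.
pex_binary(
    name="some_api_pex",
    script="gunicorn",
    dependencies=[
        "some_api/app.py:some_api_src",  # assumed file address: <file path>:<owning target>
        "//:requirements#gunicorn",
        "//:requirements#uvicorn",
    ],
    output_path="some_api.pex",
)
```

Note that dependency inference would still pull in whatever `app.py` imports, so this narrows the declared inputs rather than the transitive closure.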
w
@fancy-policeman-6755 Would there be any benefit in structuring it something like this? All your deps in one pex, sources in another, and then a top-level pex that takes in both. My use case in the end does something like this, so your mileage may vary. I realized it's hard to really make a good example on some of my trivial projects, because my computer is so fast that changing sources is a sub-1-second re-build anyways 🤦‍♂️
```
PEX_TOOLS=1 python3.11 hellofastapi-pex.pex venv --bin-path prepend --compile --rm all ./venv
./venv/bin/uvicorn hellofastapi.main:app
```
```python
python_sources(
    name="libhellofastapi",
    sources=["**/*.py"],
)

pex_binary(
    name="hellofastapi-deps",
    include_sources=False,
    include_tools=True,
    dependencies=["//:reqs#uvicorn", "//:reqs#fastapi", "//:reqs#numpy", "//:reqs#pywebview", "//:reqs#gunicorn"],
)

pex_binary(
    name="hellofastapi-srcs",
    include_requirements=False,
    include_tools=True,
    dependencies=[":libhellofastapi"],
)

pex_binary(
    name="hellofastapi-pex",
    dependencies=[":hellofastapi-srcs", ":hellofastapi-deps"],
    include_tools=True,
)
```
f
Hey there! So the problem is that I don't think `uvicorn` and `gunicorn` are the heaviest dependencies in there (there's also fastapi, dash, and a bunch of other heavy things). If I understand right, I'd have to explicitly define them all in the `hellofastapi-deps` pex, which somewhat defeats the point of `pants` autodiscovery.
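As an aside (my assumption, not something confirmed in this thread): in Pants, depending on a target generator is equivalent to depending on all of its generated targets, so a deps pex could pull in every third-party requirement without naming each one. A sketch, reusing the `//:reqs` name from the example above:

```python
# BUILD — hypothetical sketch: depend on the whole python_requirements
# target generator ("//:reqs") instead of listing "//:reqs#<name>" one
# by one. The trade-off: the deps pex now contains every locked
# requirement, used or not, and rebuilds whenever the lockfile changes.
pex_binary(
    name="hellofastapi-deps",
    include_sources=False,
    include_tools=True,
    dependencies=["//:reqs"],
)
```

This trades precision for convenience: autodiscovery is bypassed for this one target, but nothing has to be maintained by hand as requirements come and go.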
However, I've been thinking about what @dry-architect-80370 says about just running a source file directly. I guess the key point is that I don't really need to run the pex itself to test incremental changes; it's building the pex (so, I guess, zipping the deps) that causes the slowdown. So what I did is just add a
```python
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(create_app(), debug=True, ...)
```
in the module that creates the FastAPI app (the one called by gunicorn in the pex entrypoint). Then I can just run the module directly, without having to build the whole pex. It's not exactly the same as in the production env, but for testing a minor code change it's enough.