# general
c
hey all, i’m using a `pyproject.toml` and `poetry_requirements` to define requirements for my pants project. I’m finding that every time I change/upgrade a dependency (even a dev one such as `pytest`) in `pyproject.toml`, pants tries to rebuild all my targets (even `pex_binary` targets which don’t depend on it). Any idea why this could be happening? From what I can gather it’s because `pyproject.toml` ends up being a transitive dependency of every target in the repo.
e
What version of Pants are you using @cold-sugar-54376? This should be fixed in 2.11 and fixed with some work on your part in 2.10: https://www.pantsbuild.org/v2.10/docs/python-third-party-dependencies#first-turn-off-old-style-macros. If you're on Pants 2.9 or older you'll need to upgrade.
c
i’m using 2.11
this is what my BUILD file looks like in `3rdparty`:
```python
python_requirement(
    name="setuptools",
    requirements=["setuptools"],
)

poetry_requirements(
    module_mapping={
        "django-admin-sortable2": ["adminsortable2"],
        "django-admin-sso": ["admin_sso"],
        "django-anymail": ["anymail"],
        "django-environ": ["environ"],
        "django-import-export": ["import_export"],
        "django-inline-actions": ["inline_actions"],
        "django-ipware": ["ipware"],
        "django-safedelete": ["safedelete"],
        "drf-nested-routers": ["rest_framework_nested"],
        "google-api-python-client": ["googleapiclient"],
        "ShopifyAPI": ["shopify"],
    },
)
```
here’s a simple example:
```
./pants dependencies --transitive src/util/service.py
3rdparty#gunicorn
3rdparty/pyproject.toml
src/util/__init__.py
```
e
Ah, right. 2.10 / 2.11 actually fixed things to get the behavior you see now. Previously you might change requirements.txt or pyproject.toml and Pants wouldn't always invalidate requirements. The surprising thing here - to me - is that after generating requirements from the changed pyproject.toml, Pants doesn't short circuit shortly thereafter when it sees none of the required requirements have changed for a particular target - like the pex_binary targets you mention.
It may be that Pants is skipping all skippable work though and just outputting PEXes? Since PEXes are written out under dist/, and thus out of Pants' control at that point (you could delete them or mutate them), Pants currently doesn't try to be clever and it will always perform that final PEX write step under dist/ even if nothing else has changed.
c
yeah, i was just kind of hoping that if I did something like bump the `pytest` version and then ran `./pants --changed-since=asdf --changed-dependees=transitive filter --target-type=docker_image`, pants would be smart enough to know that i don’t actually need to rebuild the docker image
it sounds like that’s not the case though… for now i’ve just written some hacky scripts to actually determine which dependencies changed based on changes to pyproject.toml
e
Ok. To be clear though, Pants should only be re-building "external" things like docker images, PEX files, etc. And that's for the reason I mentioned. We may be able to get smarter though for some cases, particularly when they're expensive.
So, for these cases it's really not to do with dependencies changing or not (at least in the PEX package case; I'm less sure about the docker image case). It's just to do with the fact that when you say `package`, we always run that last side-effecting step where we create the package. All or most of the internal steps needed to prepare that final side-effect should be skipped / read from cache.
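A toy Python model may help picture the point about side effects above. It is not how Pants' engine is implemented: `functools.lru_cache` stands in for Pants' memoization of rules and processes keyed on their inputs, and the step names `resolve`, `build_pex`, and `write_to_dist` are hypothetical. Internal steps are skipped when their inputs are unchanged, while the final write to `dist/` runs on every `package`.

```python
# Toy model of the behavior described above; NOT Pants' real engine.
# functools.lru_cache stands in for memoization of steps keyed on inputs.
# All step names are hypothetical.
from functools import lru_cache

calls = {"resolve": 0, "build": 0, "write": 0}


@lru_cache(maxsize=None)
def resolve(requirements: tuple[str, ...]) -> str:
    """Internal step: re-runs only when the requirement tuple changes."""
    calls["resolve"] += 1
    return "lock:" + ",".join(sorted(requirements))


@lru_cache(maxsize=None)
def build_pex(sources: tuple[str, ...], lock: str) -> bytes:
    """Internal step: re-runs only when its sources or lock change."""
    calls["build"] += 1
    return repr((sources, lock)).encode()


def write_to_dist(name: str, payload: bytes, dist: dict[str, bytes]) -> None:
    """Final side effect: deliberately NOT cached, because dist/ can be
    deleted or edited outside the build tool's control."""
    calls["write"] += 1
    dist[name] = payload


def package(name: str, sources: list[str], requirements: list[str],
            dist: dict[str, bytes]) -> None:
    lock = resolve(tuple(requirements))
    write_to_dist(name, build_pex(tuple(sources), lock), dist)


if __name__ == "__main__":
    out: dict[str, bytes] = {}
    for _ in range(2):
        package("app.pex", ["app.py"], ["gunicorn"], out)
    print(calls)  # {'resolve': 1, 'build': 1, 'write': 2}
```

Running `package` twice with identical inputs executes each cached step once but the write twice, which matches the "fast, but never fully skipped" behavior described in the thread.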
c
yeah that makes sense
but i only do a `package` based on a `filter`
For example, if all I do is bump pytest and then run `./pants --changed-since HEAD --changed-dependees=direct list`, I get a list of every `3rdparty#<dep>` target instead of just `3rdparty#pytest`
ideally i could change that into a `transitive` and pipe that into a filter for docker images to see what actually needs to be built. I took it from this blog post: https://blog.pantsbuild.org/pants-pex-and-docker/
```
./pants --changed-since=main --changed-dependees=transitive filter \
  --target-type=docker_image | \
  xargs ./pants package
```
so it sounds like pants isn’t smart enough to determine which dependencies have changed?
only that the entire file has changed? seems kind of weird
e
Well, it is, but not in metadata commands like `list`. At runtime, when executing rules and processes to achieve goals, Pants looks at the inputs to each rule or process, and if they have not changed, it short-circuits at that point.
When you issue a metadata command, none of that short-circuiting happens; you just get a list of targets, files, or deps that has not run through any rules yet.
Let me ask: what happens when you instead just run `./pants --changed-since=main --changed-dependees=transitive package`?
Does that take longer than you'd expect / hope?
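The metadata-versus-runtime distinction can be sketched with a small toy model (again, not Pants' actual implementation) in which change detection works per file. The dependency edges are assumed from the `./pants dependencies --transitive src/util/service.py` output earlier in the thread, plus a hypothetical `3rdparty#pytest` edge. Because every generated requirement depends on the single `3rdparty/pyproject.toml` target, editing any line of that file makes all of them direct dependees.

```python
# Toy model of file-granular change detection; NOT Pants' implementation.
# Edges mirror the `./pants dependencies --transitive` output in the thread:
# each requirement generated by poetry_requirements depends on the
# 3rdparty/pyproject.toml file target. The 3rdparty#pytest edge is hypothetical.
DEPS: dict[str, set[str]] = {
    "src/util/service.py": {"3rdparty#gunicorn"},
    "3rdparty#gunicorn": {"3rdparty/pyproject.toml"},
    "3rdparty#pytest": {"3rdparty/pyproject.toml"},
}


def dependees(roots: set[str], transitive: bool) -> set[str]:
    """Targets that depend on any root, directly or transitively."""
    result: set[str] = set()
    frontier = set(roots)
    while frontier:
        # Everything that depends on something in the current frontier.
        frontier = {t for t, deps in DEPS.items() if deps & frontier} - result
        result |= frontier
        if not transitive:
            break
    return result


# Bumping only pytest still changes the one file target, so every generated
# requirement is reported, and their dependees follow transitively.
changed = {"3rdparty/pyproject.toml"}
print(sorted(dependees(changed, transitive=False)))
# ['3rdparty#gunicorn', '3rdparty#pytest']
print(sorted(dependees(changed, transitive=True)))
# ['3rdparty#gunicorn', '3rdparty#pytest', 'src/util/service.py']
```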
c
it still tries to build all the pex binaries in addition to the docker image
which I suppose is fine, it’s pretty fast
on the order of 10-30 seconds i think
e
Ok. If it's fast, then you're experiencing what I was describing. Pants will always perform the side-effecting final step that outputs a file or calls docker.
c
i’m actually thinking further down the CI/CD pipeline. for example, if I have 5 different services and update a dependency that only 1 of them depends on, it’d be great if pants could say “oh ok only this docker image needs to be built therefore only this service needs to be deployed”
but maybe that’s not really part of the design goals of pants
and to be clear, it sounds like this would work in any case except updating dependencies
i.e. just updating normal python sources
e
Hrm, 10-30 seconds is not amazing. It's not terrible, but that's not great. For the 10-30 seconds case do you see:
```
11:44:37.60 [INFO] Initializing scheduler...
11:44:38.00 [INFO] Scheduler initialized.
```
as the first two lines of Pants output, or are you using `--no-pantsd`?
c
i think it’s cached now so it’s hard to tell
let me try again
e
> pants could say “oh ok only this docker image needs to be built ...
That much we should be getting right currently, modulo speed. The docker image might "rebuild", but as a no-op.
c
ok yeah its actually quite fast if I re-run it
i’m not sure what was cached and what isn’t
e
I.e.: we'll call out to docker as the last side-effecting step, but docker will see it has the image cached. I guess, though, that this relies on the local Docker cache, so if this is running in CI, or you've never run a package against that docker target but your friend has, you lose.
c
yeah
and would pants be able to tell me it was a no-op programmatically?
so in CI i could then decide to move forward or not
e
Currently no.
c
cool, no worries
again, it took me like 30 minutes to whip up some code that compares two pyproject files to determine which dependencies have actually changed and use that
e
I think we'd need to add an option to our side-effect ~subsystem to say, don't do it, just print out if you would do it.
c
right
e
This is actually quite hard. In order to know if the side-effect would be a noop, Pants needs to know if it ever before executed the actions leading up to the side effect on any machine. In other words, there would need to be a central registry which Pants would have to perform a side-effect on. Later, Pants would need to check the central registry to see if any Pants invocation from any machine had previously performed the side-effect. To provide exactly-once semantics here would be ~impossible. You could reasonably provide at least once semantics though.
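The registry idea above can be sketched in Python to show where the at-least-once semantics come from. Everything here is hypothetical: the in-memory set stands in for a shared service every machine could reach, and `input_key` is just one possible way to key a side effect on its inputs. Checking the registry, performing the effect, and recording it are separate steps, so a crash between the last two, or two machines checking at the same moment, can repeat the effect.

```python
# Sketch of the hypothetical central side-effect registry discussed above.
# The in-memory set is a stand-in (assumption) for a shared service.
import hashlib
from collections.abc import Callable


def input_key(target: str, inputs: dict[str, str]) -> str:
    """Stable digest of a target plus the digests of everything it consumed."""
    h = hashlib.sha256(target.encode())
    for name in sorted(inputs):
        h.update(b"\0" + name.encode() + b"\0" + inputs[name].encode())
    return h.hexdigest()


class SideEffectRegistry:
    def __init__(self) -> None:
        self._done: set[str] = set()

    def already_done(self, key: str) -> bool:
        return key in self._done

    def record(self, key: str) -> None:
        self._done.add(key)


def run_side_effect(registry: SideEffectRegistry, key: str,
                    effect: Callable[[], None]) -> bool:
    """Run `effect` unless `key` was recorded; return True if it ran."""
    if registry.already_done(key):
        return False  # the "this would be a no-op" answer CI is asking for
    # Two machines can both pass the check above before either records.
    effect()
    # A crash here (after the effect, before recording) means the next run
    # repeats the effect: at-least-once, never exactly-once.
    registry.record(key)
    return True
```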
c
yeah
is there a way we could add an option further down into something like `list` or `filter` to have it handle dependency file updates in the way i described?
e
So that's definitely out of Pants' general bailiwick right now. You or a service provider like Toolchain could erect such a registry and a corresponding Pants plugin, but some work and thought would be needed there.
> is there a way we could add an option further down into something like `list` or `filter` to have it handle dependency file updates in the way i described?
Hrm, I'm not sure how bad that would be. This is definitely not awesome:
```
$ git diff
diff --git a/3rdparty/python/requirements.txt b/3rdparty/python/requirements.txt
index 551931f1e..0e7e39f0b 100644
--- a/3rdparty/python/requirements.txt
+++ b/3rdparty/python/requirements.txt
@@ -4,6 +4,8 @@
 # Additionally, it increases the surface area of Pants's supply chain for security.
 # Consider pinging us on Slack if you're thinking a new dependency might be needed.

+cowsay
+
 ansicolors==1.1.8
 chevron==0.14.0  # Should only be used by build-support.
 fasteners==0.16.3
$ ./pants --changed-since=HEAD --changed-dependees=transitive filter --target-type=pex_binary
build-support/bin#_generate_all_lockfiles_helper.py
build-support/bin#_release_helper.py
build-support/bin#changelog.py
build-support/bin#generate_docs.py
build-support/bin#generate_github_workflows.py
build-support/bin#generate_user_list.py
build-support/bin#reversion.py
src/python/pants/bin:pants
testprojects/src/python/hello/main:main
testprojects/src/python/native:main
```
I added a new dep that no code uses at all, and got back all `pex_binary` targets as changed, which, as you point out, is totally untrue.
c
Yeah
You could obviously get around it by defining a separate python requirement target for each one instead of using the file but that seems to defeat the purpose :)
I filed a bug in GitHub anyways, not sure if it's appropriate. I'm also surprised no one else has run into this before
e
Ok, thanks for filing.
c