# general
p
Background: I have several `python_distribution` targets for building wheels destined for PyPI. Several of them contain one (or a few) primary entry point scripts, so those dependencies get inferred cleanly from `entry_points` and `scripts`. One of them is a library (`st2common`) used by the rest. Though it has some utility entry points, for the most part it contains code that is shared by the entry points in the other packages. Is there a way to avoid explicit dependencies from the `python_distribution` to the `python_sources` in the library? In particular, based on dependency inference, if one of the other first-party packages depends on something in the library's `python_distribution`, can that library's distribution also depend on that code, so that it is available within the packaged wheel? In other words, I want to avoid adding explicit dependencies to satisfy many errors like this:
```
NoOwnerError: No python_distribution target found to own st2common/st2common/services/access.py. Note that the owner must be in or above the owned target's directory, and must depend on it (directly or indirectly). See <https://www.pantsbuild.org/v2.15/docs/python-distributions> for how python_sources targets are mapped to distributions. See <https://www.pantsbuild.org/v2.15/docs/python-distributions>.
```
Adding 37 more explicit dependencies quiets the error for this `python_distribution`. But it does not feel sustainable to expect people to manually register stuff whenever they add something to `st2common`.
h
> Several of them contain one (or a few) primary entry point scripts, so those dependencies get inferred cleanly from `entry_points` and `scripts`.
Which are "those dependencies"?
I'm assuming you're using a Pants-generated `setup.py`. If so, then Pants uses the source-level deps to generate dist-level deps.
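For illustration, here's what that induced dist-level dep looks like: if a source file in dist `a` imports a module owned by dist `b`, the generated `setup.py` for `a` ends up with a requirement on `b`. A hypothetical sketch (names and versions invented):
```python
# Hypothetical Pants-generated setup.py for dist "a" (names/versions invented)
from setuptools import setup

setup(
    name="a",
    version="1.0.0",
    packages=["a"],
    # Induced from a source-level import of a module owned by dist "b".
    install_requires=["b==1.0.0"],
)
```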
Based on that error message, the problem seems to be that Pants can't figure out which dist `st2common/st2common/services/access.py` will be published in, for one of the reasons mentioned there. Pants takes the sources and partitions them into dists based on this algorithm, and then induces dist-level deps from the source-level ones. So I'm wondering if the issue is simply that this isn't working for some reason, or whether it cannot work with your setup for some reason.
p
Yes, I'm using a Pants-generated `setup.py` (aside: though I look forward to replacing that with a generated `pyproject.toml` whenever we implement that). 👀 re-reading that section of the docs... one sec
`python_distribution`: D, D'
`python_source`: S, S'
(D is an "application" that uses the D' "library".)
D and S are under path `d/`; D' and S' are under path `d_prime/`.
D defines an entry point using S, so it gets an inferred dependency on S.
S imports S', so it gets an inferred dependency on S'.
D' and S' are co-located. But D' does not have any entry points and does not have an explicit dependency on S', thus we get a NoOwnerError.
What I would like: D' should infer a dependency on S', because D needs S' and D' is the closest `python_distribution` that could own it.
@happy-kitchen-89482 Does what I’m describing make sense?
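To make the D/D' layout concrete, a minimal hypothetical sketch of the two BUILD files (all target names and paths invented):
```python
# d/BUILD -- D, the "application"
python_sources(name="s")  # S: defines the entry point, imports code from d_prime/
python_distribution(
    name="dist",
    entry_points={"console_scripts": {"app": "d.main:cli"}},  # -> inferred dep on S
    provides=python_artifact(name="app", version="1.0.0"),
)

# d_prime/BUILD -- D', the "library"
python_sources(name="s_prime")  # S': nothing in this BUILD file references it
python_distribution(
    name="dist",
    # No entry_points and no explicit dependencies= on :s_prime, so nothing
    # ties D' to S' -- hence the NoOwnerError when packaging.
    provides=python_artifact(name="lib", version="1.0.0"),
)
```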
h
Ah, makes sense. So `S -> S'` is inferred, and therefore `D -> D'` would be induced, but the problem is that Pants doesn't know about `D' -> S'`.
p
yes
h
I can't see how that inference you're proposing would work though
p
yeah. I’m not sure how to implement it.
h
> D' needs to infer a dependency on S' because D needs S'
There is no relationship between D' and D at this point.
And even if there were, why would we choose D' and not something else?
p
I'm preparing the PR to add all of those dependencies in the st2 repo, and I expect I'll get questions about why we have to manually maintain such a huge list of dependencies in such a central library. I have a layout:

Da/a
Db/b
Dc/c
D'/prime

where each `D*/BUILD` includes a `python_distribution`. If anything in one of those directories needs to be packaged, then it goes in that parent package. So I do not have any hierarchical `python_distribution`s defined; i.e., I'm not doing this (which would be confusing): Da/some/path/Dd/d, where Da has a `python_distribution` and Dd has another `python_distribution`. Logically, each D* should have a "dependency" on anything in its directory. In `setup.py` that's like using:
```python
from setuptools import setup, find_packages

setup(packages=find_packages(exclude=["tests"]))
```
So, I guess I'm looking for a replacement for `find_packages` to minimize manual maintenance.
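For comparison, the explicit-dependency workaround being avoided would look something like this in the dist's BUILD file (addresses abridged and hypothetical):
```python
# st2common/BUILD (hypothetical, abridged)
python_distribution(
    name="st2common",
    dependencies=[
        # 37+ manually maintained entries like these:
        "st2common/st2common/services",
        "st2common/st2common/models",
        # ...
    ],
    provides=python_artifact(name="st2common", version="0.0.0"),
)
```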
Another way to look at it: there are no `python_distribution` targets defined in any parent directory of D'. D' does have a `python_distribution` defined. We want something that packages S'. So, D' is the only candidate that could package it.
h
OK, but that has nothing to do with S needing code from S', AFAICT.
p
@happy-kitchen-89482, what about a special dependency field on the `python_distribution`:
• depend on everything in these directories, unless it is unused by anything else.
Thus, anything that is "dead code" (nothing else in the repo depends on it) just automatically drops out of the distribution. Except I still don't see how to implement that. Hmm.
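Purely as a sketch of that idea (no such field exists in Pants; `package_dirs` is an invented name):
```python
# Hypothetical BUILD syntax -- NOT a real Pants field
python_distribution(
    name="st2common",
    # Invented field: own every source under these directories that is
    # reachable from some other target in the repo; unreferenced files
    # ("dead code") would be dropped automatically.
    package_dirs=["st2common/st2common"],
    provides=python_artifact(name="st2common", version="0.0.0"),
)
```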
Here is a command that shows me everything that is used by other distributions:
```zsh
./pants list --filter-target-type=python_distribution --filter-address-regex=-st2common:st2common :: | xargs ./pants dependencies --transitive | grep st2common/st2common
```
Now to figure out how to codify that.
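One way to codify it: a rough, untested Python sketch that shells out to the same two `./pants` invocations and diffs the results (equivalent to the zsh `comm` one-liner further below):
```python
#!/usr/bin/env python3
"""Sketch (untested): find st2common sources used by the other dists."""
import subprocess


def run(cmd: str) -> set:
    """Run a shell pipeline, return its st2common/st2common output lines."""
    proc = subprocess.run(cmd, shell=True, check=True, capture_output=True, text=True)
    return {line for line in proc.stdout.splitlines()
            if line.startswith("st2common/st2common")}


# Everything the *other* python_distribution targets pull in transitively.
used_by_others = run(
    "./pants list --filter-target-type=python_distribution "
    "--filter-address-regex=-st2common:st2common :: "
    "| xargs ./pants dependencies --transitive"
)
# Everything the st2common dist already reaches on its own.
already_owned = run("./pants dependencies --transitive st2common:st2common")

# The explicit dependencies still missing from st2common:st2common.
for address in sorted(used_by_others - already_owned):
    print(address)
```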
h
Yeah, that makes more sense I think: some way of saying the equivalent of `find_packages`.
p
`find_packages` blindly includes everything in the given directories. Pants' use of inferred dependencies instead prunes the distribution down to only the files that are actually in use, effectively dropping "dead code". So, this zsh command (`=()` substitutes the path to a temp file holding the process output; `comm -13 a b` prints the lines that appear only in `b`) gives me the list of dependencies to add to the `st2common:st2common` `python_distribution`:
```zsh
comm -13 =(./pants dependencies --transitive st2common:st2common | grep st2common/st2common | sort) =(./pants list --filter-target-type=python_distribution --filter-address-regex=-st2common:st2common :: | xargs ./pants dependencies --transitive | grep st2common/st2common | sort)
```
Once I add those as explicit deps, this command shows me the dead Python code that nothing depends on:
```zsh
comm -13 =(./pants dependencies --transitive st2common:st2common | grep -v -e : -e __init__.py | grep st2common/st2common | sort) =(find st2common/st2common -name '*.py' -and -not -name '__init__.py' | sort)
```
h
But how would you know what files are in-use?