Hello! I have a question about excluding transitiv...
# general
s
Hello! I have a question about excluding transitive dependencies in targets using
!!//path/to:target
. I'm not sure if the behavior I'm seeing is a bug, or if I'm misunderstanding something. I'm running pants
2.21.0
There's an example
BUILD
and run steps in the thread I basically have a structure like in this graph. We use transitive dependencies to figure out which unit tests to run depending on changes in our code base. The test depends on a utility function, that in turn depends on two sets of other dependencies
A
and
B
.
A
and
B
are files of the same type, but consider
A
as test/mock data, and
B
as real data. I would like to exclude
B
, and transitively exclude
B.1
and
B.2
from the dependencies of
test
, so that changes to these do not trigger the test. It's still necessary that
utils
depend on these. Here's the issue: Let's say all files are in
src/ex
.
test
is a
python_test
,
A
and
B
are
target
, the rest are
python_source
I add
!!:B
to the dependencies of
test
I run
pants dependencies --transitive src/ex:test
and see that
B
is indeed not a dependency anymore. However,
B.1
and
B.2
are listed. But if I run
pants paths --from='src/ex:test' --to='src/ex:B.1'
it returns an empty list, showing that there's no path from
test
to
B.1
. Removing the
!!
line makes this command return the expected path via
utils
and
B
. it returns an empty list. Removing the
Here's an example BUILD file:
Copy code
target(
    name="A",
    dependencies=[
        ":A.1",
    ],
)
target(
    name="B",
    dependencies=[
        ":B.1",
        ":B.2",
    ],
)

python_source(
    name="A.1",
    source="A.1.py",
)
python_source(
    name="B.1",
    source="B.1.py",
)
python_source(
    name="B.2",
    source="B.2.py",
)

python_source(
    name="utils",
    source="utils.py",
    dependencies=[
        ":A",
        ":B",
    ],
)

python_test(
    name="tests",
    source="tests.py",
    dependencies=[
        ":utils",
        "!!:B",
    ],
)
Create empty files with
Copy code
mkdir -p src/ex
cd src/ex
touch tests.py utils.py A.1.py B.1.py B.2.py
cd -
Then you can run
Copy code
pants dependencies --format=json --transitive src/ex:tests
To add what my goal is - because these commands seem to output different results when it comes to transitive dependencies. This is the command we use to run our tests:
Copy code
pants --changed-since=HEAD~1 --changed-dependees=transitive test
I want this command to not trigger
tests
to run when
B.1.py
or
B.2.py
are changed. Only when
A.1.py
is changed.
Actually, explicitly setting
!!:B.1
and
!!:B.2
in
tests
dependencies
doesn't exclude them at all. If I do that, edit
B.1.py
, and then run
Copy code
pants --changed-since=HEAD~1  --changed-dependees=transitive list
I see both
tests
and
B.1
- indicating that the tests would run when changing one of the excluded dependencies. I'm leaning towards this being a bug. Nevertheless it's very confusing that
pants paths
,
pants --changed-dependees=transitive test/list
, and
pants dependencies --transitive
all produce different results from each other regarding transitive dependencies so I will probably look for another approach.
Bump, I would still like some support on this 🙂
s
your example looks overly complicated, you can show the same issue with only
tests
util
and
b
right? it will be much easier to understand
s
Sure, the same would hold if only
tests
,
utils
,
A
and
B
existed, with this graph:
Copy code
tests --> utils ----> A
             |------> B
Adding`!!:utils` in the dependencies list for
tests
would still have
A
and
B
show up as a transitive dependence. The first example is designed after my already existing dependency graph.
s
Does
!:utils
work for you?
s
Thanks for taking the time to respond 🙂 Setting it up again with the smaller example. Here's the BUILD file now, this is graphed in the screenshot. The files are located in
src/backend/ex
, and
src
and
src/backend
are configured as pants roots. There are empty
__init__.py
files in the source dirs.
test.py
imports a function from
utils.py
utils.py
imports functions from
A.py
and
B.py
A.py
imports a function from
C.py
No excludes in the dependencies for
tests
reports
Copy code
➜ ./pants dependencies --transitive --format=json src/backend/ex:tests
Checking Pants installation...
{
    "src/backend/ex:tests": [
        "src/backend/ex:A",
        "src/backend/ex:B",
        "src/backend/ex:C",
        "src/backend/ex:utils"
    ]
}
If I do
!:utils
or
!!:utils
Copy code
{
  "src/backend/ex:tests": []
}
If I do
!:A
I get
Copy code
{
  "src/backend/ex:tests": [
    "src/backend/ex:A",
    "src/backend/ex:B",
    "src/backend/ex:C",
    "src/backend/ex:utils"
  ]
}
If I do
!!:A
I get
Copy code
{
  "src/backend/ex:tests": [
    "src/backend/ex:B",
    "src/backend/ex:C",
    "src/backend/ex:utils"
  ]
}
In the last case, I also expect
:C
to be excluded from the dependencies, not only
:A
.
Excluding
:utils
completely is not preferable, I want to depend on utils but only part of its dependencies. But maybe this is straight up a poor design choice.
s
ok, it makes sense, but I think it's trickier to get right, imagine B also depends on C, then should you exclude C or not?
s
If I exclude
:A
, and
:B
depended on
:C
as well, then
:B
and
:C
should be included in the dependencies yeah. To be a bit more concrete about why we have this setup:
A
and
B
are configuration files (written in python) for specific use cases, and
utils
contains helper functions for loading these from disk.
A
can be seen as real data, and
B
as mock data, and we don't want to depend on the real data in our unit tests.
s
Yeah, what you describe is a much more complicated logic that involves the graph analysis. Pants implementation is much simpler, it's just a set difference:
Copy code
transitive_excludes = await Get(Targets, Addresses(transitive_exclude_addresses))

    return TransitiveTargets(
        tuple(dependency_mapping.roots_as_targets),
        FrozenOrderedSet(dependency_mapping.visited.difference(transitive_excludes)),
    )
first we fetch all dependencies, then fetch excluded targets and then just calculate the set difference
s
Ah, gotcha, this all makes sense. So it's intended behavior based on the implementation. I think maybe what I'm asking for would complicate quite a bit of the dependency inference ^^ The transitive excludes function don't seem to be very well documented, so it was a bit of trial and error. I only saw it mentioned in a callout box, and in github issues. Thanks for taking the time! I think we'll look into alternative approaches 🙂
s
you could probably implement a new feature like
~:B
to do the real subgraph excludes, I guess somewhere around here https://github.com/pantsbuild/pants/blob/fb1c6a381520d511c9c12b0598561fb7fe8afb4d/src/python/pants/engine/internals/graph.py#L762-L816
but yeah, it's probably easier to choose some different workaround, e.g. remove utils dependencies on :A and :B and instead add these dependencies to all users of utils like tests or sources
s
That's exactly the choice I went with in the meantime 🙂 I think the dependency structure we have is poorly formulated, so this might just provide the opportunity for a needed refactoring.
s
e.g you can split utils into
Copy code
def load_resource_a()
def load_resource_b()
def process_resource(r: A | B)
and move
load_resource_a
to A and
load_resource_b
to B
this way users can choose which resource to load themselves, this will make the api more pants friendly
s
Good suggestion, I'll keep this in mind 🙂 Thanks!