Filtering dependencies of package targets seems to...
# development
p
Filtering dependencies of package targets seems to come up regularly (
python_distribution
,
pex_binary
,
debian_package
, and now the nfpm package types I'm working on). In many cases, these packages depend on another package, we want to include the package/archive itself, not the sources packaged in the target or owned by the target. For
python_distribution
there's quite a bit of logic to determine which distribution "owns" a given python source file. Some package targets (like
debian_package
) use a SpecialCasedDependency field to exclude their deps from normal dep calculation. So, what if we allow passing a predicate (lambda function) when requesting transitive deps that says whether or not to follow the dependencies of a given target. Maybe something like this (probably with the predicate on
TransitiveTargetsRequest
instead of on the
_DependencyMappingRequest
)? https://github.com/cognifloyd/pants/commit/b4044e76685041cdac322a3e96f4858b4634f045 (For what it's worth, mypy is happy with this stub commit.) Do any other request types use a lambda like this? Is that going to mess up graph calculation some how? I tried to minimize the effect on the current predicate-less code path, but I'm not sure about the performance implications of messing with the dep graph here. Should I keep going with this approach?
@witty-crayon-22786 @happy-kitchen-89482 do you have thoughts on this?
h
šŸ‘€ 1
In the specific case you mention though, I think it's a bit simpler than what's discussed in that issue.
AFAICT we ~never want to traverse deps through "package targets" like
pex_binary
? So the graph traversal itself should understand that concept?
(never - in normal use, maybe we do when the user manually queries the graph with the
paths
goal or similar)
So maybe a predicate is overkill (although it can still exist as an implementation choice)
There are really just two predicates: "always follow" or "don't follow the deps of packageable targets"
I think
Supporting arbitrary ones might get very confusing, unless we know we need to
Right now a target type doesn't know that it is packageable, but we can change that
p
Yeah. That issue is much broader in "scope" (couldn't resist the pun).
How would the graph know that a target is a package target? I don't think we have anything that is constantly used for all packagable targets.
Do all of them have an
OutputPathField
? Nope. Every package type implements its own
PackageFieldSet
.
Hmm. Instead of a predicate, the consumer could pass in a
FieldSet
class, and then use its
.is_applicable
classmethod as the predicate. Then, if that returns True, skip that target's dependencies. Or, I could also just grab all
union_membership.get(PackageFieldSet)
and do that for all of them.
b
So Instead of doing this for a specific scenario, I think we should just bite the bullet on the "tagged dependencies" idea.
That solves a whole host of issues, this included (AFAICT)
p
@happy-kitchen-89482 doing this for all package types would mean importing
PackageFieldSet
into the engine internals which is a little gross. :/ Moving it is an option as well.
@bitter-ability-32190 how does tagged deps help with not traversing the dependencies of package targets? šŸ¤” It does help with that "scope" issue. But tagging the deps in this case seems like more boilerplate.
b
Tagged dependencies gives Pants more a clue to how a dependency is meant to be consumed (and therefore should be provided by Pants). In this case, we don't know whether the dev wants the target as a loose set of files (like say, I wanted to lint the files that go into a
pex_binary
) or as the packaged thing (I intend to upload this
artifact
using my super special script). Tagging allows that vital information to be known to Pants, therefore we make less assumptions (exactly 0 assumptions šŸ˜›)
We can likely switch our current assumption from "you wanted all the inputs flat on disk, right?" to "you wanted the packaged artifact, right?" and mayyyybe get away with it, but it'd be great if we just stopped assuming
h
I'm suggesting that we would add some way of knowing this
Via the target type
I don't think tagged dependencies is the way to go here. In this case "do I follow this dep" is not a property of the dep. It's a property of where I started the traversal from...
b
The whole point of our lengthy discussions earlier was that targets shouldn't/can't assume how they will be consumed. We shift that knowledge onto the consumer
w
imo, step one is a consuming predicate/API that allows for controlling which dependencies are traversed. step two is adding more information for that API to consume.
ā˜ļø 1
b
That............ damn. I've been Stu'd. Yeah that makes complete sense.
p
Or you're thinking of a new request/response type that allows for filtering?
w
@proud-dentist-22844: right. and sorry for not reading the whole thread when i got the ping =x
a common consideration with these APIs is ā€œnodeā€ vs ā€œedgeā€ filtering… the predicate you have is doing node filtering
p
Whereas tagging dependencies would be an edge filter?
w
a predicate API that allowed for considering individual edges would be an edge filter, yea. it doesn’t require tagging: for example, maybe you want to include from one
Dependencies
field subtype or another
or you just want to exclude edges from
TypeOne -> TypeTwo
meanwhile, i’m inching closer to porting that loop to rust, heh. so i hope that we can add features without making it slower.
p
Is passing in a lambda on the request a kosher way for the consumer to pass its filter? (Whether the filter is node or edge)
h
I'm not suggesting that target's should know how they are consumed, I'm suggesting that in this case the traversal of an edge is NOT dependent on any property of that edge.
p
One interesting thing: With
python_distribution
traversing all the other
python_distribution
targets, if a python source under dist A imports python source under dist B, and dist B explicitly depends on dist A, then dist A will package that source even if none of the entry points depend on it. Once we stop traversing the graph through package deps, we may have to be much more explicit about what each dist includes (esp for library dists that don't have entry points).
OK. Here's how I think the predicate could look:
(in a WIP PR to explore this)
I just found https://github.com/pantsbuild/pants/issues/11270 (thanks to a graph test func that mentioned it) and I don't understand all of the conversation around structure sharing / graph logic / ways to expose the dependency mapping.
ooh. @witty-crayon-22786 it looks like my changes will conflict royally with your mutable memoization in https://github.com/pantsbuild/pants/pull/17394 How close is that to being ready for review / merged?
I added tests and marked my PR ready for review. šŸ™‚
w
sorry for the long delay. i added a comment that sets a somewhat arbitrary benchmark that i think would be good to hit with this filtering API before landing it: https://github.com/pantsbuild/pants/pull/19155#discussion_r1213566914 obviously this API doesn’t have to be perfect before landing. but IMO, ensuring that this API is sufficient to remove some existing complexity would be good. thanks for looking at it!
b
My review is still; pending šŸ¤¦ā€ā™‚ļø
p
I'm studying
SpecialCasedDependencies
, to figure out what use-cases it covers... 🤯 Handling
archive
and
debian_package
look fairly straight forward, and then it gets fuzzier but still doable (via tagged dependencies?) for
python_test(s)
,
shunit2_test(s)
,
pex_binar{y,ies}
,
python_{google_cloud,aws_lambda}_function
,
python_aws_lambda_layer
,
adhoc_tool
,
shell_command
,
run_shell_command
,
system_binary
, and
helm_deployment
. And then
wrap_source_as_*
and
relocated_files
need something a bit different, but probably still doable. But then there's
jvm_war
. Geez. The complexities around handling java are so painful.
b
It's a... War zone šŸ˜‚
p
LOL 🤣
w
i think the easiest way to describe
SpecialCasedDependencies
is: ā€œwe want to support multiple subclasses of
Dependencies
on one Targetā€
and yea, in some cases you want to consume those separately, sometimes all together