Filtering dependencies of package targets seems to come up r Pants #development

Filtering dependencies of package targets seems to...

proud-dentist-22844

05/25/2023, 4:39 AM

Filtering dependencies of package targets seems to come up regularly (

python_distribution

pex_binary

debian_package

, and now the nfpm package types I'm working on). In many cases, these packages depend on another package, we want to include the package/archive itself, not the sources packaged in the target or owned by the target. For

python_distribution

there's quite a bit of logic to determine which distribution "owns" a given python source file. Some package targets (like

debian_package

) use a SpecialCasedDependency field to exclude their deps from normal dep calculation. So, what if we allow passing a predicate (lambda function) when requesting transitive deps that says whether or not to follow the dependencies of a given target. Maybe something like this (probably with the predicate on

TransitiveTargetsRequest

instead of on the

_DependencyMappingRequest

)? https://github.com/cognifloyd/pants/commit/b4044e76685041cdac322a3e96f4858b4634f045 (For what it's worth, mypy is happy with this stub commit.) Do any other request types use a lambda like this? Is that going to mess up graph calculation some how? I tried to minimize the effect on the current predicate-less code path, but I'm not sure about the performance implications of messing with the dep graph here. Should I keep going with this approach?

proud-dentist-22844

05/25/2023, 2:32 PM

@witty-crayon-22786 @happy-kitchen-89482 do you have thoughts on this?

happy-kitchen-89482

05/25/2023, 3:01 PM

This is a relevant issue: https://github.com/pantsbuild/pants/issues/12794

👀 1

happy-kitchen-89482

05/25/2023, 3:04 PM

In the specific case you mention though, I think it's a bit simpler than what's discussed in that issue.

happy-kitchen-89482

05/25/2023, 3:04 PM

AFAICT we ~never want to traverse deps through "package targets" like

pex_binary

? So the graph traversal itself should understand that concept?

happy-kitchen-89482

05/25/2023, 3:05 PM

(never - in normal use, maybe we do when the user manually queries the graph with the

paths

goal or similar)

happy-kitchen-89482

05/25/2023, 3:06 PM

So maybe a predicate is overkill (although it can still exist as an implementation choice)

happy-kitchen-89482

05/25/2023, 3:06 PM

There are really just two predicates: "always follow" or "don't follow the deps of packageable targets"

happy-kitchen-89482

05/25/2023, 3:06 PM

I think

happy-kitchen-89482

05/25/2023, 3:06 PM

Supporting arbitrary ones might get very confusing, unless we know we need to

happy-kitchen-89482

05/25/2023, 3:07 PM

Right now a target type doesn't know that it is packageable, but we can change that

proud-dentist-22844

05/25/2023, 3:18 PM

Yeah. That issue is much broader in "scope" (couldn't resist the pun).

proud-dentist-22844

05/25/2023, 3:20 PM

How would the graph know that a target is a package target? I don't think we have anything that is constantly used for all packagable targets.

proud-dentist-22844

05/25/2023, 3:21 PM

Do all of them have an

OutputPathField

? Nope. Every package type implements its own

PackageFieldSet

proud-dentist-22844

05/25/2023, 3:40 PM

Hmm. Instead of a predicate, the consumer could pass in a

FieldSet

class, and then use its

.is_applicable

classmethod as the predicate. Then, if that returns True, skip that target's dependencies. Or, I could also just grab all

union_membership.get(PackageFieldSet)

and do that for all of them.

bitter-ability-32190

05/25/2023, 3:48 PM

So Instead of doing this for a specific scenario, I think we should just bite the bullet on the "tagged dependencies" idea.

bitter-ability-32190

05/25/2023, 3:49 PM

That solves a whole host of issues, this included (AFAICT)

proud-dentist-22844

05/25/2023, 4:03 PM

@happy-kitchen-89482 doing this for all package types would mean importing

PackageFieldSet

into the engine internals which is a little gross. :/ Moving it is an option as well.

proud-dentist-22844

05/25/2023, 4:05 PM

@bitter-ability-32190 how does tagged deps help with not traversing the dependencies of package targets? 🤔 It does help with that "scope" issue. But tagging the deps in this case seems like more boilerplate.

bitter-ability-32190

05/25/2023, 4:22 PM

Tagged dependencies gives Pants more a clue to how a dependency is meant to be consumed (and therefore should be provided by Pants). In this case, we don't know whether the dev wants the target as a loose set of files (like say, I wanted to lint the files that go into a

pex_binary

) or as the packaged thing (I intend to upload this

artifact

using my super special script). Tagging allows that vital information to be known to Pants, therefore we make less assumptions (exactly 0 assumptions 😛)

bitter-ability-32190

05/25/2023, 4:23 PM

We can likely switch our current assumption from "you wanted all the inputs flat on disk, right?" to "you wanted the packaged artifact, right?" and mayyyybe get away with it, but it'd be great if we just stopped assuming

happy-kitchen-89482

05/25/2023, 4:36 PM

I'm suggesting that we would add some way of knowing this

happy-kitchen-89482

05/25/2023, 4:37 PM

Via the target type

happy-kitchen-89482

05/25/2023, 4:38 PM

I don't think tagged dependencies is the way to go here. In this case "do I follow this dep" is not a property of the dep. It's a property of where I started the traversal from...

bitter-ability-32190

05/25/2023, 4:40 PM

The whole point of our lengthy discussions earlier was that targets shouldn't/can't assume how they will be consumed. We shift that knowledge onto the consumer

witty-crayon-22786

05/25/2023, 4:42 PM

imo, step one is a consuming predicate/API that allows for controlling which dependencies are traversed. step two is adding more information for that API to consume.

☝️ 1

bitter-ability-32190

05/25/2023, 4:42 PM

That............ damn. I've been Stu'd. Yeah that makes complete sense.

proud-dentist-22844

05/25/2023, 4:43 PM

Something like this predicate? https://github.com/cognifloyd/pants/commit/b4044e76685041cdac322a3e96f4858b4634f045

proud-dentist-22844

05/25/2023, 4:44 PM

Or you're thinking of a new request/response type that allows for filtering?

witty-crayon-22786

05/25/2023, 4:45 PM

@proud-dentist-22844: right. and sorry for not reading the whole thread when i got the ping =x

witty-crayon-22786

05/25/2023, 4:46 PM

a common consideration with these APIs is “node” vs “edge” filtering… the predicate you have is doing node filtering

proud-dentist-22844

05/25/2023, 4:47 PM

Whereas tagging dependencies would be an edge filter?

witty-crayon-22786

05/25/2023, 4:48 PM

a predicate API that allowed for considering individual edges would be an edge filter, yea. it doesn’t require tagging: for example, maybe you want to include from one

Dependencies

field subtype or another

witty-crayon-22786

05/25/2023, 4:48 PM

or you just want to exclude edges from

TypeOne -> TypeTwo

witty-crayon-22786

05/25/2023, 4:49 PM

meanwhile, i’m inching closer to porting that loop to rust, heh. so i hope that we can add features without making it slower.

proud-dentist-22844

05/25/2023, 4:51 PM

Is passing in a lambda on the request a kosher way for the consumer to pass its filter? (Whether the filter is node or edge)

happy-kitchen-89482

05/25/2023, 4:54 PM

I'm not suggesting that target's should know how they are consumed, I'm suggesting that in this case the traversal of an edge is NOT dependent on any property of that edge.

proud-dentist-22844

05/25/2023, 5:34 PM

One interesting thing: With

python_distribution

traversing all the other

python_distribution

targets, if a python source under dist A imports python source under dist B, and dist B explicitly depends on dist A, then dist A will package that source even if none of the entry points depend on it. Once we stop traversing the graph through package deps, we may have to be much more explicit about what each dist includes (esp for library dists that don't have entry points).

proud-dentist-22844

05/25/2023, 5:41 PM

OK. Here's how I think the predicate could look:

proud-dentist-22844

05/25/2023, 5:41 PM

https://github.com/pantsbuild/pants/pull/19155/files#diff-96dbffb30cf9341edd3f0c92d69c7085693d46df619fff8ec40a5f09569ba362R202-R209

proud-dentist-22844

05/25/2023, 5:41 PM

(in a WIP PR to explore this)

proud-dentist-22844

05/25/2023, 6:41 PM

I just found https://github.com/pantsbuild/pants/issues/11270 (thanks to a graph test func that mentioned it) and I don't understand all of the conversation around structure sharing / graph logic / ways to expose the dependency mapping.

proud-dentist-22844

05/25/2023, 6:48 PM

ooh. @witty-crayon-22786 it looks like my changes will conflict royally with your mutable memoization in https://github.com/pantsbuild/pants/pull/17394 How close is that to being ready for review / merged?

proud-dentist-22844

05/25/2023, 10:49 PM

I added tests and marked my PR ready for review. 🙂

witty-crayon-22786

06/01/2023, 7:11 PM

sorry for the long delay. i added a comment that sets a somewhat arbitrary benchmark that i think would be good to hit with this filtering API before landing it: https://github.com/pantsbuild/pants/pull/19155#discussion_r1213566914 obviously this API doesn’t have to be perfect before landing. but IMO, ensuring that this API is sufficient to remove some existing complexity would be good. thanks for looking at it!

bitter-ability-32190

06/01/2023, 7:11 PM

My review is still; pending 🤦‍♂️

proud-dentist-22844

06/01/2023, 11:30 PM

I'm studying

SpecialCasedDependencies

, to figure out what use-cases it covers... 🤯 Handling

archive

and

debian_package

look fairly straight forward, and then it gets fuzzier but still doable (via tagged dependencies?) for

python_test(s)

shunit2_test(s)

pex_binar{y,ies}

python_{google_cloud,aws_lambda}_function

python_aws_lambda_layer

adhoc_tool

shell_command

run_shell_command

system_binary

, and

helm_deployment

. And then

wrap_source_as_*

and

relocated_files

need something a bit different, but probably still doable. But then there's

jvm_war

. Geez. The complexities around handling java are so painful.

bitter-ability-32190

06/01/2023, 11:40 PM

It's a... War zone 😂

proud-dentist-22844

06/01/2023, 11:41 PM

LOL 🤣

witty-crayon-22786

06/02/2023, 12:36 AM

i think the easiest way to describe

SpecialCasedDependencies

is: “we want to support multiple subclasses of

Dependencies

on one Target”

witty-crayon-22786

06/02/2023, 12:37 AM

and yea, in some cases you want to consume those separately, sometimes all together

Open in Slack

Previous Next