<@UB2J9BQA0>: continuing discussion from <https://...
# development
w
@hundreds-father-404: continuing discussion from https://github.com/pantsbuild/pants/pull/13224#issuecomment-941434427
it seems like the easiest way to have
Specs
match “everything living in a directory” is for the actual path component to be front and center by continuing to use “file(system) addresses” when representing things that live on disk
the semantics of matching paths can be made pretty clear… target globs
:
and
::
would match an
Address
based on its filesystem location (rather than where its metadata lives)
and targets which don’t correspond to a path on disk would use
#
syntax, and should not match for anything other than the location of the generator
h
To be clear, as a replacement for https://github.com/pantsbuild/pants/pull/13224, right? They both have the same CLI specs of
operate on where the target 'is resident'/'lives'
this is different typing for how to express that
w
yes: i’m suggesting to use the
Address
(only) to decide whether something matches a
Spec
👍 1
and to match using only the file(system) path
and to make that tractable, we would continue to use file(system) addresses when generating targets for things that live on disk: with the change being that that would also include directories
h
Thanks, makes sense. So then that brings up the discussion we started having over DM: how we feel about keeping "file addresses" and adding "directory addresses". Rather than this:
Copy code
src/go:mod#./mypkg
src/go:mod#./mypkg/util
We'd have
Copy code
src/go/mypkg:../mod
src/go/mypkg/util:../../mod
We both agreed that the
../../mod
isn't great. So, you are wondering if it really makes sense to be generating targets for subdirs like Go and Terraform are doing. Whether we should always use 111. Right?
w
correct.
the file(system) address syntax biases toward 111, with the benefit being that the path is front and center, and not broken up by extra syntax
h
One motivation I'm hearing for always using 111 is to facilitate this "directory address" syntax. Are there other motivations in your mind? Trying to understand where you're coming from
w
(which has the additional benefit of being in alignment with `Specs`: the matched part would ~always be the part before the
:
)
@hundreds-father-404: it’s the other way around probably. 111 has lots of benefits, which is why we biased toward it with the file address syntax in the first place
it’s much more awkward to define/override metadata for paths in other directories (with or without overrides)
h
It has lots of benefits for file-based languages like Python, that metadata lives close to where it's defined I don't think that value has been established for dir-based languages like Go, where the most atomic you can go is a dir. And where it's also very unlikely you'll change metadata because Go tooling is so great that our dep inference works well etc
w
and the lack of locality makes it hard to grok where metadata lives when going to edit it.
@hundreds-father-404: arguably the benefit is even stronger for directory based languages
h
and the lack of locality makes it hard to grok where metadata lives when going to edit it.
In Go, you edit the sole BUILD file you have in your project, where the
go.mod
is located. Which is idiomatic fwict *to Go developers, that's already where you'd go to change metadata
w
i think that what is idiomatic in terms of BUILD file location is what’s being discussed here.
i don’t see why it would make sense if a dep failed to be inferred (say because it’s a resource) for it to be declared in a root BUILD file and applied to something deep in the tree
the
import
statements aren’t at the root of the repository, because they are locally relevant
at risk of turning this into a canoe, it’s probably worth getting Tom’s opinion.
h
agreed, cc @fast-nail-55400. and also @happy-kitchen-89482 (who has the goal of getting rid of BUILD files wholesale where possible, but who also doesn't like generated address syntax much)
h
Not sure about requiring 111 just so that syntax can work well
h
We want to have fewer BUILD files, ideally none (except where manual information is actually needed)
1
It would be helpful if you defined what 111 means in a dep-inference/target generator world
and also why has lots of benefits
w
the issue is that manual information will be needed. and when it is needed, adding it in a root BUILD file is very unwieldy, even with overrides.
h
I do buy it getting unwieldy when it's file-based, with hundreds of files in the subdirs. I don't buy that when you only have a handful - dozen directories
Copy code
overrides={
        "./subdir": {"tags": ["overridden"]},
        ("./subdir", "./subdir/util"): {"test_timeout": 120},
      }
w
I do buy it getting unwieldy when it’s file-based, with hundreds of files in the subdirs. I don’t buy that when you only have a handful - dozen directories
sure… but that’s precisely the case where it doesn’t really matter that much to have BUILD files. https://github.com/toolchainlabs/remote-api-tools/commit/c1fe3e7de819425bf67d33598fbb0837a7b7a416 deleted 13 total lines
h
Going from 10 BUILD files to 3 BUILD files is the more important benefit than removing 13 lines.
w
111 is the difference between:
Copy code
# BUILD
go_mod(
    overrides={
      "./pkg/stats": dict(tags=["overridden"]),
    }
)
and
Copy code
# pkg/stats/BUILD
go_package(
    tags=["overridden"],
)
the former scales poorly, and puts things somewhere other than where they apply
it’s also important to point out that we don’t really need to agree on 111 if we agree that we want file/directory addresses to put the file/directory front and center.
rather than splitting it with a token like
#
(again, except where something does not exist on disk at all, such as with 3rdparty)
Going from 10 BUILD files to 3 BUILD files is the more important benefit than removing 13 lines.
i’m not sure that it’s relevant at all given
tailor
. we’re already massively beyond bazel in terms of boilerplate reduction.
h
Can you explain that last point more? If we do not agree with always using 111, that means we have:
Copy code
src/go/mypkg:../../mod
src/go/mypkg/util:../../mod
vs
Copy code
src/go:mod#./mypkg
src/go:mod#./mypkg/util
Beauty is indeed in the eye of the beholder, but I personally find the latter easier to work with
w
the latter does not align with specs syntax, and it’s not at all clear that it refers to a directory.
h
i’m not sure that it’s relevant at all given tailor. we’re already massively beyond bazel in terms of boilerplate reduction.
We are, which is awesome! Bazel isn't our competition though, complacency / custom Bash scripts are. We should continue trying to minimize boilerplate when possible
h
We should only need a BUILD file at all when it contains non-trivial information
1
w
it will.
h
And very often it doesn't
I don't see that
w
real repositories have binaries, resources, files declared
h
Sure, here and there
not everywhere
w
dockerfiles, etc
@happy-kitchen-89482: sure, but we don’t want them all to be declared at the root of the repository, right?
h
certainly not
by all means create BUILD files wherever makes sense to you
w
that is also not sustainable. where does my metadata live?
h
but I don't think that requiring you to do so everywhere in order for cli specs to make sense (which I think is the topic under conversation?) is a good idea
w
being file/directory centric actually suggests that it should live near those files/directories
h
Sure, if it lives at all
in most directories it will not
having a BUILD file that contains nothing but boilerplate seems ill-advised to me
1
Well, requiring one, I mean
w
@happy-kitchen-89482: it is about more than CLI specs. this is the difference between these syntaxes: https://pantsbuild.slack.com/archives/C0D7TNJHL/p1634082294260300?thread_ts=1634078340.250300&amp;cid=C0D7TNJHL
h
It is a very intimidating onboarding experience - first you have to check in hundreds of boilerplate files with no apparent useful content in order for anything to work
h
That's been my experience helping my friend to onboard a Django repo. Lots of questions about why we need ~empty BUILD files everywhere. Jacob Floyd and others have shared similar feedback (Survey)
1
w
because we know in practice that you will need them later.
h
We don't know that
Some will in some places, but we are preemptively forcing it everywhere
w
@happy-kitchen-89482: we do. point to any repository that has not eventually needed resources / tags / test timeouts / binaries
h
Again, where those are needed, add them, but they are not in every directory
And also, binary targets may not always be necessary, they don't always add information
w
add them how? by extracting them manually from your root BUILD file via glob math?
h
and when you add them, you can use
overrides
. Our users are engineers: if things are getting overwhelming, they can refactor to have metadata live closer to what it modifies
h
Oh I see the issue, so then you have to subtract them out of the higher up target
This is where inheriting metadata up the filesystem ancestry makes sense to me
1
w
me too. but that’s not what is being proposed here.
h
Under what circumstances would a user have to explicitly specify the owning target anyway?
No but that solves the problem of subtraction
You don't subtract
You add a local BUILD file with just the "overrides"
But again, when does a user even care about any of this?
Other than "a file is owned by multiple targets, so must be disambiguated"
w
when they do actually need to list one in a BUILD file (either due to reflection of because files). or when they’re trying to grok what will be matched by CLI specs.
this is more tenuous, but i think that the file/dir address syntax is more compatible with a world where specifying the owning target is optional when it is unambiguous, or when there is no owning target because we’re merging.
h
Yes, I would very much like to live in that world
On the CLI and in BUILD files
w
and that’s because it’s still just a file/dir path. the stuff after the colon could eventually become more optional.
h
src/py/f.py
vs
src/py#f.py
h
I mean, we can debate the merits and/or necessity of 111, but I don't think we should lean in on it so hard that our syntax requires it in order to work
w
@hundreds-father-404: sure, but where do you put the
#
if there are multiple sets of merged metadata?
(the hierarchical merging idea, or any other merging idea)
@happy-kitchen-89482: it doesn’t…. we’ve been using the file address syntax for a year now. it works fine regardless of layout, but biases toward 111
h
sure, but where do you put the # if there are multiple sets of merged metadata?
Not sure, probably on the original target definition which declared the intent of what the file represents. If the
.py
file is
python_source
vs.
resource
h
OK, I thought we were discussing changing that syntax. The general future I would like to strive towards is: A) No BUILD files except where you need them, B) CLI and BUILD reference things as files/globs, with explicit targets only where required for disambiguation
f
Do we have a list of prototypical Pants workflow for each language? If we see the actual workflow in one place, maybe that will inspire answers.
h
I'm more agnostic on anything that doesn't hinder that future 🙂
f
And why not define items like binaries in the source root?
h
That probably doesn't scale?
f
Why does it need to live exactly in same directory?
h
And I agree that it makes sense to keep metadata near the thing it describes
I am fine with saying to people "extra metadata requires a local BUILD target", but that would be an easier sell if you didn't have to subtract sources from the higher-up BUILD target you were previously relying on, or copy its metadata into the local one
👍 1
This is why I think inheriting metadata up the filesystem is relevant
I think that might be what allows us to require local BUILD files if local metadata is needed, but not require them otherwise
f
Maybe we need “prototype targets” that impose their metadata on their directory tree?
h
That, it seems to me, can replace overrides
f
But back to workflow: let’s describe as a user story some of the kinds of local metadata use cases and how an override could be done.
For Go it might be overriding a binaries build options.
👍 1
h
Stu, regardless of if we go with directory addresses, I think our Specs code would still need to hydrate the
Address
somehow to disambiguate the residence dir, right? For file/dir address syntax, we must consult the filesystem to determine if
foo
is a file without an extension vs. a dir. (What
AddressInput -> Address
does) For generated address syntax, we must consult
Target.residence_dir
because the
generated_name
is arbitrary
Okay, yeah, pretty sure that the code to find all
ResidentTargets
solely given a directory path will look extremely similar regardless of "dir address" syntax vs
Target.residence_dir
. Either way, we need to find
AscendantAddresses
with target gen enabled. Then we either inspect the address to determine residence vs look at
Target.residence_dir
The focus for 2.8 is not to figure out generated target syntax vs file syntax. Nor to figure out 111 for Go. Important conversations, but not blocking 2.8 as Go is still experimental So I'm thinking we can punt on dir addresses. Use Target.residence_dir for now, and we can replace it with dir addresses in 2.9 Wdyt @witty-crayon-22786
w
For generated address syntax, we must consult 
Target.residence_dir
 because the 
generated_name
 is arbitrary
my point is that we should not do this, because if a Target corresponds to a file/dir on disk, it should use a file(system) address. and then only the path component of an Address is relevant to
Specs
matching. but yes, we still need to go from
AddressInput
to
Address
So I’m thinking we can punt on dir addresses. Use Target.residence_dir for now, and we can  replace it with dir addresses in 2.9
i think that moving go to using filesystem addresses, and then applying these semantics to filesystem addresses is the path forward. fixing this for filesystem addresses fixes the issue for Python and for Java, which generate filesystem addresses corresponding to files. i think that the right way to fix this for Go is to have it use filesystem addresses for dirs, rather than adding a facility specific to
#
addresses, which should not correspond to things in the filesystem (imo)
h
With file addresses, solely looking at the address (plus filesystem) works for "Given some addresses (generated or not), find all that are resident in this directory" But @witty-crayon-22786 I'm not following your suggestion for how to do this: "Given solely a directory path, find all addresses that are resident in that directory"?
Get(ResidentTargets,  ResidentTargetsRequest("some/dir"))
. Because a target generator can live in any ancestor directory, I believe we must look at all ascendant targets as candidates. Does that part make sense?
w
mm, yea: that’s true. for
Specs
expansion you would need to match
AscendantAddresses
or some equivalent
👍 1
h
Cool Given that need, I'm suggesting that there is not much difference in the
ResidentTargetsRequests("some/dir") -> ResidentTargets
rule whether we determine residence from the
Address
vs.
Target.residence_dir
. The more interesting part of the algorithm was
AscendentAddresses
So, to give us more time to discuss file addresses vs. generated addresses & 111, I'm suggesting we go with
ResidentTarget.residence_dir
for now, as an implementation detail that can change in Pants 2.9+. It means that the new specs semantics will work out-of-the-box with Go, Python, etc without having to make any changes to
Address
- no need to block the release on figuring out 111 and address syntax
w
which types of addresses Go uses isn’t an implementation detail though.
it’s experimental, but that’s the more public aspect of this.
h
Indeed, we may decide that we need to figure out Go target modeling in Pants 2.8 and that it's blocking to do. But, as a separate thread. That discussion does not need to be coupled to how to fix specs Where would be helpful to have that discussion about Go semantics? New thread? GitHub issue?
w
it’s not a discussion about Go’s semantics… it’s a discussion about the Address semantics, and whether a generator address should ever “match the filesystem”. it’s a hack to have something that doesn’t obviously refer to the filesystem match the filesystem.
another compromise here would be to fix Specs for file addresses, and then fix Go’s addresses to be file addresses in a followup change in order to get filesystem semantics.
i’ll spend some time looking at how much work it would be to change Go’s addresses.
h
Okay, I'll keep an open mind to what you find. In the meantime, will go ahead with implementing
Get(ResidentTargets, ResidentTargetsRequest("some/dir"))
via
Target.residence_dir
w
@hundreds-father-404: i still don’t think that i will recommend shipping that. rather, waiting until Go’s addresses can be fixed to give it filesystem semantics with
Specs
.
and just fixing
Specs
for languages already using filesystem addresses
go’s addresses not matching is not a blocker for 2.8.0 IMO, as long as
Specs
are fixed for our other languages.
h
Regardless of if we switch to using the address rather than
Target.residence_dir
, there is a bunch of other work involved in implementing https://docs.google.com/document/d/1WWQM-X6kHoSCKwItqf61NiKFWNSlpnTC5QNu3ul9RDk/edit#heading=h.z4f4lstj51db. Like the rule for
ResidentTargetsRequest -> ResidentTargets
, which is mostly the same regardless of which scheme we use to determine residence I already did the work for
Target.residence_dir
, so now any work I do is on stuff that should be agnostic to what we decide with "dir" addresses
Btw, one edge to handle with "dir" address syntax is how to handle a target generator generating a target for the same directory. With Go, that's
dir#./
. With Terraform, it's
dir#.
👍 1
w
@hundreds-father-404: in the meantime, how to fix
Specs
for file addresses is uncontroversial, right?
h
Indeed, which is precisely what I'm working on. But to do that, we need
Get(ResidentTargets, ResidentTargetsRequest("some/dir")
I'm unblocked by doing that via
Target.residence_dir
, and then you can put up a PR to swap that part of the implementation with dir specs. It's only a few lines that will change in that rule
w
can’t you replace
ResidentTargets
with
AscendantAddresses
and a filter on the address path?
h
That's what the rule does 🙂 And that filter can use either approach It's useful to DRY the logic, e.g. simplifies our
go_binary(main=)
field inference rule for example
w
ok. i’m suggesting landing matching file addresses without
Target.residence_dir
.
and then fixing go in a followup (looking at it now.)
h
My plan is: • Add
(ResidentTargets, ResidentTargetsRequest)
as prework, which also refactors the
go_binary
rule • Use that to "fix" address specs. (Note that our internal types need not == what users type. It may be useful to keep `SiblingAddresses`/`DescendentAddresses` w/ current semantics purely for plugin authors. Unclear!) • Add directory specs as shorthand for
dir:
Changing how the
ResidentTargets
rule's filter works (address vs.
residence_dir
) is encapsulated from everything else, and you're welcome to put up your proposed change there whenever it's ready
w
@hundreds-father-404: i think that the implementation of having
Sibling/Descendent
match addresses generated in parents is easier than that: should be able to do it with only changes to https://github.com/pantsbuild/pants/blob/e2afe17853bdccc891327be90b362134425b9c76/src/python/pants/base/specs.py#L64-L92
👀 1
basically, glob higher in the filesystem, and then filter
h
Interestinggg. Yeah, that could work if the filter were the address. Does not work with residence_dir, indeed My next question then: is there any utility for plugin authors to be able to resolve only targets defined in
dir
, rather than resident? Again, plugin author needs can diverge from what we expose to end users. I don't think there's any use case for that?
w
i can’t think of one… but could see which tests break =x
1
h
@fast-nail-55400 @happy-kitchen-89482 any thoughts on this, given your plugin experience?
is there any utility for plugin authors to be able to resolve only targets defined in dir, rather than resident?
I'm talking about changing the behavior of
SiblingAddresses
and
DescendentAddresses
when used in rules so that they now include generated targets that are "resident" in those dirs It seems likely to me that that is a great default for plugins too. If we find there is some niche case where you only want targets explicitly defined in the folder, we could add a new type
w
“generated targets that are resident” makes this pretty abstract: it’s really: “files in the directory”
1
f
No use case that I can think of offhand.
👍 1
w
not seeing any blockers yet for file addresses referring to dirs… going to grab some lunch, but should be able to post soon afterward.
h
Sg, thanks. I'm a bit stumped on
SourcesSnapshot
, how we let you operate on files without owners with
count-loc
and
validate
. Adding proposal to the design doc + have been doing a refactor that will make it easier to muck around with Specs code
h
Agreed that operating on files without owners can wait for a followup, as long as we have a sense of how
👍 1