Thoughts on changing `Paths.dirs` and `Snapshot.di...
# development
h
Thoughts on changing
Paths.dirs
and
Snapshot.dirs
to only include the dir names where there was a match? if you have only the
file some_dir/subdir/f.go
and the glob
**/*.go
, it will include 
some_dir
when I only want 
some_dir/subdir
If you do want "all dirs", that seems much easier to post-compute than asking for "only dirs with a match"
h
Why?
What's the benefit of this? I fear unintended consequences, since we use those in a bunch of places.
But if it can be done cleanly...
h
Replace this target gen code with
paths.dirs
Copy code
dir_to_filenames = group_by_dir(go_paths.files)
    dirs_with_go_files = []
    for dir, filenames in dir_to_filenames.items():
        if any(filename.endswith(".go") for filename in filenames):
            dirs_with_go_files.append(dir)
We're duplicating it in Terraform target gen too
w
the semantics are currently that only dirs that actually match the glob are included.
it’s a bit strange, but it makes source roots easier, because you can “find all directories like…”
having said that, i think that if we start moving toward “matching a directory means recursively everything in that directory”, we’ll want to change our path globs as well.
h
That doesn't seem the case? The result includes some_dir and I only want some_dir/subdir
w
but probably even more reason to consider replacing capturing dirs with recursive behavior.
h
It would be extremely useful if it was "only dirs with a match".
even more reason to consider replacing capturing dirs with recursive behavior.
Why change our Rust code to do that? Wouldn't the calling code simply use
**/*
for the glob? I'm talking about the behavior of the Rust intrinsic
w
because if
./pants test $dir
is going to be recursive, aligning our path/source globs could make sense semantically
h
I'm not seeing that. In that case, the caller rule could convert
dir
into
dir/**
. We need to retain the ability to have a glob only for the directory and not recurse,
*.go
vs
**/*.go
Maybe I'm misunderstanding your suggestion?
w
@hundreds-father-404: you’re thinking of the rules, but i’m thinking of end user source globs, and making “matching a dir is recursive” consistent across Specs and sources
basically, if matching a dir is recursive, i don’t think we’d still have a
Paths.dirs
list, because the reason you would have matched would be for recursion
(which would have the same motivation as for `Specs`: consistency with git)
h
you’re thinking of the rules
Indeed, I'm solely talking about the behavior of our intrisinics
Paths
and
Snapshot
, how their
.dirs
field works We can change our user semantics to have
dir
translate to
dir/**
without changing our engine code
because the reason you would have matched would be for recursion
For those two callers perhaps, but there are other cases where it's helpful to have the list of dirs with a match, like my Go target gen idea
w
only include the dir names where there was a match? if you have only the 
file some_dir/subdir/f.go
 and the glob 
**/*.go
, it will include 
some_dir
 when I only want 
some_dir/subdir
i don’t know how these semantics would work:
some_dir
doesn’t actually match the glob you specified
**/somedir
would,
***/**
would, etc
to get what you want, i think you want to
os.path.dirname
the matches…?
h
This is how the current behavior works. When you have the glob
some_dir/subdir/f.go
,
.dirs
will include
some_dir
and
some_dir/subdir
. It includes all directories seen in the resolved files I'm proposing to instead only have the directories that contain a matched file. There is a matched file in
some_dir/subdir/f.py
, so include
some_dir/subdir
. There is no matched file in
some_dir
, so leave it off. It's not that the directory itself was in the glob, it's that the dir has a matched file
w
That is
dirname
h
I think that's why we're talking about different things. I am not proposing any changes to the input to
PathGlobs
. Only
Paths.dirs
That is dirname
It is, indeed! Which is what this snippet does:
Copy code
dir_to_filenames = group_by_dir(go_paths.files)
  dirs_with_go_files = []
  for dir, filenames in dir_to_filenames.items():
     if any(filename.endswith(".go") for filename in filenames):
         dirs_with_go_files.append(dir)
But I'm proposing it would be useful for the engine to calculate that for you already, rather than all this Python support code
If you want the current behavior of "all directory paths", you could still get that with my proposed change thanks to extra Python code similar to above. Both semantics are 100% achievable to calculate currently via extra Python code. But which is more useful to have pre-computed, vs. requiring the extra steps?