I’m stuck at how to differentiate between filesyst...
# development
h
I’m stuck at how to differentiate between filesystem specs vs. address specs in
cmd_line_spec_parser.py
. Several ambiguities: 1. does
src/python/pants/util
mean
src/python/pants/util:util
or the directory
src/python/pants/util
- For now, filesystem specs don’t work with directories (
PathGlobs
doesn’t), we’ll stick with expanding to
src/python/pants/util:util
2. Not all files have endings, e.g. the file
./pants
I believe it is safe for the parser to use
Path.is_file()
. We already do
Path.exists()
in
arg_splitter.py
. So, I’m thinking these heuristics: -
Path.is_file()
-> filesystem spec -
*
in the arg -> filesystem spec because it’s a glob - otherwise, address spec
cc @aloof-angle-91616, I think you’d find this a fun problem
Note that the options parser will already have disambiguated between
specs
vs. goals for us, fortunately. That is, it already clarified whether
cache.java
refers to the goal
cache.java
or the file `cache.java`; and whether
haskell
refers to directory
haskell
vs. the goal
haskell
a
does src/python/pants/util mean src/python/pants/util:util or the directory src/python/pants/util
so right now, for a lot of cases, it seems like the desired behavior when selecting a directory would be to just operate on all the targets within that directory
👍 1
which is the same as using
::
h
Absolutely agreed
Due to backwards compatibility, we can’t change those semantics yet, though. V1 will not support filesystem specs, only traditional address specs and
--owner-of
a
and if we're talking about a directory used as an input or output to a task or something that would obviously be its own separate thing
well i was thinking we wouldn't need to change any semantics
unless there's a use case for a positional directory command-line arg that isn't covered by
::
h
well i was thinking we wouldn’t need to change any semantics
It’s confusing to me for
src/python/pants/util
to sometimes mean
util/
and sometimes
util:util
. It’s confusing because it’s not transparent. We’d need to first drop support for
src/python/pants/util
expanding to
src/python/pants/util:util
, which I’d be in favor of (in a followup - not close to being ready yet)
a
to sometimes mean util/ and sometimes util:util.
well, until now, it has always just meant
util:util
. the proposed new behavior i believe you're describing is to have it now sometimes mean
util/
as well, which you've described as confusing and not transparent.
h
Cool. It is safe to use
Path.exists()
. We use in the same file
os.path.realpath()
(==
Path.resolve()
)
a
i was asking above about a use case not covered by
::
because if not, then we may not need to introduce the
util/
interpretation at all
yes but beware of perf regressions
👍 1
h
To clarify, I meant to punt on the directory ambiguity problem. For now, goal is resolving files and globs
a
yes great
i think this won't be incredibly difficult to solve eventually
👍 1
either we introduce some signifier to disambiguate them (like a protocol schema) or we like screw around with other cli arguments somewhere else
or something
w
I believe it is safe for the parser to use
Path.is_file()
. We already do
Path.exists()
in
arg_splitter.py
. So, I’m thinking these heuristics:
this isn't safe without relativizing to the buildroot. but it is in general a sandbox escape
👍 1
if that path goes from being a file to being a directory, the rule will still be memoized, because it did not use the engine's APIs to access the file.
so would need to capture a Snapshot to differentiate the type.
h
Note that this parsing happens before the engine ever gets involved. It’s in
base/cmd_line_spec_parser.py
w
...oh. but you're talking about in ``cmd_line_spec_parser`` ... sorry. that's ok unless it is actually moved into a rule.
yep. got it, sorry.
a
I don’t love that creating a
BUILD
file will change the meaning of a command line… That feels like a bad property of the UI…
@hundreds-father-404 Not sure if you saw ^^
h
I don’t love that creating a BUILD file will change the meaning of a command line… That feels like a bad property of the UI…
What do you mean by this? Is this in reference to disambiguating between directories vs. targets? To clarify, I 100% want to punt on this decision (But ftr, I really like Benjy’s suggestion to add a global option that allows users to specify which semantics they want)
a
+1 to both daniel and eric regarding punting
a global option that changes the semantics of the command line feels weird to me, i’d somewhat like to be able to parse a command line in my head without spooky action at a distance
a
So the question is what does
./pants cloc foo
mean in the following directory layouts: 1. foo is a file 2. foo is a directory 3. foo is a directory and
foo/BUILD
exists I don’t think a flag should change that meaning, and ideally that meaning wouldn’t change even if filesystem changes are made to change which one of 1/2/3 is the case. Otherwise when I write my script, and I write it slightly sloppily because I don’t understand the edge cases, my script may start doing the wrong thing when faced with a different filesystem than I tested it on.
h
Right now: 1) If
foo
is a file, treat it as a file 2) Otherwise, always try to expand to
foo:foo
, even if there is no BUILD file or directory
a
i don't understand when we would want to pass in directories as command-line arguments. what use case is this for?
h
Our original motivating use case for filesystem specs was wanting to be able to run
./pants buildgen src/python 3rdparty/python
on a brand new Pants repo with no BUILD files
a
how would you feel about
./pants buildgen src/python:: 3rdparty/python::
?
that's really what you want with buildgen right? not just the directory but recursive subdirectories?
h
./pants buildgen src/python:: 3rdparty/python::
means you are using address specs, but there are no BUILD files yet so this won’t actually match against anything
a
right now that would fail to expand globs and cause an error, but we can choose to instead interpret it as a glob for the purposes of the
buildgen
task
h
With today’s implementation of filesystem specs, you could do
./pants buildgen 'src/python/**/*' '3rdparty/python/**/*'
a
ok, but we're creating pants targets right? why is
/**/*
better than
::
?
a
ok, but we’re creating pants targets right? why is
/**/*
better than
::
?
Because it’s a really weird special case, both in the user interface and in the engine
a
unless we're literally like downloading a .tar.gz and literally extracting it into some directory, i feel like using target specs makes more sense
Because it’s a really weird special case
i think we can just avoid erroring when
::
doesn't match any targets though? the reason i like this is because right after you run
buildgen
, you can immediately change the command-line goal, and not have to change any address specs, so you can run
./pants list src/python:: 3rdparty/python::
immediately. that seems SUPER intuitive to me.
h
ok, but we’re creating pants targets right? why is /**/* better than ::?
cmd_line_spec_parser.py
needs some way to know “they want filesystem specs” vs. “they want address specs”. The engine may only pass one type of Specs to the goal, otherwise we get graph ambiguity that there are two ways to compute things like
BuildFileAddresses
. So, we must reserve
::
to always mean address specs and never try to co-opt it for filesystem specs. This means we need some sort of glob for filesystem specs to complement
::
.
*
is a natural glob that is already supported by
PathGlobs
and is natural to CLI users
a
filesystem specs are great, i have no issue with them. for the case of buildgen, i feel like they're inappropriate compared to
::
because you then have to change the pants command line specs to operate on the targets you just generated.
h
for the case of buildgen, i feel like they’re inappropriate.
Buildgen is meant to work on a brand new repo, though, with no targets. So,
./pants buildgen ::
won’t resolve anything on a first run. We could hack the AddressSpecs
::
to be converted into filesystem specs, which is what Toolchain currently does, but that’s not great to change the semantics. After that first run, you can use
./pants buildgen ::
of course.
a
Do we have any cases where we actually really want file specs for anything other than buildgen?
h
Do we have any cases where we actually really want file specs for anything other than buildgen?
Well, to clarify, we want to support either filesystem specs or address specs for goals. Another example:
./pants test tests/python/pants_test/util/test_strutil.py
a
We could hack the AddressSpecs :: to be converted into filesystem specs, which is what Toolchain currently does, but that’s not great to change the semantics.
that's not what i'm thinking of. i'm thinking of them representing target globs. the
buildgen
task would read the
Specs
as target globs, convert each
Spec
into whatever type of input
buildgen
needs, then run it. i'm not thinking of making
::
into filesystem globs.
h
Why do we want to allow
./pants test tests/python/pants_test/util/test_strutil.py
? Leveraging how users already interact with most tools. For example, you’d run Pytest as
pytest tests/python/pants_test/util/test_strutil.py
. Same with running MyPy, isort, and Flake8. Users can now use tab completion on specs, which currently only works when specifying the directory Finally, you no longer need to lookup what the target name is that corresponds to that file (a huge pain for me doing the Py3 migration)
a
Finally, you no longer need to lookup what the target name is that corresponds to that file
./pants --owner-of=tests/python/pants_test/util/test_strutil.py test
doesn't work?
to be clear, i strongly support the use of filesystem specs for the case of running targets that own a filesystem spec.
h
I’m thinking of them representing target globs. the buildgen task would read the Specs as target globs,
Please checkout https://docs.google.com/document/d/15xphZcFnowytF0Qu2sO1PG7wHAWoyLof7hK0jVa5T8E/edit#heading=h.dn7v2rykapob if you haven’t recently. It was rewritten this past week to account for a) a new understanding of the different types of use cases we have. There are 4 types of goals, not the original 2 I thought. b) having had spent the past two weeks in this spec code, a new understanding of how this can actually be implemented. For example, we dropped the idea of a top level
Specs
a
Please checkout https://docs.google.com/document/d/15xphZcFnowytF0Qu2sO1PG7wHAWoyLof7hK0jVa5T8E/edit#heading=h.dn7v2rykapob if you haven’t recently. It was rewritten this past week to account for
I will have a read 🙂
❤️ 1
a
left some comments, back to work
h
Just to provide more general context: Forcing end users to think about targets is an imposition. It hinders adoption. Users already understand files. Letting them do
./pants test src/python/foo/bar_test.py
is intuitive,
./pants test src/python/foo:bar
is less so. It requires reasoning about a new conceptual domain. Targets may still need to exist (e.g., to express deps and interpreter constraints), but the user should only have to think about them when necessary.
❤️ 1
🔥 2
👖 1
Fortunately, the v2 engine makes this straightforward!
a
ok, that’s a very interesting approach i hadn’t considered before. i love it