<@U051221NF>: <https://github.com/pantsbuild/pants...
# development
w
@happy-kitchen-89482: https://github.com/pantsbuild/pants/pull/9799 is a pretty substantial change in behavior, haven’t reviewed yet, but it might be worth gathering wider feedback on
h
Indeed, it is a big change. But a lot of the feedback we’ve gotten about source roots is confusion. This makes the feature substantially less complex at the cost of some one-time verbosity in config
The original approach of
root_patterns
was a middle ground between the original system and this. We got feedback from two users that they were still confused. Part of this was a poor deprecation warning, but part also seemed from the feature still being too generalized
h
Yeah, I went about this change assuming that we don't need to explain it vs root_patterns, because that won't be in a stable release. But we do need to justify it vs the old flags, and I think that's easy to do.
👍 1
h
A key detail is that this new system can still express everything expressed before - it’s just more verbose to do it.
w
need to review more deeply, but the issue i think i see is that it adds a lot more boilerplate for repos that have good conventions around directories
this is already pretty not awesome: https://github.com/pantsbuild/pants/pull/9799/files#diff-229da667a54b8bcc716aea7b356cc2ad … and then imagine that in a repo with 4000 projects.
it gates project creation on one file.
h
it gates project creation on one file.
Is the issue that you anticipate orgs not letting any random team modify that file?
w
right. and it’s the kind of file you’d need to invalidate your whole CI for, etc.
consequently it would be reviewed by what twitter called “root owners”
h
right. and it’s the kind of file you’d need to invalidate your whole CI for, etc.
Ah, now that is indeed a major concern. This type of issue (like the
globs
deprecation) causes serious issues for users.
w
eh, globs removed boilerplate… this adds it, afaict
h
Right, but I mean
globs
made it hard for JP to land an upgrade. It was indeed worth doing, but my point is that we do need to be careful with changes that require your entire universe to rebuild, or in their case, to redeploy. This would likely be frustrating for them if a team adding a new project caused every thing to rebuild
h
Well, the plan is to allow custom rules for inferring source roots (via a Union I guess). Those patterns were very "maven-style JVM project" centric, they aren't that useful in Python repos.
w
@happy-kitchen-89482: they were also used for hundreds of python projects at twitter
because “the battle for a single sourceroot” is never won.
h
So I imagine three ways to specify source roots: 1. Enumerate them in config. 2. A sentinel file (.SOURCEROOT or something) 3. Custom rules that, e.g., infer the source root from the presence of a setup.py or whatever. That is very repo-specific.
w
the benefit of the patterns is that by following directory naming conventions, users didn’t have to change any global config to onboard
h
Sure but Python repos don't do
src/python
source roots, usually. Maybe Twitter did, but that's not particularly idiomatic.
w
they did, yea.
h
What I've been seeing more in the wild is source roots that no pattern can match.
w
that’s an argument in favor of a way to declare more, but i don’t think it’s an argument against patterns.
h
Just a tree of projects with no
src/
or
src/python
or anything like that.
w
i think that asking folks to declare custom rules for this usecase is too high a bar.
h
I dunno, I feel like the patterns added a lot of complexity both in the implementation and to users, unless the defaults happened to work for them.
And my observation is that in the Python realm those defaults were not useful.
So basically even with patterns we aren't solving the problem.
w
simplifying or adapting patterns in some way might be an option, but i think that removing them entirely is not a good idea… as mentioned before, it discourages having directory naming conventions, or would force you to write rules if you already had conventions.
sentinel files might work as a replacement, but even that is a bit unfortunate in a repository with directory naming conventions: for example, at twitter it would be the fifth or sixth file at the root of each project
h
In Python land conventions exist, they just can't be captured by patterns. So we need something else. And I'm not sure we need patterns once we have that something else.
w
let’s know what that something else is before we remove them, imo.
is it sentinel files? i don’t think it should be rules.
h
I think Twitter's example (of having a capturable prefix like
src/python
) is unfortunately uncommon and not something to rely on.
Sentinel files are easy to understand and apply.
w
patterns don’t rely on anything… it’s just an alternate way to accomplish something.
👍 1
h
I think custom rules can be for advanced cases ("we happen to know that in our repo every source root contains a `setup.py`")
patterns rely on there being a fixed prefix like
src/
or something
w
sure, but my point is: if the capability to hardcode them exists, patterns are not the only way to accomplish things.
h
i think that asking folks to declare custom rules for this usecase is too high a bar.
+1 on writing rules being a high barrier to entry, especially since this is a prereq to even running
./pants test foo.py
and determining if you like Pants enough to use it. This is page 4 of our onboarding docs.
w
re:
setup.py
, that possibly sounds like a case for configurable sentinel files, rather than custom rules…
h
True in that case, yes.
I think sentinel files are more practical than patterns. I can add patterns back in a more limited way, it's not too hard. But one important difference will be that it now should take the innermost match, not the outermost, to mirror how sentinel files behave. Because in Python world we can actually see source roots under other source roots.
h
Do patterns prevent us from doing anything? I know it makes the code more complex, but is it blocking any functionality? In the docs, we can treat explicit values as the default, and only introduce patterns like
*
as an advanced feature.
h
See above, as implemented today they prevent us from taking the innermost match
Reintroducing patterns in a limited way is actually not hard, it's the trie-based matching I want to get rid of (and it's not clear that it was ever necessary, from a performance standpoint, TBH)
👍 1
Basically I want the new idiom to be, you start at the innermost directory, and ask an oracle "is this a source root?", and then keep going up until you hit.
w
yea, that seems reasonable.
h
In this change the oracle just looks things up in a set of fixed source roots. I am fine if that also supports matching a pattern.
👍 1
And "does this dir contain a sentinel file" would be another thing the oracle would do.
w
yep.
h
+1 to changing to innermost directory. That makes sense to me conceptually - you’re going from most specific to gradually less specific, until finally erroring.
I think Stu’s point is the importance of patterns for boilerplate reduction, rather than the importance of outermost vs. innermost.
h
I'm fine if the roots values are treated as regexes, for example.
👍 1
Or optionally so
w
that might make them harder to apply. continuing to do “`*` consumes at most one directory” would be good.
ie, treat them as globs rather than regexes.
but yea, i think that the trie can go.
if you implement the “recursive upward walk” in the engine, it’ll be memoized and will watch files (in particular, sentinel files) appropriately.
👍 1
h
OK this is now fixed (although not implemented in a rule yet, I want to get the semantics first, and that can be a separate change)
❤️ 1
w
cool beans… lemme know when it’s ready for another look
h
(I think it’s ready for review. I left a couple small changes, but nothing major.)
👍 1