----- Encouraged by the awesome responses (thanks ...
# general
e
----- Encouraged by the awesome responses (thanks again Eric!), here's another question re coverage: I'd like to gradually introduce (and enforce) coverage into my repo. That means I want to set a minimum coverage threshold per directory. How do I do that?
b
I'd look into a third-party Pytest plugin which could interact with the coverage plugin. Or write an internal plugin for it
e
Thanks Josh. I suspected a custom plugin might be the way to go. I was hoping, though, for something someone's done already.
And, cards on the table, this was a gentle introduction to another question: I'd like several different things to change between directories. eg I might want a different
.flake8
file between directories. Or a different minimum coverage bar. I assume you'd say the same
?
b
For the flake8 thing, Pants can likely help you there although I'm not sure it'd work as-is today
e
Ideally there would be a general mechanism for finding a config file for a tool. Given a target's directory and the config filename, it would search from there "and up", until finding or declaring it doesn't exit even where pants.toml. Plugins would use this to find their config files, and then pass them as parameters to the tool itself, each in its own way. That would be a start.
b
I'm just not sure which way the search goes, or how Pants acts if it found multiple files. Try it and find out? 🙂 You can use
--no-process-cleanup
to leak the sandbox and inspect it. You can also combine this with a
lint
of a single file. And try out
lint --only=flake8
to only run flake8
e
Cool. LIke config discovery, if it indeed does what I want 🙂 I'll give it a go. Thanks!
b
If not, swing back here and we can help you out
h
I believe config discovery should work for this!
e
So a simple test seems to say that config_discovery doesn't do what I want. Directly in my repo root I have a
.flake8
file, and another one in
utils/
. When I
pants lint utils:
it doesn't pick up the flake8 file under utils/, whether my cwd is in utils/ or in the repo root. I did add
config_discovery = True
to
[flake8]
section in pants.toml, to no avail. I don't mind writing a some pants code for this to work, it's important to me. How do I start?
b
Here's Pants code for config discovery for Flake8, which just calls into a generic helper defined here. Although it seems to glob, so I'm wondering. What's flake8's behavior when you try and lint a file in
utils
from the repo root?
e
If I run
flake8 utils/foo.py
from the repo root, and have
.flake8
in both utils/ and in the repo root, the one in the repo root is used.
b
Yeah OK, not surprised then Pants is just doing what flake8 would do
(Well, really Pants is just running flake8, and flake8 is doing the flake8 thing)
e
I propose that pants starts the search in the directory that holds that BUILD file containing the target.
(if you run against a target; if you run against a specific file then find the target owning the file and "it's solved from there")
b
You might find success in writing your own in-repo flake8 plugin. The flake8 docs for it are lackluster, but Pants supports it. See https://www.pantsbuild.org/v2.11/docs/reference-flake8#section-source-plugins
e
that's crazy 🙂 is there no way to augment a plugin, derive from some class, etc?
b
I propose that pants starts the search in the directory that holds that BUILD file containing the target.
So AFAICT Pants is including ALL the relevant configs in the sandbox (as some tools do allow the hierarchical config). And the search does start at the relevant targets. What you're seeing isn't Pants' behavior but flake8s.
e
alternatively, would you consider a PR where I add such a "start in target dir" option, and add a config to flake8 so that this behavior is invoked?
b
If you augment flake8, Pants should "just od the right thing" if you configure flake8 to opt-into the new behavior
e
Oh, I see. How can I quickly check? Is there like a "pants lint --shell"? And if what you say is true (about including all config files), can I add a flag to the flake8 plugin in pants that will find "the most specific file" and invoke flake8 with that as an argument?
b
You can inspect the sandbox to verify Pants is putting both configs in the sandbox. Run
./pants --no-process-cleanup lint --only=-flake8 utils/foo.py
It'll log a dir that'll be leaked so that you can inspect the sandbox
e
OR -- should I have a different "script file" / entry-point for my flake which does this?
b
(FWIW we have a somewhat similar use-case at my company, we want to run flake8 using different codes on test files than normal files. I just have an in-repo flake8 plugin which monkeypatches flake8 to read from the config which codes are ignored for tests)
e
omg I thought you were a maintainer, sorry to be taking so much of your time. really appreciate it.
b
oh i am 🙂
e
well still appreciate it!
🙂
by monkeypatch you mean what? isn't there a way to simply wrap it with a .sh script and tell pants to use that?
(and, looking thru the sandbox, I can't find ANY .flake8 file... ?)
I do see my utils/ dir, but .flake8 is not there
b
Thats unexpected. is config discovery on, and you havent specified
config
?
e
let me double check, but that was the situation at a certain point. let me double check plz.
yes indeed, that's the case. config_discovery is on, and nothing specified. here:
Copy code
[GLOBAL]
pants_version = "2.10.0"

backend_packages = [
  "pants.backend.python",
  "pants.backend.python.lint.flake8",
  "pants.backend.python.typecheck.mypy",
  "pants.backend.shell",
  "pants.backend.shell.lint.shfmt",
  "pants.backend.shell.lint.shellcheck",
]

use_deprecated_python_macros = false

[anonymous-telemetry]
enabled = false

[source]
root_patterns = ["/"]

[python]
interpreter_constraints = ["CPython>=3.9"]

[flake8]
config_discovery = true

[coverage-py]
interpreter_constraints = ["CPython>=3.9"]
Just to demonstrate:
Copy code
~/proj/calius_repo $ find . -name .flake8
./.flake8
./utils/.flake8
Copy code
~/proj/calius_repo $ ./pants --no-process-cleanup lint --only=flake8 utils/bisect_utils.py
18:33:02.30 [INFO] Preserving local process execution dir /private/var/folders/mk/_c6h2svs7ngdcc9h55f924wh0000gn/T/process-executionkZpbcd for "Run Flake8 on 1 file."
18:33:02.69 [INFO] Completed: Lint with Flake8 - flake8 succeeded.

✓ flake8 succeeded.
then changing to the sandbox dir:
Copy code
/private/…/T/process-executionkZpbcd $ cd /private/var/folders/mk/_c6h2svs7ngdcc9h55f924wh0000gn/T/process-executionkZpbcd
nitzans-macbook-pro
/private/…/T/process-executionkZpbcd $ find . -name .flake8
./.flake8
So only one dot-flake8 file, and that's the one from the repo root. I ran md5sum.
b
well still appreciate it!
Happy to help 🙂 (It's purely selfish, the more people on Pants the fancier my status as a maintainer is /s) Also, Pants isn't maintained by one company. Most of the maintainers today are Toolchain employees, but that's a product of history. We're onboarding more and more folks that aren't Toolchain (liek myself)
e
👍
b
So only one dot-flake8 file, and that's the one from the repo root. I ran md5sum.
At least Pants is consistent with flake8 behavior
e
Any easy way for me to debug this process, and any possible PR I might want to send your way? e.g. change files in ~/.cache/.../subsystem.py and add prints maybe?
So not really consistent -- you said above you think it's supposed to copy ALL relevant files, and it would be flake8 that "picks" the one.
b
The code I linked above is your breadcrumbs. As for debugging the easiest way I've found is
logger.error(whatever)
. The async nature of the code makes it hard to breakpoint-debug. You can use
./pants_from_sources
to run Pants from a sibling directory on your code. See https://www.pantsbuild.org/v2.11/docs/contributor-overview for all the juicy docs
e
If that were true I could wrap flake8 with a different script and no PR needed. But given this it feels like my path (no pun intended...) is to add some behavior to the code you sent me and/or to the flake8 plugin, no
?
b
So not really consistent -- you said above you think it's supposed to copy ALL relevant files, and it would be flake8 that "picks" the one.
My code goggles aren't crisp. Bad assumption on my part 🙂
is to add some behavior to the code you sent me and/or to the flake8 plugin
That's what I'd do. FWIW Trying to swap the-thing-that-runs flake8 with something else isn't drop-in. You'd be implementing a new linter. See our (unfortunately not-so-up-to-date) docs: https://www.pantsbuild.org/v2.11/docs/plugins-lint-goal
e
Oh okay. I thought this was unexpected behavior. Cool. What would you recommend? Where to add behavior? We've discussed: 1. A whole new plugin of my own 2. Changing behavior in that code you sent (ConfigFileRequest iirc) and adding some flake8-plugin option to opt in 3. Changing flake8-plugin code only, somehow 4. Using a wraper around flake8 with/without any of the above
b
The Pants half of this should be trivial. Feel free to already open an issue 🙂
It's a two-pronged approach: • Fix Pants to put all the relevant configs (also other maintainers might have more thoughts than I do. @hundreds-father-404?) • Fix flake8 to use "the right config" ◦ You could hack this in a local plugin if you don't care about upstreaming ◦ or/and upstream a change
You shouldn't need to opt-in to the "new" Pants behavior. You'd only really need to opt-in to the new flake8 behavior (if it isn't "on" by default. depends on how you implement it)
e
Perfect, I'll have a go at the above once an initial version of my hackish tab completion works. Thanks a lot Joshua, really.
b
❤️
I see the err in my assumption. The globbing behavior takes what's provided literally. In this case that means it'll only match the repo root. Just model your changes after
isort
(which already supports hierarchical config). https://github.com/pantsbuild/pants/blob/e7c6095829304bd180e922fffeac3f8f0e4650f8/src/python/pants/backend/python/lint/isort/subsystem.py#L74
e
Want to see a demo of autocomplete? 🙂
p_demo.mov
👀 1
h
It might be odd for Pants to grab non-root .flake8 files into the sandbox when flake8 itself doesn't support them? If adding custom flake8 support for this functionality, then it might make sense to have a Pants plugin that goes with the flake8 plugin, and does what you want (details on how TBD...), but I'm not sure about having the core Pants flake8 plugin do that directly
b
I think it'd be odd, but ultimately I think that's flake8+plugins decision on what to use. Having pants kneecap the tool because we think we know what should be happening isn't great. But that's just my opinion
h
It feels a bit arbitrary? The casual reader won't understand why we capture those files, since vanilla flake8 ignores them. So one attitude might be: why special-case this plugin over others (e.g., my custom plugin that reads a mycustomplugin.ini config fire)
Unless we tie in more directly to the flake8 plugin
b
Fair. Maybe this ties back to that post about
extra_files
e
@happy-kitchen-89482 and @bitter-ability-32190: I think I might invest a bit of time in this now (hierarchical config discovery, or whatever you'd like to call it). I'm not 100% sure I am following your discussion here, so asking to be sure: if I wanted to do it in a way that could be PR'ed, what is your ultimate conclusion? 1. Copy all config files to sandbox or not? 2. Change existing flake8 plugin? Write a new one? 3. Use a wrapper script? 4. Other? Thanks.
b
I think this issue is how to solve the pants part of it. https://github.com/pantsbuild/pants/issues/15225
Then it's choose your own destiny on how you want to consume. Float the idea by flake8? I think asottle is an active maintainer
e
I keep failing to see why this would be related to flake8. It's a general question, imho. To try and make my point: let's say I have a .flake8 file under utils/, and all I want to do is have flake8, when run by pants, pick it up. My point is this: pants, because it is so "used" to running from the repo root, assumes that if I ran flake8 manually I would run it from the repo root as well. But that's not my workflow: I want pants to run flake8 as if the cwd was "utils/". Everything would just work then. Flake8 would pick up my .flake8 file that's THERE (in utils), and that's it.
Everything else in this discussion (multi files, hierarchical files, plugins, flake8 changes, ...) assumes I want to run flake8 from the repo root. But I don't. Things become simpler (if I'm not mega-wrong here...) to define and reason about if we just treat it as "I want to run my tool from the dir that contains the BUILD that contains the target in question".
Am I wrong?
b
Yes and no? I think it's not an unreasonable consideration, but it kind of cross-cuts pants' machinery and assumptions, and is something you don't see in the larger ecosystem of tools w.r.t. flake8 I.e poetry and tox and native flake8 I think without more convincing all kind of assume you're running from the project root. Also, I think conventionally (as in when enforcing conventions) it's good to take a holistic approach. Why should one slice of the repo look different from another?
Pants has some flexibility, you could plugin your way into this functionality, but I think you'll have to make unfortunate tradeoffs and possible rewrites for something that could be achieved by some simple changes to the tools themselves
e
Actually, I argue exactly the opposite, if I may be so bold. My main argument is the fact that we're talking about monorepo. That mean many teams, sub-parts of the repo, etc. And my argument is not at all specific to flake8. Different teams have different styles (so sure flake8), but also different mypy settings, different isort settings, certainly different pytest and coverage.py settings, etc. Also, anything that's a "gradual adoptions" also introduces different settings into different parts of the repo.
(I don't know how much buck@fb experience counts here, but eg that contributes some to my arguments; but also very simply the company I work at today -- that's exactly what happens there.)
So my argument is to allow different settings in different parts of the repo, and for that add a "run from here" setting for tools, ideally any tool.
b
Bold ideas are good! Unfortunately I'm not imaginative enough to mindmap your idea in concrete code in such a way that you could customize. I may have philosophical differences, but that shouldnt bleed into pants capabilities (it's for everyone) You probably want to wait for the more senior maintainers to add their 2c
e
Perfect, I'm happy to hear folks $0.02, I'm just not sure about the process. What things happens that causes them to weigh in?
b
🪄 @witty-crayon-22786 @hundreds-father-404
(although theyre each "out" for various reasons right now)
The idea will be pervasive to Pants, so it'd likely be deserving of a proposal (an RFC).
e
Is that something one of you guys writes? me?
any link to the process? template?
b
If I wasn't on mobile I'd send some. Slack search for Google docs links. If you need to, filter to Eric.
w
have skimmed through this. i would say that before writing up an “rfc” per-se (which we don’t have an explicit process for, but have some conventions around), i would start by opening a Github issue describing how things work today, and how you would like them to work. i can see the argument in favor of allowing for incremental adoption of this sort (subdirectory has different config), i’m just not sure about the actual mechanism and how that should be accomplished. there is prior art for partitioning by config file (in
scalafmt
… although ironically, i think it natively supports nested config), so that would be an option.
👍 1
b
FWIW I still think more tools should allow subdir config, flake8 included. What if (and it's a scary thought) you aren't using pants? 😅
Now that I think of it mypy allows it, but by configuring the root file to have various settings for various sub packages
h
Just to clarify, ignoring Pants, if you run
flake8
directly in a subdir, it uses the config file in that subdir? So it basically uses whatever is in the cwd?
e
Benjy: yes.
Josh, don't be silly -- why would I not use pants? I don't even 🙂
h
So is this more about 1) being able to run Pants in a subdir - or - 2) being able to run Pants at the root, but have it act like it's running in various subdirs ?
e
It's not running pants in a subdir. That would be nice, but that's not an ask right now. It's a bit more like running pants from root and have it act as if it were running in a subdir. But it's really: run pants from (take your pick), and have it run flake8 from subdir (or at least: as if it were running in a subdir). So I don't care where pants runs from (for this matter), I care about how flake8 behaves. And I don't, as far as I know now, care about where flake8 runs from. I'm fine if pants invoked flake8 like
flake8 --rc=utils/.flake8 utils/foo.py
from root, or if pants invoked
flake8 foo.py
from within utils/. The latter seems better, because pants wouldn't have to know about the specific config files for each tool. And to remind: flake8 is just an example. I also would like to run pytest that way, so that I can configure different minimum-coverage requirements per subdir. So I'm just thinking: have pants be able to run tools from within a subdir.
Does that make sense?
h
It does make sense!
e
Awesome! So what is the process, then? How do we get this moving?
Again, I'm happy to help, but I imagine that, as Josh said, some maintainers need to weigh in maybe? Or do I write something? and similar questions...
b
FWIW for formatting/linting we "batch" the files together for performance, which is something we don't do for tests (tests use one-process-per-file). So the performance of fmt/lint will decrease as your batches will be smaller (more processes = more overhead). Of course this is opt-in so 🤷‍♂️ . If you got the functionality implemented in-tool you wouldn't have to make tradeoffs. (again just FYI)
e
while that sounds right I'm not sure there's anything that can be done about that, in- our out- tool: after all, I do want to run the tool multiple times with different configs. That is what I want. So not sure how that could be improved.
h
I'd like @witty-crayon-22786 to weigh in, but I think the first step is to scope this. In other words, to figure out how generic it is. Is this a flake8-specific thing (at least at first) or is this really a generic mechanism to partition tool runs by config file...
1
e
Well, as I mentioned I want this for flake8 and for pytest atleast. Basically my point is that any linter wants this, as well as any test runner, etc etc. The point again being that different teams in the monorepo have different settings. This, from past experience (fb) as well as current experience (monorepo, 5 years of ~50 people working on, 7 teams, so quite large.)
So I'd say "generic"
@witty-crayon-22786 what says you? (Benjy told me privately that if you don't respond in 1 hour then I can do whatever I want. Really, he did. I think.)
w
I says https://pantsbuild.slack.com/archives/C046T6T9U/p1651353042785009?thread_ts=1651166859.860729&cid=C046T6T9U ... heh. IMO, file for the usecases you actually have for now, and leave generalizing to more tools as more of a footnote. If the implementation is generic then we can expand it.
h
I personally have wanted to have partitioning for flake8. I only don't know how to design the API in an intuitive way that does not make it more complicated when you do not need the partitioning. It is confusing also that we already are partitioning by "resolve" (lockfile) & interpreter constraints
e
Darn, in under 1 hour. I guess I'm bound, then, Well fine, beggars can't be and all that. Bottom line: file an issue?
AND -- once I've done that, if you're willing to entertain PRs -- will someone help me get started? Just point me in the right direction etc?
w
absolutely.
e
Perfect. I'll go look at the docs for how to file an issue. Thanks everyone!
w
one more point here (sorry: doesn’t impact the suggestion to file an issue): we have in general tried to allow for flexibility within subdirectories in order to ease onboarding to a monorepo, but not necessarily with the goal of divergent configuration in the long run. that has meant that in most cases we’re ok (and even prefer) that divergent config be tracked at the root of the repository. for example: we require differing thirdparty resolves to be configured at the root of the repository so that they can be tracked as tech debt
1
being able to
skip_X
on a target by target basis is in line with that (you can skip using the repository’s rules, but not rewrite them)… per-directory configuration goes in another direction.
(and regardless: we already support subdirectory based config to some degree for
pytest
because we have native support for slurping in
conftest.py
files)
e
Well, Josh said I can be bold, so I'll be: I think that's not the cleanest solution (tracking multiple configs in repo root). Here's my favorite example: hierarchical. So the mechanism that will find a tool's config file will start looking in the directory containing the BUILD file containing the target, but if not found will go up one level, and then one more, etc. This way you can have "local overrides". This is certainly not clean with tracking in the root. Another downside is that folks would need different macros of their own for (say) python_sources, python_tests, etc. For no reason other than to specify the config. (Or other macros, but the point is macro per team), where this can be obviated otherwise.
w
hierarchical is great, but doesn’t necessarily need to define the scope of what can be changed. for example, you could configure
skip_X
flags or
resolve=
flags hierarchically, rather than “variables must use snakecase”
(https://github.com/pantsbuild/pants/issues/13767 discusses hierarchical target metadata, although designs had been trending in a slightly different direction)