Hi all, I'm having trouble getting isort to work c...
# general
s
Hi all, I'm having trouble getting isort to work correctly with multiple "projects" within a monorepo: isort classifies my python modules as third-party. I posted a question on SO: https://stackoverflow.com/questions/75869900/pants-with-isort-thinks-my-python-modules-are-third-party. Thanks in advance!
p
are u sure isort picks up your isort config file ? We use it this way:
and then have it's config in the pyproject.toml file:
Screenshot 2023-03-28 at 4.43.29 PM.png
maybe try setting https://www.pantsbuild.org/docs/reference-isort#config to point to your config file?
https://www.pantsbuild.org/docs/reference-isort#config_discovery is on by default but might not work correctly for you.
you can also set an invalid option in your isort config and see if it things break when trying to run isort. if not, it is an indication that your config file is not being picked up.
s
setting the config file location didn't work. I don't think this is an isort config problem, because I can use isort successfully without pants. I think it has something to do with how pants invokes isort.
p
use
pants lint --only=isort ::
(for example)
s
do you have your repo setup with multiple top level projects?
p
no. all of our python sources are under
src/python/
so there might be an actual limitation or an issue with pants here if you have multiple top level projects. I am not that familiar with pants and how it works in those cases...
b
This might be https://github.com/pantsbuild/pants/issues/15069: the exact split pants decides to do for running isorts in parallel on different files can change what isort deduces as first party or not.
s
The thing is, it always works when my modules are organized under
src
. but it never works when they're under
project/src
👍 1
b
That’s interesting, although it may still be up to the wiles of pants automatic decision making about file grouping/sandbox contents. I believe it’s deterministic for a fixed repo configuration, but may change as the config/layout changes, so the
project/src
layout may happen to split things in a different way to
src
s
forked from the python example repo. It would be great if we can update the official example repo once we figure this out. Because these are day 1 frustrations that caused me to abandon pants last time.
b
Yeah, we’ve had frustration with pants and isort in our work repo too. I acknowledge the discouragement!
s
I wonder if it has to do with isort's own idiosyncracies: https://github.com/PyCQA/isort/issues/1497
But still, I assume that someone out there knows the magic voodo to make pants and isort work successfully in a multi-project monorepo.
b
Yeah, we still use it, and have a multi-project monorepo. We use the workaround in https://github.com/pantsbuild/pants/issues/15069#issuecomment-1174460553 . Here's the relevant snippet from the
pyproject.toml
Copy code
[tool.isort]
...
# FIXME: <https://github.com/pantsbuild/pants/issues/15069>
# isort and/or pants doesn't seem to reliably identify first party properly (e.g. it'll tell that
# foo is, but not foo.bar, even in a single file!), so just check everything into one
# section... plus this is closer to zimports
default_section = "FIRSTPARTY"
(I'm just looking at your reproducer now, btw)
s
That's funny about the FIRSTPARTY fix, because the pants docs say to set that to THIRDPARTY: https://www.pantsbuild.org/v2.1/docs/python-fmt-goal
The FIRSTPARTY workaround collapses first and third party into one section, which I'd rather not do
b
Weird! We don't set any
known_first_party
like that example, which might be the difference...
The FIRSTPARTY workaround collapses first and third party into one section, which I'd rather not do
Yeah, it's not the best, but we've found the other benefits of pants to be worth it and it's better than not having import sorting at all. 🤷‍♂️ / 😦
Hm, for that reproducer, there's one isort invocation, and it ends up with all the relevant files in the sandbox of that invocation, so 15069 isn't fully related; just a misleading distraction, sorry about that!
s
you're onto something with the known_first_party thing though.
I didn't notice that before. if you remove it, it appears that all the modules are being treated as thirdparty
b
Putting
.isort.cfg
into
project1
(i.e.
mv .isort.cfg project1/.isort.cfg
) seems to give the right behaviour, so I wonder if this is an interaction between config file location and first party inference within isort
s
the isort ticket I linked above does indicate that config location impacts behavior of first/third party.
b
ah, I hadn't read it in enough detail, there we go 👍
s
but the thing that's still confusing is that if you just move the
src
directory up one level everything works correct
but if you rename it to anything other than src (e.g. python) it will stop working
maybe the
project1
aspect to this is just a distraction
the simpler repro is that you just have your source under a folder not named src
b
That suggests to me that the relative position of
.isort.cfg
and
src
is what matters: if they're adjacent, it gives good behaviour; if they're not (including if it's not exactly called
src
), it gives 'bad' behaviour?
s
but I have an existing repo where the source lives under folders named python. so... I need an actual solution to this problem.
unless the config file will work when it's inside the python directory
aaaaand, it's not using that config file. it's just not finding it.
same "correct" behavior if you just delete it
I edited thh config file to be broken no complaints from pants, so it's not being found.
b
When I run
isort
outside of pants, it looks like a broken config file is just a warning and the process succeeds overall:
Copy code
/Users/huon/.pyenv/versions/3.9.10/lib/python3.9/site-packages/isort/settings.py:771: UserWarning: Failed to pull configuration information from /private/tmp/pants-example-python/.isort.cfg
So, pants won't pass this through by default (since it'll only surface true failures).
s
good tip.
b
BTW, once we resolve all of this isort behaviour, that's not caused by pants, unfortunately, I think the only reliable way to get consistent isort behaviour in a large pants repo (where isort might be running in parallel on subsets of the files) is to not rely on inference. In particular, its inference seems to depend on the exact files visible to isort when it runs, and this may not be the full set with pants sandboxing (and may change as files are added/moved/deleted), i.e. https://github.com/pantsbuild/pants/issues/15069 . Two ways to not rely on inference: • the workaround we use: not splitting first and third party 😞 • manually specifying all
known_first_party
modules 😞
s
I can confirm it's finding the config file
ew
well, I confirmed that the config file is being found, by making some edits that triggered it's multi-line formatting behavior
the config file trick only works within that directory. so with two projects, project1 and project2, they each consider each other's code to be third_party. could be desirable to some, but not for us.
b
Ah, yeah, sounds like you're hitting the limits of isorts inference anyway, so manually specifying
known_first_party
might be what you need to do. Just thinking out loud, one option to not need to specify info in multiple places would be to generate the file using info that pants knowns, e.g. add all the subdirectories of each folder in
pants roots
(or something more intelligent). You might want a CI check that the file is up to date, in addition 🤔
That said, it looks like there's an option to customise the
src
behaviour: https://pycqa.github.io/isort/docs/configuration/options.html#src-paths (although 15069 will still cause problems when running within pants even with it set)
h
Re renaming src changing behavior, I'm guessing this is related to the default source roots. What do you see in each case when you run
pants roots
?
Pants defaults to treating
src/
(and the repo root) as a source root
But it sounds like you may need to configure custom source roots, as described here: https://www.pantsbuild.org/docs/source-roots
s
changing the pants roots has no effect
though I haven't tried it since switching to multiple config files
h
I'd still drill into this - you're seeing behavior that relies on a directory specifically named
src
, and the only thing in Pants that knows about that naming convention is the default source roots.
So, putting isort aside for a moment, it is really important for Pants to have an accurate understanding of your source roots, which you provide via the options I linked to above.
pants roots
should show the list of dirs that are roots of your package hierarchy (i.e., the dirs you'd put on sys.path in order to import from them)
Once we're sure the source roots are good, we can drill more specifically into any remaining isort issues
s
yep, the roots are correct
b
I'd still drill into this - you're seeing behavior that relies on a directory specifically named
src
, and the only thing in Pants that knows about that naming convention is the default source roots.
(FWIW, running
isort
outside of pants depends on the
src
name literally too, so there's not (only) a pants interaction here.)
(I filed https://github.com/pantsbuild/pants/issues/18618 about the side observation of an invalid config file being silently ignored by pants)
s
@happy-kitchen-89482 if you take a look at my simple repro repo you'll see the source roots are set correctly: https://github.com/chadrik/pants-example-python
FWIW, we use the same version of isort as pants is using (currently without pants), and we use a single config file, and we do not have this first/third party problem. So this definitely has to do with how pants is invoking isort.
b
Hm, I'm a bit confused. I get the same behaviour in that example repo with external isort vs. pants-run isort?
Copy code
pants export --resolve=isort # (get a venv with the right version of isort)
dist/export/python/virtualenvs/isort/*/bin/isort .
makes the same change to
helloworld/main.py
as
pants fmt ::
s
In a real-world multi-project monorepo you would set isort
src_paths
in the config file to e.g.
project1/src
. It was my assumption that pants was providing the pants source_roots as the isort src_paths, but it seems that it is not. I'm pretty sure that's the right behavior, though.
h
Ah, the repro repo is gold, that is always the best way to get help, so thanks for posting that. I will take a look at it.
s
I'm going to push up a change to remove helloworld from the known_first_party
got a meeting now, so it'll take a few hours
b
AFAIK, Pants doesn't set
src_paths
automatically (similar to how it doesn't set
known_first_party
directly), so I think that'll be why we observed the position of the
.isort.cfg
file and
src
directory name mattering
h
So, in the example repo the issue is that
allthing
isn't in the known_first_party
I think your intuition that Pants could and should set
src_paths=
is correct. The problem is that this won't fix all the issues with isort, e.g., when batching is used or when run incrementally on only changed files. Because then some of the source packages may not be in the sandbox at all, and so
known_first_party
is really the most bullet-proof way to fix this.
s
I don't know anything about pants sandboxing. is it possible to disable, or to ensure that all relevant files are included in the sandbox, for users for whom correctness is more valuable than performance?
h
Pants runs everything in sandboxes, it's what makes caching and concurrency possible. But in this case the issue is also just scale in general - unless you want to run isort on all you code every time, you might be in a situation where it doesn't know about some of your first-party code. So
known_first_party
is probably the best path forward here. Unless you have a large number of first-party top-level packages? Usually people have just one, or at most a handful.
Because they have to manage the top-level namespace without colliding with any existing third-party names...
s
So
known_first_party
is probably the best path forward here. Unless you have a large number of first-party top-level packages?
yep, we have a lot of known_first_party modules. there's a possible solution using pre-commit hooks to write the known_first_party field in the config file. Is there a way to get that list from pants?
h
Hmm, let me think of a better solution then
This has been an ongoing thorn
I have a couple of ideas, will have to play with them to be sure they work...
s
as I said, I'm totally ignorant of pants's sandboxing, but having a set of limited global info that's shared across sandboxes seems valuable for things like this.