# development
h
Starting the design for supporting multiple user resolves: a major thrust from the proposal in https://docs.google.com/document/d/1sr1O7W9517w8YbdzfE56xvDb72LafK94Fwxm0juIRFc/edit# is that we have "roots" (tests, binaries/dists) be what chooses the resolve. But, a complication: a `python_library` can also be a resolve root, particularly with `lint` and `typecheck`. Does that sound right?
@witty-crayon-22786 I feel like we've talked about this already, but I can't remember the outcome nor find it in search. Sorry!
w
yea… i think because it was a bit nebulous.
i think that my expectation was that we’d find the roots depending on that library, and then either build with one of the resolves, or all of the resolves
👍 1
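A minimal sketch of that "find the owning roots" idea, with a toy dependency map and hypothetical target/resolve names (not Pants internals):

```python
from collections import defaultdict

# Toy dependency graph: target address -> direct dependencies.
# Only "roots" (binaries/tests) declare a resolve. All names are hypothetical.
DEPS = {
    "bin1": {"lib_a"},
    "bin2": {"lib_a", "lib_b"},
    "lib_a": {"lib_b"},
    "lib_b": set(),
}
RESOLVES = {"bin1": "resolve-x", "bin2": "resolve-y"}

def dependees_map(deps):
    """Invert the dependency map: address -> direct dependees.

    Note this touches every edge in the repo, which is the O(repo) cost
    discussed below.
    """
    rdeps = defaultdict(set)
    for tgt, tgt_deps in deps.items():
        for dep in tgt_deps:
            rdeps[dep].add(tgt)
    return rdeps

def owning_roots(lib, deps):
    """Walk dependees transitively, keeping targets that declare a resolve."""
    rdeps = dependees_map(deps)
    seen, stack, roots = set(), [lib], set()
    while stack:
        tgt = stack.pop()
        if tgt in seen:
            continue
        seen.add(tgt)
        if tgt in RESOLVES:
            roots.add(tgt)
        stack.extend(rdeps[tgt])
    return roots

# lib_b is reachable from both roots, so both resolves are in play.
print({root: RESOLVES[root] for root in owning_roots("lib_b", DEPS)})
```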
a
How likely do we think it is that a differing resolve would actually affect those goals?
Yeah, @witty-crayon-22786's suggestion sounds about right, though it is sad that it turns those goals into O(repo) actions rather than O(deps) actions :(
w
yea.
h
How likely do we think it is that a differing resolve would actually affect those goals?
For MyPy, I could see it being an issue. Imagine your `python_requirement_library` is loose, like `Django`. Your `python_library` depends on that. One "root" uses Django 2 in its lock, whereas another uses Django 3. Which resolve we use could determine whether the type hints are valid or not
w
i think that you don’t need to actually compute deps if the roots of a run are the roots of the build… you still have to scan to find other members of the resolve, but only compute the deps of other members
h
i think that my expectation was that we’d find the roots depending on that library
Okay. A challenge with that is assuming every `python_library` has a root, which might not be true if you just added a new util file w/o tests and no packages/binaries depending on it. I remember you mentioning a default resolve, which possibly could cover that
w
(meeting, back in 15)
h
that it turns those goals into O(repo) actions rather than O(deps) actions
Oh, yeah. Oof, `dependees` code is really slow because it has to create a global dependency map 😕
a
Or I guess we could introduce a flag for a resolve root, which we prompt for if we can't infer one? It's not a nice experience, though...
w
hm. so, another thing i wonder here: how does this interplay with lockfile/resolve invalidation? maybe we can detect based on lockfiles/resolves that there is only 1, or that our requirements are only in one…?
i.e., use the lockfile to detect the universe
h
Yeah, I've been thinking about that. cc @ancient-vegetable-10556. Currently lockfile invalidation checks that requested requirements == lockfile requirements. That works great for tool lockfiles. And then, for ICs, we now only check that the current run's ICs are compatible with the lockfile's ICs, but not identical. For user lockfiles, I think we may need to be checking that the run's requirements are compatible, not necessarily identical
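A rough sketch of what "compatible, not identical" could mean in code, using the `packaging` library; the lockfile layout and helper name are made up:

```python
from packaging.requirements import Requirement
from packaging.version import Version

# Hypothetical lockfile contents: project name -> pinned version.
LOCKED_PINS = {"django": Version("2.2.24"), "requests": Version("2.26.0")}

def run_is_compatible(requested: list[str]) -> bool:
    """True if every requested requirement is satisfied by some pin.

    "Compatible" rather than "identical": the requested set may be looser
    or smaller than what was used to generate the lockfile.
    """
    for raw in requested:
        req = Requirement(raw)
        pin = LOCKED_PINS.get(req.name.lower())
        if pin is None or not req.specifier.contains(pin, prereleases=True):
            return False
    return True

print(run_is_compatible(["Django>=2,<3"]))  # True: the pin satisfies it
print(run_is_compatible(["Django>=3"]))     # False: lockfile must be regenerated
print(run_is_compatible(["flask"]))         # False: not in the lockfile at all
```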
a
Radical idea: Maybe we just refuse to lint things that are specified as target roots but aren't resolve roots? So if you want to lint `lib`, you ask to lint `bin` and it will lint `lib` for you. We could choose to only do this if there are root-aware lints registered (so `black` can lint `lib` fine, but `mypy` can't)
If we're saying "libraries don't make sense to typecheck in isolation", maybe that should be the interface?
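A small sketch of how that gating might look, with hypothetical names for the resolve roots and the root-aware tool registry:

```python
# Hypothetical registry: which targets are resolve roots, and which lint
# backends are "root-aware" (i.e. need a resolve to run).
RESOLVE_ROOTS = {"bin"}
NEEDS_RESOLVE = {"mypy": True, "black": False}

def check_lintable(target: str, tool: str) -> None:
    if NEEDS_RESOLVE[tool] and target not in RESOLVE_ROOTS:
        raise ValueError(
            f"{tool} needs a resolve, and `{target}` is not a resolve root; "
            f"lint one of its dependees (e.g. `bin`) instead."
        )

for target, tool in [("lib", "black"), ("lib", "mypy"), ("bin", "mypy")]:
    try:
        check_lintable(target, tool)
        print(f"{tool} on {target}: ok")
    except ValueError as err:
        print(f"{tool} on {target}: {err}")
```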
h
"libraries don't make sense to typecheck in isolation"
I think they do make sense to lint in isolation.
`./pants typecheck src/python/pants/util/strutil.py` is totally sensible. In fact, users submitted a patch so it only checks `strutil.py`, whereas we used to check all transitive deps of it too. The only complication is which lockfile to use to set up the run
Radical idea:
(thank you for thinking like that! I very much appreciate your feedback and suggestions)
a
Can we identify a subset where it's problematic? e.g. is it only problematic if they have direct 3rdparty deps?
h
yeah this is only a problem if your repo uses >1 lockfile, which will be a niche feature that imo we shouldn't activate by default, and only relevant if the code uses 3rd-party deps. If we go w/ the idea of "find the owning root(s)", then the two interesting cases are: 1) no owning roots, 2) multiple results where >1 resolve is at play (plus the perf hit of finding owning roots)
w
i think that if no owning root (1), then you’re a member of the “default” resolve. for multiple reasons (boilerplate reduction of needing to declare a resolve on every `pex_binary`/`*_test` being one), we’re going to need a default resolve. while i would love to support a mode for 2) where we lint/typecheck with all relevant resolves, i think it requires multiple Params to Get… and also probably isn’t a good default anyway (linting N times should be opt-in)
For user lockfiles, I think we may need to be checking that the run’s requirements are compatible, not necessarily identical
yea. and more than that, it might actually represent the same sort of potential optimization… where if the transitive requirements of the library are already a member of exactly one resolve, you skip scanning the repo
h
(trying to create rn some minimal project setups of varying complexity to capture this problem in code)
Hm... during this exercise, it reminded me there is already a way to express different top-level deps! >1 `python_requirement_library` for the same dep, where you use explicit deps instead of dep inference to say which one you want. This contrived example I had above is a total antipattern and we shouldn't optimize for it:
For MyPy, I could see it being an issue. Imagine your python_requirement_library is loose, like Django. Your python_library depends on that. One "root" uses Django 2 in its lock, whereas another uses Django 3. Which resolve we use could determine whether the type hints are valid or not
In that case, you should have `:django2` and `:django3` targets, and force callers to decide which they support. It is not safe to rely on the pins differing in the lockfiles
a
Does inference notice that you have more than one `python_requirement_library` for a dep and refuse to infer for you? Or does it pick a default for you?
h
The reason you have to disambiguate if you have `:django2` and `:django3` targets is that we detect ambiguity for the `django` module, so we print a helpful message asking you to explicitly disambiguate by choosing one
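A toy sketch of that ambiguity check, with made-up target addresses and module mappings:

```python
from collections import defaultdict

# Hypothetical mapping: requirement target address -> modules it provides.
REQUIREMENT_TARGETS = {
    "3rdparty:django2": ["django"],
    "3rdparty:django3": ["django"],
    "3rdparty:requests": ["requests"],
}

module_owners = defaultdict(set)
for address, modules in REQUIREMENT_TARGETS.items():
    for module in modules:
        module_owners[module].add(address)

def infer_dep(imported_module: str):
    """Return the single owner of a module, or complain about ambiguity."""
    owners = module_owners.get(imported_module, set())
    if len(owners) > 1:
        raise ValueError(
            f"Ambiguous owners for `{imported_module}`: {sorted(owners)}. "
            "Add an explicit dependency to disambiguate."
        )
    return next(iter(owners), None)

print(infer_dep("requests"))  # 3rdparty:requests
try:
    infer_dep("django")
except ValueError as err:
    print(err)  # lists both :django2 and :django3
```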
w
yes, you should disambiguate
and if that disambiguation is below the library in question, we will see it when we walk its transitive deps
and we will end up with a concrete set of requirements
the concrete set of requirements for the resolve should be a fuzzy key into the lockfile, similar to the interpreter constraints: if i have an exact match for the subset, then the resolve is still valid for me
(or, that would be the optimization, i think)
but yes: the django2, django3 case is exactly the one that would require multiple user resolves/lockfiles
👍 1
h
How likely do we think it is that a differing resolve would actually affect those goals?
I think this above insight simplifies the answer. Because you should be modeling different versions via multiple `python_requirement_library` targets, rather than via lockfile contents for the same targets having different pins, we can assume any of the N resolves for a target's owning roots would be compatible. No need to run MyPy with all the resolves; we only need one of them. If `my_util.py` imports `django`, it will need to have already disambiguated whether that's `:django2` vs `:django3`. So, all the owning roots will be using the disambiguated version (sorry, not sure if that makes sense)
w
If `my_util.py` imports `django`, it will need to have already disambiguated whether that's `:django2` vs `:django3`. So, all the owning roots will be using the disambiguated version
they’ll be using some disambiguated version… but not necessarily the same one
oh… you mean that the inner node has to be consistent. um, yes: for anything that it explicitly declares a dependency on. but not necessarily for its transitive dependencies. but that’s a good point… direct deps should all be locked in place.
h
How so? Imagine `:binary1` and `:binary2` both depend on `:my_util`, which has disambiguated that it uses `:django2` instead of `:django3`. That means the resolves for both binaries will include Django 2. So... when we do `./pants typecheck :my_util`, which needs to include Django in the venv, we could use either of the two resolves. Both should have roughly the same version of Django 2. The rest of their deps may differ, but that's fine because we'll extract Django from it using `--repository-pex`
(There's a risk the pinned versions will be different if the Django 2 top-level requirement floats a lot and the resolves were generated at different times. But, the insight is that that's an antipattern. The top-level dep should be pinned enough that the resolves are compatible)
oh, ok I think
Again, the takeaway being: if a `python_library` has multiple roots where >1 resolve is in play, we can choose any of those resolves. No need to build with every one of them.
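Combining that takeaway with the default-resolve fallback mentioned above, a minimal sketch of the selection rule (all names hypothetical):

```python
# Hypothetical selection rule combining both points: no owning roots means
# the default resolve; multiple resolves means any one of them will do.
DEFAULT_RESOLVE = "default"

def choose_resolve(owning_root_resolves: set[str]) -> str:
    if not owning_root_resolves:
        # No owning roots: the library is a member of the default resolve.
        return DEFAULT_RESOLVE
    # Any compatible resolve will do, so pick deterministically (sorted
    # first) to keep results, and therefore caching, stable.
    return sorted(owning_root_resolves)[0]

print(choose_resolve(set()))                       # default
print(choose_resolve({"resolve-y", "resolve-x"}))  # resolve-x, deterministically
```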
w
if a `python_library` has multiple roots where >1 resolve is in play, we can choose any of those resolves. No need to build with every one of them.
the point of my “oh… you mean” comment was: that’s only true for direct deps of the library. if the transitive thirdparty deps of the library might matter to whatever task you’re running, then you need to find the right ones.
👍 1
(let me make an example, just to be sure)
h
If we go w/ this "find the roots" approach, we will probably want a new project introspection goal imo (akin to `py-constraints`) that allows you to see what lock would be used when running `./pants typecheck my_util.py`
w
you have a library which transitively depends on a thirdparty library which depends on `requests`… ie, requests isn't declared anywhere in the explicit deps below the library. but the version of requests used will float based on what else is in the relevant resolve
but, i suppose that you could actually use the lockfile(s) to detect whether the reachable dependencies of your library are actually different in the different resolves…?
h
Right. The question is whether that matters for a `python_library` in particular, especially when running Pylint and MyPy on it (the only two actions that can run on a `python_library` and consider 3rd-party deps). For `pex_binary`, of course you want to lock down precisely which version gets used, which is why you will be setting the `resolve` field. But for the `python_library`, I don't think it actually matters. MyPy and Pylint won't consider the transitive dep in their runs; they're only checking your direct imports
i suppose that you could actually use the lockfile(s) to detect whether the reachable dependencies of your library are actually different in the different resolves
Possibly? I was thinking not to do that. Simply choose one of the N possible resolves compatible with that `python_library`, where compatibility simply means that the `python_library` is used by a root that generated the resolve
w
MyPy and Pylint won’t consider the transitive dep in their runs, they’re only checking your direct imports
do we give mypy the transitive deps, or only the direct?
h
Pylint only considers direct deps. MyPy considers transitive deps—at least it needs them present for first-party code. I was trying an experiment just now where we remove transitive deps from the lockfile to see what MyPy does if you just install the direct deps. But that doesn't work, of course: you can't just install direct deps, you have to install their own deps too.
w
well, if you’re going to actually execute them… mypy is doing static checking, just like pylint. so it will eventually stop walking in its checks
…and if not, that might be a great missed optimization.
h
you can't just install direct deps
I mean that pex doesn't let you install a dep w/o its own deps being present:
```
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
    attrs>=19.2.0
```
or do you mean we can use the repository pex extraction feature to just get direct deps? interesting
w
well… however we are doing it for pylint, right?
but yes, it is totally possible to construct a partial pythonpath
and yea, this relates to the `repository.pex` issue: with the graph from the resolve, pants can execute the subsetting itself.
h
oh this convo is confusing because "transitive dep" can mean two things: a transitive dep of the `python_library` as reported by `./pants dependencies --transitive`, vs. a transitive dep of a 3rd-party requirement as defined by its wheel METADATA.
For Pylint, we only need direct dependencies: what you import directly. When installing direct third-party dependencies, their own deps get pulled in via normal pex/pip mechanisms like lockfiles.
For MyPy, we need transitive dependencies in the Pants sense. Any of your top-level third-party dependencies, whether direct or transitive deps, need their own deps also installed due to pex/pip
Ah! How about this scheme? Rather than having to find the owning "roots" (which means `./pants dependees`), instead iterate over every lockfile in the repo and check if the current context's requirements are compatible with any of them. If yes, use that. If not, tbd...
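A sketch of that scheme, reusing the same kind of compatibility check as above; the lockfile paths and pins are made up:

```python
from packaging.requirements import Requirement
from packaging.version import Version

# Hypothetical lockfiles: path -> {project name: pinned version}.
LOCKFILES = {
    "locks/web.lock": {"django": Version("2.2.24")},
    "locks/data.lock": {"django": Version("3.2.9"), "numpy": Version("1.21.4")},
}

def compatible(pins, requested):
    """True if every requested requirement is satisfied by some pin."""
    for raw in requested:
        req = Requirement(raw)
        pin = pins.get(req.name.lower())
        if pin is None or not req.specifier.contains(pin, prereleases=True):
            return False
    return True

def find_lockfile(requested):
    """Scan every lockfile; no dependees computation needed."""
    for path, pins in sorted(LOCKFILES.items()):
        if compatible(pins, requested):
            return path
    return None  # the "if not, tbd..." case

print(find_lockfile(["Django>=3", "numpy"]))  # locks/data.lock
print(find_lockfile(["Django>=2,<3"]))        # locks/web.lock
print(find_lockfile(["flask"]))               # None
```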