# general
g
Hmm, found an interesting issue in our repository today. We're moving another tool to use Pants. Both this, and another tool already in Pants, use a bunch of shims to authenticate with GCP's IAP. These shims have a bunch of dependencies for auth as well as for creating authenticated connections with requests, grpc, GCS and so on. We figured encapsulating all those dependencies inside the shims' `pyproject.toml` would be nice, and creating a `python_requirements` for each downstream resolve would make sense. However, the downstream projects also require those dependencies directly.
```
Tool A dependencies:
   - auth_utils
   - grpcio

Tool B dependencies:
   - auth_utils
   - requests

auth_utils dependencies:
   - grpcio
   - requests
   - google-auth
```
The problem is that the resolve for `A` now has two `grpcio` definitions, and `B` has two `requests` definitions! Does anyone have some nice patterns for this? It'd be nice to be able to compose resolves from multiple locations without risking overlap like this.
āœ… 1
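The overlapping setup described above might look roughly like this in BUILD files (a sketch only; all paths, target names, and resolve names here are made up):

```python
# src/py/auth_utils/BUILD (hypothetical): the shim registers its
# requirements into every downstream resolve that consumes it.
[
    python_requirements(
        name=f"requirements-{r}",
        source="pyproject.toml",
        resolve=r,
    )
    for r in ("a", "b")
]

# src/py/tool_a/BUILD (hypothetical): tool A also declares grpcio in
# its own requirements, so resolve "a" ends up with two
# python_requirement targets that both own the `grpc` module.
python_requirements(
    name="requirements",
    resolve="a",
)
```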
b
I think having a single resolve for shared code is the smooth path. My rule of thumb is that resolve == lock file, and, given that, composing resolves (= composing lock files) seems like a bit of a recipe for confusion, where the two lockfiles have different exact versions of a dep. However, I understand that that may not work for your use case. An alternative might be making `auth_utils` an explicit `python_distribution`, and so, I'm hoping, tool A/B depending on it uses the `install_requires` requirement ranges as input into the lockfile generation process for each of those resolves, rather than creating their own dedicated `python_requirement`s. (I've never personally done this or used `python_distribution`, so don't actually know if it does what you want šŸ˜… I'm just tossing out made-up advice.) (For us, we only have multiple resolves for isolated tools, e.g. we install `ariadne-codegen` for some codegen tasks, but that runs as its own PEX, effectively, and just outputs data. So it can be in a separate resolve to our main app code.)
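A sketch of what that `python_distribution` might look like (hedged: the names, version, and layout are invented, and as noted above it's unverified whether this feeds the lockfile generation the way we'd want):

```python
# src/py/auth_utils/BUILD (hypothetical): publish the shims as a real
# distribution; Pants derives install_requires from the target's
# dependencies rather than from per-resolve python_requirements.
python_distribution(
    name="auth-utils-dist",
    dependencies=[":auth_utils"],  # the python_sources target
    provides=python_artifact(
        name="auth-utils",
        version="0.1.0",
    ),
)
```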
g
When I say composing resolves, I mean "multiple source files to one final resolve". So tool A has a `python_requirements(..., resolve="a")`, tool B has `python_requirements(..., resolve="b")`, and `auth_utils` has `[python_requirements(..., resolve=r) for r in ("a", "b")]`. But in our case, that leads to ambiguity because we have
```
src/py/tool_a:requirements#grpcio==1.40
src/py/auth_utils:requirements#grpcio==1.40
```
both in resolve `a`.
Ah, I think importantly (because of ML madness) all our code for `tool_a` is also parametrized over three different resolves in practice (`cpu`, `gpu`, and `base`).
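The parametrization mentioned here looks roughly like this in a BUILD file (a sketch with made-up target names):

```python
# src/py/tool_a/BUILD (hypothetical): every source target exists once
# per resolve, so each of its dependencies must also be present in all
# three resolves.
python_sources(
    name="lib",
    resolve=parametrize("cpu", "gpu", "base"),
)
```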
b
Ah. Just brainstorming: can you drop the duplicated requirement from tool A/B's requirements inputs, and rely on (a) it appearing in `auth_utils` and (b) Pants' dependency inference to flag if there's a problem (e.g. drop the dep on `auth_utils` but still try to import `grpcio`)?
g
I can! It's just ugly. šŸ˜› Things become non-local and encoded into some "implicit" dependencies of intermediate tools.
l
Not sure if I am of any help, but pondering the situation. As a thought experiment: if tool A depended on `grpcio==1.39` in its requirements.txt but auth_utils declared a dependency on `grpcio==1.40` for whatever reason, then we'd expect sharing a resolve not to work. But the issue here is that they separately declare requirements that are compatible with each other (both depending on `grpcio==1.40`), for which the lock file should logically be able to be generated, but lock file generation is failing anyway?
g
Ah no, lock file generation works fine. This is during dependency inference, where `tool_a/main.py` doesn't know which `grpcio` in the resolve to use.
l
and the lock file has literally two entries for grpcio?
g
This is the issue:
```
14:50:56.84 [WARN] Pants cannot infer owners for the following imports in the target tool_a.py@resolve=cpu:

  * grpc (line: 21)

These imports are not in the resolve used by the target (`cpu`), but they were present in other resolves:

  * grpc: 'base' from //:base#grpcio, 'gpu' from //:gpu#grpcio, 'base' from auth:requirements-base#grpcio, 'gpu' from auth:requirements-gpu#grpcio, 'uploader' from auth:requirements-uploader#grpcio

<snip>
14:50:56.84 [WARN] Pants cannot infer owners for the following imports in the target tool_a.py@resolve=gpu:

  * grpc (line: 21)

These imports are not in the resolve used by the target (`gpu`), but they were present in other resolves:

  * grpc: 'base' from //:base#grpcio, 'cpu' from //:cpu#grpcio, 'base' from auth:requirements-base#grpcio, 'cpu' from auth:requirements-cpu#grpcio, 'uploader' from auth:requirements-uploader#grpcio
```
You can see, looking at the first one (`@resolve=cpu`), that it finds two `grpcio`s for `gpu`, but none for `cpu`. When it's looking at `gpu`, it finds two for `cpu` but none for `gpu`.
And then it also fails for `protobuf_sources`:
```
AmbiguousPythonCodegenRuntimeLibrary: Multiple `python_requirement` targets were found with the module `grpc` in your project for the resolve 'base', so it is ambiguous which to use for the runtime library for the Python code generated from the the target src/proto/hive:hive_v1_base: ['//:base#grpcio', 'src/py/auth:requirements-base#grpcio']
```
Those are the same entry in the lockfile in all situations. Just derived from two places.
l
I am wondering if there is a different way to think about the requirements.txt and the resolve. It may be that what Huon is suggesting is the way to go, and isn't as ugly as it would be in a regular Python project. Let me be a devil's advocate, a devil wearing pants.

In a regular non-mono-repo, non-Pants project we have the following:
• requirements.txt
• some_module.py

The role of the requirements.txt is to define all the third-party dependencies of the project that need to be installed. It also defines constraints on the versions. some_module.py has no responsibility to define its dependencies; it just consumes whatever you have otherwise made appear by pip-installing requirements.txt.

But in a Pants project it may be more like this:
• requirements.txt
• BUILD
• src/projA/some_module.py
• src/projA/BUILD

Here, the requirements.txt only defines constraints on versions of third-party dependencies. It isn't declaring that such a dependency exists for any particular piece of first-party code. Instead it is the responsibility of `src/projA/BUILD` and `src/projA/some_module.py` (via dep inference) to declare that there is a dependency on a grpcio third-party dep. Pants itself will consult these sources to determine which third-party requirements need to be made available at runtime (or even at test time). At least I think that's true.

Thinking about it the second way, it becomes less strange to omit grpcio from a `src/projA/additional_requirements.txt` if it were to exist, or to even just stick all of them into a single `requirements.txt` at the root.
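Under that second way of thinking, the root-level setup might be as simple as this (a sketch; target and file names are assumed):

```python
# BUILD at the repository root (hypothetical): requirements.txt only
# pins versions; which first-party code actually depends on grpcio is
# left to dependency inference over the sources.
python_requirements(
    name="reqs",
    source="requirements.txt",
)
```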
g
Thanks, yeah. I agree completely, on the theoretical level. This would likely be much easier if we didn't have to deal with the ML CPU/GPU/Mac variations, which awkwardly enforce a parametrization and fixed resolves from the top down. That becomes infective, so shared libraries need to accept all "fixed" resolves elsewhere. Before we did that trio I had no such issues. And so when we add new deps for unrelated tools that aren't using those big ML wheels, everyone pays a 5-minute resolve tax. And `tool_a` requires torch, but `tool_b` doesn't, so I want to be able to iterate on `tool_b` without resolving torch again if I can avoid it.
l
By the way, the warning message is also misleading, right? The grpc import is in the cpu/gpu resolves.
g
Yeah, but it doesn't find them. I think because they are ambiguous, they get completely discarded before we even hit that warning.
l
ack
g
Here, specifically: https://github.com/tgolsson/pants/blob/ts%2Fstreaming-logs/src/python/pants/backend/python/dependency_inference/module_mapper.py#L475-L476 I think the problem is that the dependency inference does not look at the lockfile, that only happens once the dependencies are inferred. So a bit of expectation miss... I expected it to resolve dependencies vs the lock, and resolve those back to one or more targets. But the lockfile only comes into play when building the PEX, as far as I can tell.
Ah, https://www.pantsbuild.org/docs/reference-python-infer#ambiguity_resolution allows me to solve that for everything apart from protobuf, at least.
That leaves only the actual error šŸ˜‚
l
šŸ˜› so the warnings are gone, but did you get earlier warnings that would have hinted you to adjust the ambiguity_resolution?
and is it that we need to have the same or equivalent option in codegen
g
There was another earlier warning, but it did not mention the option; it just said to specify the dependency explicitly.
l
ah, like to add a `dependencies=[ref_to_the_requirement]` in the build file?
g
Sort of, yeah.
```
14:50:56.62 [WARN] The target src/py/auth/auth/authentication.py@resolve=cpu imports `requests`, but Pants cannot safely infer a dependency because more than one target owns this module, so it is ambiguous which to use: ['//:cpu#requests', 'src/py/auth:requirements-cpu#requests'].

Please explicitly include the dependency you want in the `dependencies` field of src/py/auth/auth/authentication.py@resolve=cpu, or ignore the ones you do not want by prefixing with `!` or `!!` so that one or no targets are left.
```
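Following that warning literally would mean something like this (a sketch; the two addresses are copied from the warning above, everything else is assumed):

```python
# src/py/auth/BUILD (hypothetical): pick one owner of `requests`
# explicitly, and exclude the other candidate with a "!" ignore so
# that exactly one target is left.
python_sources(
    name="auth",
    dependencies=[
        "src/py/auth:requirements-cpu#requests",
        "!//:cpu#requests",
    ],
)
```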
l
we should have it mention the flag. I wonder if there is an existing issue
g
I'll have a look, I also think that the flag should already work for the runtime library lookup, likely just not getting set.
l
g
Hah, duh.
l
Oh, right, that feature needs to be enabled.
Try adding this to your pants.toml:
```
[python-infer]
ambiguity_resolution = "by_source_root"
```
I tried this in a local clone of your repo and it seemed to make things work.
But we should point users to the feature in the warning message. Would you like to take it? Else happy to try and make the PR/issue.
g
I'm going to investigate/PR why it doesn't work for the hard error, which calls the same code, so I can include it there.
šŸ‘ 1
h
Sorry, late to the party and just catching up on this. But I didn't get why you need resolves A and B to begin with? Why is there not one shared resolve with a single pyproject.toml? Since you expect to share at least some of the underlying requirements across the two tools?
In general multiple resolves are for when you must have conflicting deps in the codebase. In this case it sounds like you don't?
g
We have those use-cases as well; but in this specific tool A vs B situation we have tool A using `torch`, and tool B shouldn't ever have that, because torch turns a 10-second pex build into multiple minutes.
h
Pants subsets the resolve to just what each binary/test/whatever needs
so just having torch in the lockfile does not imply that it would actually be built into the pex
If you're seeing otherwise then something is wrong
g
Oh no, I know that. This is a guard rail to prevent it from ever occurring by accident via transitive imports.
h
Ah, another way to get that guardrail might be to add a `!!` dependency on torch from the binary?
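That transitive-exclude guardrail could be sketched like this (the torch address is made up; `!!` excludes are only allowed on root targets such as `pex_binary`):

```python
# src/py/tool_b/BUILD (hypothetical): even if some transitive import
# would pull torch in, the "!!" exclude keeps it out of this PEX.
pex_binary(
    name="bin",
    entry_point="main.py",
    dependencies=["!!3rdparty/python:requirements#torch"],
)
```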
g
Yeah; can try that. Though `B` would then still pay the `generate-lockfiles` cost for torch, whereas now it can be regenerated much faster.