# general
g
Hmm, found an interesting issue in our repository today. We're moving another tool to use Pants. Both this, and another tool already in Pants, use a bunch of shims to authenticate with GCP's IAP. These shims have a bunch of dependencies for auth as well as for creating authenticated connections with requests, grpc, GCS and so on. We figured encapsulating all those dependencies inside the shims' `pyproject.toml` would be nice, and creating a `python_requirements` for each downstream resolve would make sense. However, the downstream projects also require those dependencies directly.
```
Tool A dependencies:
   - auth_utils
   - grpcio

Tool B dependencies:
   - auth_utils
   - requests

auth_utils dependencies:
   - grpcio
   - requests
   - google-auth
```
The problem is that the resolve for `A` now has two `grpcio` definitions, and `B` has two `requests` definitions! Does anyone have some nice patterns for this? It'd be nice to be able to compose resolves from multiple locations without risking overlap like this.
āœ… 1
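The overlapping setup described above might look roughly like this in BUILD files (a sketch only; all paths, target names, and resolve names here are made up):

```python
# src/py/auth_utils/BUILD (hypothetical): the shim registers its
# requirements into every downstream resolve that consumes it.
[
    python_requirements(
        name=f"requirements-{r}",
        source="pyproject.toml",
        resolve=r,
    )
    for r in ("a", "b")
]

# src/py/tool_a/BUILD (hypothetical): tool A also declares grpcio in
# its own requirements, so resolve "a" ends up with two
# python_requirement targets that both own the `grpc` module.
python_requirements(
    name="requirements",
    resolve="a",
)
```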
b
I think having a single resolve for shared code is the smooth path. My rule of thumb is that resolve == lock file, and, given that, composing resolves (= composing lock files) seems like a bit of a recipe for confusion, where the two lockfiles have different exact versions of a dep. However, I understand that that may not work for your use case. An alternative might be making `auth_utils` an explicit `python_distribution`, and so, I'm hoping, tool A/B depending on it uses the `install_requires` requirement ranges as input into the lockfile generation process for each of those resolves, rather than creating their own dedicated `python_requirement`s. (I've never personally done this or used `python_distribution`, so don't actually know if it does what you want šŸ˜… I'm just tossing out made-up advice.) (For us, we only have multiple resolves for isolated tools, e.g. we install `ariadne-codegen` for some codegen tasks, but that runs as its own PEX, effectively, and just outputs data. So it can be in a separate resolve to our main app code.)
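A sketch of what that `python_distribution` might look like (hedged: the names, version, and layout are invented, and as noted above it's unverified whether this feeds the lockfile generation the way we'd want):

```python
# src/py/auth_utils/BUILD (hypothetical): publish the shims as a real
# distribution; Pants derives install_requires from the target's
# dependencies rather than from per-resolve python_requirements.
python_distribution(
    name="auth-utils-dist",
    dependencies=[":auth_utils"],  # the python_sources target
    provides=python_artifact(
        name="auth-utils",
        version="0.1.0",
    ),
)
```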
g
When I say composing resolves, I mean "multiple source files to one final resolve". So tool A has a `python_requirements(..., resolve="a")`, tool B has `python_requirements(..., resolve="b")`, and `auth_utils` has `[python_requirements(..., resolve=r) for r in ("a", "b")]`. But in our case, that leads to ambiguity because we have
```
src/py/tool_a:requirements#grpcio==1.40
src/py/auth_utils:requirements#grpcio==1.40
```
both in resolve `a`.
Ah, I think importantly (because of ML madness) all our code for `tool_a` is also parametrized over three different resolves in practice (`cpu`, `gpu`, and `base`).
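The parametrization mentioned here looks roughly like this in a BUILD file (a sketch with made-up target names):

```python
# src/py/tool_a/BUILD (hypothetical): every source target exists once
# per resolve, so each of its dependencies must also be present in all
# three resolves.
python_sources(
    name="lib",
    resolve=parametrize("cpu", "gpu", "base"),
)
```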
b
Ah. Just brainstorming: can you drop the duplicated requirement from tool A/B's requirements inputs, and rely on (a) it appearing in `auth_utils` and (b) Pants' dependency inference to flag if there's a problem (e.g. drop the dep on `auth_utils` but still try to import `grpcio`)?
g
I can! It's just ugly. šŸ˜› Things become non-local and encoded into some "implicit" dependencies of intermediate tools.
l
Not sure if I am of any help, but pondering the situation. As a thought experiment: if tool A depended on `grpcio==1.39` in its requirements.txt but auth_utils declared a dependency on `grpcio==1.40` for whatever reason, then we'd expect sharing a resolve not to work. But the issue here is that they separately declare requirements that are compatible with each other (both depending on `grpcio==1.40`), for which the lock file should logically be able to be generated, but lock file generation is failing anyway?
g
Ah no, lock file generation works fine. This is during dependency inference, where `tool_a/main.py` doesn't know which `grpcio` in the resolve to use.
l
and the lock file has literally two entries for grpcio?
g
This is the issue:
```
14:50:56.84 [WARN] Pants cannot infer owners for the following imports in the target tool_a.py@resolve=cpu:

  * grpc (line: 21)

These imports are not in the resolve used by the target (`cpu`), but they were present in other resolves:

  * grpc: 'base' from //:base#grpcio, 'gpu' from //:gpu#grpcio, 'base' from auth:requirements-base#grpcio, 'gpu' from auth:requirements-gpu#grpcio, 'uploader' from auth:requirements-uploader#grpcio

<snip>
14:50:56.84 [WARN] Pants cannot infer owners for the following imports in the target tool_a.py@resolve=gpu:

  * grpc (line: 21)

These imports are not in the resolve used by the target (`gpu`), but they were present in other resolves:

  * grpc: 'base' from //:base#grpcio, 'cpu' from //:cpu#grpcio, 'base' from auth:requirements-base#grpcio, 'cpu' from auth:requirements-cpu#grpcio, 'uploader' from auth:requirements-uploader#grpcio
```
You can see, looking at the first one (`@resolve=cpu`), that it finds two `grpcio`s for `gpu`, but none for `cpu`. When it's looking at `gpu`, it finds two for `cpu` but none for `gpu`.
And then it also fails for `protobuf_sources`:
```
AmbiguousPythonCodegenRuntimeLibrary: Multiple `python_requirement` targets were found with the module `grpc` in your project for the resolve 'base', so it is ambiguous which to use for the runtime library for the Python code generated from the the target src/proto/hive:hive_v1_base: ['//:base#grpcio', 'src/py/auth:requirements-base#grpcio']
```
Those are the same entry in the lockfile in all situations. Just derived from two places.
l
I am wondering if there is a different way to think about the requirements.txt and the resolve. It may be that what Huon is suggesting is the way to go, and isn't as ugly as it would be in a regular Python project. Let me be a devil's advocate, a devil wearing pants.

In a regular non-mono-repo, non-Pants project we have the following:
• requirements.txt
• some_module.py

The role of the requirements.txt is to define all the third-party dependencies of the project that need to be installed. It also defines constraints on the versions. some_module.py has no responsibility to define its dependencies; it just consumes whatever you have otherwise made appear by pip-installing requirements.txt.

But in a Pants project it may be more like this:
• requirements.txt
• BUILD
• src/projA/some_module.py
• src/projA/BUILD

Here, the requirements.txt only defines constraints on versions of third-party dependencies. It isn't declaring that such a dependency exists for any particular piece of first-party code. Instead it is the responsibility of `src/projA/BUILD` and `src/projA/some_module.py` (via dep inference) to declare that there is a dependency on a grpcio third-party dep. Pants itself will consult these sources to determine which third-party requirements need to be made available at runtime (or even at test time). At least I think that's true.

Thinking about it the second way, it becomes less strange to omit grpcio from a `src/projA/additional_requirements.txt` if it were to exist, or to even just stick all of them into a single `requirements.txt` at the root.
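Under that second way of thinking, the root-level setup might be as simple as this (a sketch; target and file names are assumed):

```python
# BUILD at the repository root (hypothetical): requirements.txt only
# pins versions; which first-party code actually depends on grpcio is
# left to dependency inference over the sources.
python_requirements(
    name="reqs",
    source="requirements.txt",
)
```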
g
Thanks, yeah. I agree completely, on the theoretical level. This would likely be much easier if we didn't have to deal with the ML CPU/GPU/Mac variations, which awkwardly enforce a parametrization and fixed resolves from the top down. That becomes infective, so shared libraries need to accept all "fixed" resolves elsewhere. Before we did that trio I had no such issues. And so when we add new deps for unrelated tools that aren't using those big ML wheels, everyone pays a 5-minute resolve tax. And `tool_a` requires torch, but `tool_b` doesn't, so I want to be able to iterate on `tool_b` without resolving torch again if I can avoid it.
l
By the way, the warning message is also misleading, right? The grpc import is in the cpu/gpu resolves.
g
Yeah, but it doesn't find them. I think because they are ambiguous, they get completely discarded before we even hit that warning.
l
ack
g
Here, specifically: https://github.com/tgolsson/pants/blob/ts%2Fstreaming-logs/src/python/pants/backend/python/dependency_inference/module_mapper.py#L475-L476 I think the problem is that the dependency inference does not look at the lockfile, that only happens once the dependencies are inferred. So a bit of expectation miss... I expected it to resolve dependencies vs the lock, and resolve those back to one or more targets. But the lockfile only comes into play when building the PEX, as far as I can tell.
Ah, https://www.pantsbuild.org/docs/reference-python-infer#ambiguity_resolution allows me to solve that for everything apart from protobuf, at least.
That leaves only the actual error šŸ˜‚
l
šŸ˜› so the warnings are gone, but did you get earlier warnings that would have hinted you to adjust the ambiguity_resolution?
and is it that we need to have the same or equivalent option in codegen
g
There was another earlier warning, but it did not mention the option; it just said to specify the dependency explicitly.
l
ah, like to add a `dependencies=[ref_to_the_requirement]` in the build file?
g
Sort of, yeah.
```
14:50:56.62 [WARN] The target src/py/auth/auth/authentication.py@resolve=cpu imports `requests`, but Pants cannot safely infer a dependency because more than one target owns this module, so it is ambiguous which to use: ['//:cpu#requests', 'src/py/auth:requirements-cpu#requests'].

Please explicitly include the dependency you want in the `dependencies` field of src/py/auth/auth/authentication.py@resolve=cpu, or ignore the ones you do not want by prefixing with `!` or `!!` so that one or no targets are left.
```
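Following that warning literally would mean something like this (a sketch; the two addresses are copied from the warning above, everything else is assumed):

```python
# src/py/auth/BUILD (hypothetical): pick one owner of `requests`
# explicitly, and exclude the other candidate with a "!" ignore so
# that exactly one target is left.
python_sources(
    name="auth",
    dependencies=[
        "src/py/auth:requirements-cpu#requests",
        "!//:cpu#requests",
    ],
)
```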
l
we should have it mention the flag. I wonder if there is an existing issue
g
I'll have a look, I also think that the flag should already work for the runtime library lookup, likely just not getting set.
l
g
Hah, duh.
l
Oh, right, that feature needs to be enabled.
Try adding this to your pants.toml:
```
[python-infer]
ambiguity_resolution = "by_source_root"
```
I tried this in a local clone of your repo and it seemed to make things work.
But we should point users to the feature in the warning message. Would you like to take it? Else happy to try and make the PR/issue.
g
I'm going to investigate/PR why it doesn't work for the hard error, which calls the same code, so I can include it there.
šŸ‘ 1
h
Sorry, late to the party and just catching up on this. But I didn't get why you need resolves A and B to begin with? Why is there not one shared resolve with a single pyproject.toml? Since you expect to share at least some of the underlying requirements across the two tools?
In general multiple resolves are for when you must have conflicting deps in the codebase. In this case it sounds like you don't?
g
We have those use-cases as well; but in this specific tool A vs B situation we have tool A using `torch`, and tool B shouldn't ever have that, because torch turns a 10-second pex build into multiple minutes.
h
Pants subsets the resolve to just what each binary/test/whatever needs
so just having torch in the lockfile does not imply that it would actually be built into the pex
If you're seeing otherwise then something is wrong
g
Oh no, I know that. This is a guard rail to prevent it from ever occurring by accident via transitive imports.
h
Ah, another way to get that guardrail might be to add a `!!` dependency on torch from the binary?
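That transitive-exclude guardrail could be sketched like this (the torch address is made up; `!!` excludes are only allowed on root targets such as `pex_binary`):

```python
# src/py/tool_b/BUILD (hypothetical): even if some transitive import
# would pull torch in, the "!!" exclude keeps it out of this PEX.
pex_binary(
    name="bin",
    entry_point="main.py",
    dependencies=["!!3rdparty/python:requirements#torch"],
)
```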
g
Yeah; can try that. Though `B` would then still pay the `generate-lockfiles` cost for torch, whereas now it can be regenerated much faster.