# general
g
Is there a way to add custom rules for dependency inference? I'm using Spark's `toPandas()` call, which depends on pandas. Pants doesn't detect that, though, so it doesn't pull pandas in, which leads to failures. I'd like to add a custom rule that adds pandas whenever it sees `toPandas()` in the code.
w
Any reason not to just add it as a good ol' fashioned dep?
Or use `defaults` or `overrides` on your targets?
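(For anyone reading along: the `defaults` route would be a BUILD-file sketch along these lines, assuming Pants' `__defaults__` builtin; the requirement address below is a hypothetical placeholder, not from this thread.)

```python
# BUILD file at the root of the subtree where these jobs live.
# __defaults__ sets default field values for all matching targets
# declared in this directory and below; a target that sets its own
# `dependencies` overrides the default.
__defaults__({
    python_sources: dict(dependencies=["3rdparty/python#pandas"]),
})
```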
g
Yeah, I guess that would work. No particular reason other than this is all over the place in our monorepo.
w
And there's no reference to it in your app-level code, right? So `module_mapping` won't help: https://www.pantsbuild.org/2.18/reference/targets/python_requirements#module_mapping
Also, could you put those deps in a resolve that each of these targets uses?
Maybe even multiple resolves? https://www.pantsbuild.org/2.20/reference/subsystems/python#resolves It's funny: because everyone's repo and style of programming/devops is so different, Pants has so many ways to solve or work around the same or similar problems that it's hard to even keep track of them, beyond the specific way each person uses it.
g
So I have the dependency explicitly added in pyproject.toml and slurped in via `poetry_requirements()`, but because dep inference never finds it, the PEX built for testing and runtime never includes it 🤣
I'll just explicitly add the dep to the `python_sources()` target
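(That explicit dep would look something like this in the BUILD file next to the sources; the target name here is a guess, and the requirement address is the one discussed later in this thread.)

```python
python_sources(
    name="etl",  # hypothetical target name
    # Force the dep that inference misses: toPandas() needs pandas at runtime.
    dependencies=["data/jobs/etl:poetry#pandas"],
)
```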
b
One option would be saying that Spark always depends on pandas. That might lead to unnecessary deps in some cases, but maybe that's acceptable for you:
```python
python_requirements(
    ...,
    overrides={"spark...": dict(dependencies=["path/to#pandas"])},
)
```
(or `poetry_requirements`)
g
Interesting... that is a very cool idea.
This is the target: `data/jobs/etl:poetry#pyspark`. How would I do the override so that anything that depends on it automatically slurps in `data/jobs/etl:poetry#pandas`?
Is this it?
```python
poetry_requirements(
    ...,
    overrides={"pyspark": dict(dependencies=["data/jobs/etl:poetry#pandas"])},
)
```
b
Yeah, update your `poetry_requirements(name="poetry", ...)` target in `data/jobs/etl/BUILD` to something like that.
g
ok, cool. thanks @broad-processor-92400
b
Re the original question: I suspect one could write a dependency inference plugin that searches for text like `toPandas`. If you do want to go down that path (e.g. if having pyspark always depend on pandas leads to too many spurious dependencies), https://github.com/pantsbuild/pants/blob/main/src/python/pants/backend/python/framework/django/dependency_inference.py might be a good reference; it adds Django-specific dependency inference for Python files.
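(For a taste of what such a plugin involves, here's a minimal, Pants-free sketch of just the text-matching step. The actual rule/plugin wiring is what the Django example above demonstrates and is omitted here; all names are illustrative.)

```python
import re

# Matches a .toPandas( method call. Requiring the leading dot and the
# trailing "(" avoids false matches like a hypothetical .toPandasFrame().
_TO_PANDAS_RE = re.compile(r"\.toPandas\s*\(")

def needs_pandas(source: str) -> bool:
    """Return True if the Python source appears to call Spark's toPandas()."""
    return bool(_TO_PANDAS_RE.search(source))

# In a real plugin, a rule would run a check like this over each file's
# contents and, on a match, emit the pandas requirement target
# (e.g. "data/jobs/etl:poetry#pandas") as an inferred dependency.
```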
👍 1
c
Can also do this using module mapping instead of dependencies... not sure if it will be better/worse/the same tho ;) https://pantsbuild.slack.com/archives/C046T6T9U/p1663857284163179
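(The linked thread's exact trick isn't reproduced here, but the general shape of `module_mapping` is to tell Pants which modules a requirement provides, so inference can connect imports to it. Distribution and module names below are illustrative.)

```python
python_requirements(
    name="reqs",
    # Map the distribution name (as it appears in requirements.txt /
    # pyproject.toml) to the importable module names it provides.
    module_mapping={"my-distribution": ["my_module"]},
)
```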
🙌 1
g
@curved-television-6568 that's a good one!
🙏 1