# general
g
Is there a way to add custom rules for dependency inference? I'm using Spark's `toPandas()` call, which depends on pandas. Pants doesn't detect that, though, so it doesn't pull pandas in, which leads to failures. I'd like to add a custom rule that adds pandas whenever it sees `toPandas()` in the code.
w
Any reason not to just add it as a good ol' fashioned dep?
Or use `defaults` or `overrides` on your targets?
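(For anyone reading along: the `defaults` route would be a BUILD-file sketch along these lines, assuming Pants' `__defaults__` builtin; the requirement address below is a hypothetical placeholder, not from this thread.)

```python
# BUILD file at the root of the subtree where these jobs live.
# __defaults__ sets default field values for all matching targets
# declared in this directory and below; a target that sets its own
# `dependencies` overrides the default.
__defaults__({
    python_sources: dict(dependencies=["3rdparty/python#pandas"]),
})
```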
g
Yeah, I guess that would work. No particular reason other than this is all over the place in our monorepo.
w
And there's no reference to it in your app-level code, right? So `module_mapping` won't help: https://www.pantsbuild.org/2.18/reference/targets/python_requirements#module_mapping
Also, could you put those deps in a resolve that each of these targets uses?
Maybe even multiple resolves? https://www.pantsbuild.org/2.20/reference/subsystems/python#resolves It's funny: because everyone's repo and style of programming/devops is so different, Pants has so many ways to solve or work around the same or similar problems that it's hard to even keep track of them, beyond the specific way each person uses it.
g
So I have the dependency explicitly added in pyproject.toml and slurped in via `poetry_requirements()`, but because dep inference never finds it, the PEX built for testing and runtime never includes it 🤣
I'll just explicitly add the dep to the `python_sources()` target
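(That explicit dep would look something like this in the BUILD file next to the sources; the target name here is a guess, and the requirement address is the one discussed later in this thread.)

```python
python_sources(
    name="etl",  # hypothetical target name
    # Force the dep that inference misses: toPandas() needs pandas at runtime.
    dependencies=["data/jobs/etl:poetry#pandas"],
)
```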
b
One option would be saying that Spark always depends on pandas. That might lead to unnecessary deps in some cases, but maybe that's acceptable for you:
```python
python_requirements(
    ...,
    overrides={"spark...": dict(dependencies=["path/to#pandas"])},
)
```
(or `poetry_requirements`)
g
Interesting... that is a very cool idea.
This is the target: `data/jobs/etl:poetry#pyspark`. How would I do the override so that anything that depends on it automatically slurps in `data/jobs/etl:poetry#pandas`?
Is this it?
```python
poetry_requirements(
    ...,
    overrides={"pyspark": dict(dependencies=["data/jobs/etl:poetry#pandas"])},
)
```
b
Yeah, update your `poetry_requirements(name="poetry", ...)` target in `data/jobs/etl/BUILD` to something like that.
g
ok, cool. thanks @broad-processor-92400
b
Re the original question: I suspect one could write a dependency inference plugin that searches for text like `toPandas`. If you do want to go down that path (e.g. if having pyspark always depend on pandas leads to too many spurious dependencies), https://github.com/pantsbuild/pants/blob/main/src/python/pants/backend/python/framework/django/dependency_inference.py might be a good reference; it adds Django-specific dependency inference for Python files.
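(For a taste of what such a plugin involves, here's a minimal, Pants-free sketch of just the text-matching step. The actual rule/plugin wiring is what the Django example above demonstrates and is omitted here; all names are illustrative.)

```python
import re

# Matches a .toPandas( method call. Requiring the leading dot and the
# trailing "(" avoids false matches like a hypothetical .toPandasFrame().
_TO_PANDAS_RE = re.compile(r"\.toPandas\s*\(")

def needs_pandas(source: str) -> bool:
    """Return True if the Python source appears to call Spark's toPandas()."""
    return bool(_TO_PANDAS_RE.search(source))

# In a real plugin, a rule would run a check like this over each file's
# contents and, on a match, emit the pandas requirement target
# (e.g. "data/jobs/etl:poetry#pandas") as an inferred dependency.
```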
👍 1
c
Can also do this using module mapping instead of dependencies... not sure if it will be better/worse/the same tho ;) https://pantsbuild.slack.com/archives/C046T6T9U/p1663857284163179
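(The linked thread's exact trick isn't reproduced here, but the general shape of `module_mapping` is to tell Pants which modules a requirement provides, so inference can connect imports to it. Distribution and module names below are illustrative.)

```python
python_requirements(
    name="reqs",
    # Map the distribution name (as it appears in requirements.txt /
    # pyproject.toml) to the importable module names it provides.
    module_mapping={"my-distribution": ["my_module"]},
)
```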
🙌 1
g
@curved-television-6568 that's a good one!
🙏 1