Is there a Pants plugin for Databricks Integration...
# general
b
Is there a Pants plugin for Databricks integration? Like:
• uploading the whl/jar to Databricks libraries
• managing the Databricks clusters
Or could the Pants community start a project like that?
h
Not yet, but we’d welcome a motivated community member who knows about databricks and wants to look into this, and we’d be happy to help. We’ve had some users ask about this before, and I even reached out to a friend at Databricks to ask if they could add support for PEX files, as that would be a lot simpler than their current Python deployment story. (OSS Spark works with PEX, but commercial Databricks does not for some reason)
I don’t think any of the core maintainers know enough about databricks to have a strong opinion about what this should do, what is useful and so on. But if you could write up a short design doc to start a conversation, that could be a great start
b
Apache Spark supports PEX starting from PySpark >= 3.1.1 or PySpark >= 3.2.0
h
Yep! Unfortunately Databricks does not though
b
I’m trying to test whether PEX works in Databricks. It would save us a lot of pip install time.
h
That would be amazing if so. Last I looked it was not a thing, but maybe that has changed!
I have reached out to folks I know at Databricks to ask them about supporting PEX. They asked around internally, but it didn’t go anywhere
[This was a few months ago]
b
Just got it working on open source PySpark: https://github.com/da-tubi/pants-pyspark-pex Not far away from making it work on Databricks
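For anyone following along, here is a rough sketch of how a PEX file is usually wired into open source PySpark. It follows the approach in Spark's Python package management docs and is not taken from the linked repo. The file name `pyspark_pex_env.pex` and the helpers `pex_settings` and `build_session` are made up for illustration, and it assumes the PEX was built separately (for example by Pants). Whether any of this carries over to Databricks is exactly the open question in this thread.

```python
import os
from typing import Dict, Tuple


def pex_settings(pex_path: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (environment variables, Spark conf) for running PySpark from a PEX.

    Hypothetical helper: it only builds the settings, it does not start Spark.
    """
    pex_name = os.path.basename(pex_path)
    # Spark copies the file into each executor's working directory,
    # so the interpreter path is relative to that directory.
    env = {"PYSPARK_PYTHON": f"./{pex_name}"}
    # "spark.files" ships the PEX to executors
    # (on YARN the equivalent key is "spark.yarn.dist.files").
    conf = {"spark.files": pex_path}
    return env, conf


def build_session(pex_path: str):
    """Create a SparkSession whose executors use the PEX as their Python."""
    # Imported lazily so pex_settings() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    env, conf = pex_settings(pex_path)
    os.environ.update(env)
    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would be something like `spark = build_session("pyspark_pex_env.pex")`, run from the directory that contains the PEX file.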