
bored-energy-25252

08/17/2022, 9:58 AM
Is there a Pants plugin for Databricks integration? For example:
• uploading the whl/jar to Databricks libraries
• managing Databricks clusters
Or could the Pants community start a project like that?

happy-kitchen-89482

08/17/2022, 2:53 PM
Not yet, but we’d welcome a motivated community member who knows Databricks and wants to look into this, and we’d be happy to help. We’ve had some users ask about this before, and I even reached out to a friend at Databricks to ask if they could add support for PEX files, as that would be a lot simpler than their current Python deployment story. (OSS Spark works with PEX, but commercial Databricks does not, for some reason.)
I don’t think any of the core maintainers know enough about Databricks to have a strong opinion about what this should do, what would be useful, and so on. But if you could write up a short design doc to start a conversation, that would be a great start.
👍 1

bored-energy-25252

10/13/2022, 2:57 AM
Apache Spark supports PEX starting from PySpark >= 3.1.1 or PySpark >= 3.2.0

happy-kitchen-89482

10/13/2022, 6:53 AM
Yep! Unfortunately Databricks does not though

bored-energy-25252

10/13/2022, 6:54 AM
I’m trying to test if PEX works in Databricks. It will help us save a lot of pip install time.

happy-kitchen-89482

10/13/2022, 2:53 PM
That would be amazing if so! Last I looked it was not a thing, but maybe that has changed.
I reached out to folks I know at Databricks to ask them about supporting PEX. They asked around internally, but it didn’t go anywhere.
[This was a few months ago]

bored-energy-25252

11/29/2022, 4:31 PM
Just got it to work on open-source PySpark: https://github.com/da-tubi/pants-pyspark-pex Not far away from making it work on Databricks.
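
For context, here is a minimal sketch of how open-source PySpark is usually pointed at a PEX file, based on the approach described in the PySpark packaging docs: ship the PEX with `--files` and set `PYSPARK_PYTHON` to it. The helper name `pex_spark_submit` and the file names are hypothetical placeholders, not taken from the linked repo, and nothing here is confirmed to work on Databricks. It only builds the environment and command line; it does not run Spark.

```python
import os
import shlex


def pex_spark_submit(pex_file, app, cluster_mode=False):
    """Build (env, argv) for spark-submit with a PEX as the worker Python.

    Hypothetical helper for illustration only. Assumes the cluster manager
    places files shipped via --files in the executor working directory
    under their base name.
    """
    pex_name = os.path.basename(pex_file)
    # Executors run Python through the shipped PEX file.
    env = {"PYSPARK_PYTHON": "./" + pex_name}
    if not cluster_mode:
        # In client mode the driver uses the local interpreter.
        env["PYSPARK_DRIVER_PYTHON"] = "python"
    argv = ["spark-submit", "--files", pex_file, app]
    return env, argv


if __name__ == "__main__":
    env, argv = pex_spark_submit("pyspark_pex_env.pex", "app.py")
    exports = " ".join(k + "=" + shlex.quote(v) for k, v in env.items())
    print(exports, shlex.join(argv))
```

The PEX itself would need to bundle PySpark and the job's dependencies; how it gets built (for example by a Pants `pex_binary` target) is outside this sketch.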