clever-gigabyte-29368

10/28/2022, 12:52 AM
Hi community, has anyone tried using Pants with Airflow? Airflow, as a shared environment running multiple data pipelines, would benefit from better dependency management. But architectural changes to Airflow might be needed in order to execute tasks using PEX binaries.

wide-midnight-78598

10/28/2022, 12:55 AM
If you check out the GitHub issues and the Slack chat, there are a lot of mentions of Airflow that might be useful to you.

enough-analyst-54434

10/28/2022, 1:09 AM
One thing to note is that modern PEXes have support for a magic __pex__ package. If you put the PEX on the sys.path, say using PYTHONPATH=my.pex, then you can have everything work if you just prefix the name of the module you want to run with __pex__. This works for running in AWS Lambda, for example, with no special support besides renaming the PEX with a .zip extension and prefixing your existing handle name in the Lambda config with __pex__, for example:
__pex__.my_package.my_module.my_handle_function
I'll be retiring the Lambdex project in the next few months.
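A minimal sketch of the prefixing described above, not taken from the Pex documentation. The helper names `pex_qualified` and `import_from_pex` are hypothetical, and `import_from_pex` assumes a real PEX file exists at the path you pass in; only the string prefixing is exercised here.

```python
import importlib
import sys

PEX_PREFIX = "__pex__"


def pex_qualified(name: str) -> str:
    """Prefix a dotted module or handler path with the __pex__ package.

    Hypothetical helper: names that already carry the prefix are returned
    unchanged.
    """
    if name == PEX_PREFIX or name.startswith(PEX_PREFIX + "."):
        return name
    return f"{PEX_PREFIX}.{name}"


def import_from_pex(pex_path: str, module_name: str):
    """Put a PEX on sys.path, then import a module through __pex__.

    Hypothetical helper: inserting the path is meant to mirror what
    PYTHONPATH=my.pex does at interpreter startup. It needs a real PEX
    file at pex_path to succeed.
    """
    if pex_path not in sys.path:
        sys.path.insert(0, pex_path)
    return importlib.import_module(pex_qualified(module_name))


# The Lambda handler string from the message above, built from the
# unprefixed handler path.
print(pex_qualified("my_package.my_module.my_handle_function"))
```

With a PEX on disk, something like `import_from_pex("my.pex", "my_package.my_module")` would be the in-process equivalent of setting PYTHONPATH=my.pex and importing `__pex__.my_package.my_module`.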

clever-gigabyte-29368

11/01/2022, 7:44 PM
Thanks for the input! Do you have experience managing per-DAG dependencies using Pants? For example, we want to allow DAG1 and DAG2 to depend on two different versions of the same library. We've found managing conflicting dependencies in a shared environment especially challenging. My initial assessment is that it's not doable without changing airflow-core to support Pants, but I'd like to hear the community's thoughts 🙂

enough-analyst-54434

11/01/2022, 8:07 PM
I do not. I have never airflowed / DAGed / AIed or MLed.