# general
c
Hi community, has anyone tried to use Pants with Airflow? Airflow, as a shared environment running multiple data pipelines, would benefit from better dependency management. But architectural changes might be needed in Airflow in order to execute tasks using PEX binaries.
w
If you check out the GitHub issues and the Slack chat, there are a lot of mentions of Airflow that might be useful to you.
e
One thing to note is that modern PEXes have support for a magic `__pex__` package. If you put the PEX on the `sys.path`, say using `PYTHONPATH=my.pex`, then you can have everything work if you just prefix the name of the module you want to run with `__pex__`.
This works for running in AWS Lambda, for example, with no special support besides renaming the PEX with a .zip extension and prefixing your existing handler name in the Lambda config with `__pex__`, for example: `__pex__.my_package.my_module.my_handle_function`
I'll be retiring the Lambdex project in the next few months.
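A rough sketch of the `__pex__` prefixing described above, in case it helps. The helper names `pex_qualified` and `import_from_pex` are made up for illustration and are not part of Pex; the import itself only works if the PEX you point it at was built by a Pex version that supports the `__pex__` package.

```python
import importlib
import sys

PEX_PREFIX = "__pex__"


def pex_qualified(name: str) -> str:
    """Prefix a dotted module or handler path with the __pex__ package.

    Hypothetical helper: names that already carry the prefix are returned as-is.
    """
    if name == PEX_PREFIX or name.startswith(PEX_PREFIX + "."):
        return name
    return f"{PEX_PREFIX}.{name}"


def import_from_pex(pex_path: str, module: str):
    """Put the PEX on sys.path, then import `module` through __pex__.

    Hypothetical helper: mirrors PYTHONPATH=my.pex, but done at runtime.
    Assumes pex_path points at a real PEX with __pex__ support.
    """
    if pex_path not in sys.path:
        sys.path.insert(0, pex_path)
    return importlib.import_module(pex_qualified(module))


if __name__ == "__main__":
    # The handler string you would put in the Lambda config.
    print(pex_qualified("my_package.my_module.my_handle_function"))
```

Something like `import_from_pex("my.pex", "my_package.my_module")` would then be the runtime equivalent of setting `PYTHONPATH=my.pex` and importing `__pex__.my_package.my_module`.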
c
Thanks for the input! Do you have experience managing per-DAG dependencies using Pants? For example, we want to allow DAG1 and DAG2 to depend on two different versions of the same library. We've found managing conflicting dependencies in a shared environment especially challenging. My initial assessment is that it's not doable without changing airflow-core to support Pants. But I'd like to hear the community's thoughts 🙂
e
I do not. I have never airflowed / DAGed / AIed or MLed.