Hi community, I’m wondering whether anyone has tried using Pants with Airflow. Airflow, as a shared environment running multiple data pipelines, would benefit from better dependency management. But there might be architectural changes needed in Airflow in order to execute tasks using PEX binaries.
If you check out the GitHub issues and the Slack chat, there are a lot of mentions of Airflow that might be useful to you.
One thing to note is that modern PEXes have support for a magic `__pex__` package. If you put the PEX on the `sys.path`, say using `PYTHONPATH`, then you can have everything work if you just prefix the name of the module you want run with `__pex__.` This works for running in AWS Lambda, for example, with no special support besides renaming the PEX with a .zip extension and prefixing your existing handler name in the Lambda config with `__pex__.` For example, a handler configured as `my_module.handler` (a hypothetical name) would become `__pex__.my_module.handler`.
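A minimal sketch of the import side of that idea, assuming a PEX built with a recent Pex release that ships the `__pex__` import hook. The helper names `pex_qualified` and `import_from_pex`, and the path and module name in the usage note, are made up for illustration and are not part of Pex or Pants.

```python
import importlib
import sys
from types import ModuleType

# Name of the magic package that modern PEX files expose when on sys.path.
PEX_PREFIX = "__pex__"


def pex_qualified(name: str) -> str:
    """Prefix a dotted module or handler name with the __pex__ package."""
    return f"{PEX_PREFIX}.{name}"


def import_from_pex(pex_path: str, module: str) -> ModuleType:
    """Put a PEX file on sys.path and import a module through __pex__.

    Assumes pex_path points at a PEX that supports the __pex__ hook;
    otherwise the import raises ImportError.
    """
    if pex_path not in sys.path:
        sys.path.insert(0, pex_path)
    return importlib.import_module(pex_qualified(module))
```

With a hypothetical PEX at `/opt/pipelines/dag1.pex` containing `my_module`, `import_from_pex("/opt/pipelines/dag1.pex", "my_module")` would be the programmatic equivalent of setting `PYTHONPATH` to the PEX and running `import __pex__.my_module`.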
I'll be retiring the Lambdex project in the next few months.
Thanks for the input! Do you have experience managing per-DAG dependencies using Pants? For example, we want to allow DAG1 and DAG2 to depend on two different versions of the same library. We've found managing conflicting dependencies in a shared environment especially challenging. My initial assessment is that it's not doable without changing airflow-core to make it support Pants. But I'd like to hear the community’s thoughts 🙂
I do not. I have never airflowed / DAGed / AIed or MLed.