crooked-country-1937
02/02/2023, 1:59 AM
1. src/python/{dags,airflowplugins,libairflow,libspark,utils…}. dags contains DAGs; libairflow contains operators, hooks, and sensors. It also contains other modules, like libspark and so on, for other stuff. Tests are in their own top-level folder.
2. I sync the whole src/python tree to the dags folder on S3 (see the sync sketch after this list). This is especially needed on Airflow v2, since operators/sensors are not supported via plugins; having them in the dags folder is Airflow's recommendation.
3. Use an .airflowignore file to ensure Airflow only scans the dags folder (example below).
4. We have 3 resolves. Code in src/python/{dags,libairflow} depends on an airflow-default resolve, which is py3.7 + the Airflow constraints file + our requirements. Similarly, we have spark-default, which is py3.9 + the Databricks LTS recommendations, and a python-default for stuff packed into Docker containers (see the pants.toml sketch after this list).
5. Because all the Airflow-specific pins are in an airflow-requirements.txt file, I simply upload that file to MWAA (wired up in the BUILD sketch below).
6. The cool thing about Pants is that I can use its understanding of transitive dependencies to figure out which Airflow DAGs were impacted by a PR, and run the full suite of integration tests for just those (see the one-liner after this list). This is especially important, as testing Airflow DAGs is time-consuming.
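A minimal sketch of the sync from step 2, assuming a hypothetical bucket name and that the MWAA environment is pointed at the dags/ prefix:
```
# Step 2 sketch: push the whole src/python tree to the prefix MWAA reads DAGs from.
# Bucket name and prefix are hypothetical.
aws s3 sync src/python s3://my-mwaa-bucket/dags --delete
```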
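For step 3, a plausible .airflowignore at the root of the synced folder, skipping everything except dags. The entries are inferred from the layout in step 1, not the author's actual file; Airflow treats each line as a regex pattern by default:
```
airflowplugins
libairflow
libspark
utils
```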
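The three resolves from step 4 would be declared in pants.toml roughly like this; the lockfile paths and exact interpreter constraints are my guesses, not the author's config:
```
# pants.toml (illustrative)
[python]
enable_resolves = true
default_resolve = "python-default"

[python.resolves]
airflow-default = "3rdparty/python/airflow-default.lock"
spark-default = "3rdparty/python/spark-default.lock"
python-default = "3rdparty/python/python-default.lock"

[python.resolves_to_interpreter_constraints]
airflow-default = ["CPython==3.7.*"]
spark-default = ["CPython==3.9.*"]
```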
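And the airflow-requirements.txt from step 5 gets tied to that resolve via a python_requirements target; the file location here is hypothetical:
```
# 3rdparty/python/BUILD (location hypothetical)
python_requirements(
    name="airflow-reqs",
    source="airflow-requirements.txt",
    resolve="airflow-default",
)
```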
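Step 6 is presumably Pants' changed-since machinery: a one-liner like this runs every test that transitively depends on the files a PR touched (on Pants releases older than 2.16 the flag is spelled --changed-dependees):
```
./pants --changed-since=origin/main --changed-dependents=transitive test
```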
Basically, my recommendation is to put all Airflow-specific requirements into their own resolve. This gives you a requirements.txt specific to Airflow, and Pants ensures your Airflow code only depends on the library versions compliant with the constraints file.
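Concretely, once the Airflow pins live in their own resolve, regenerating the lockfile and shipping the requirements file to MWAA is two commands; the requirements path and S3 destination here are hypothetical:
```
./pants generate-lockfiles --resolve=airflow-default
aws s3 cp 3rdparty/python/airflow-requirements.txt s3://my-mwaa-bucket/requirements.txt
```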
https://www.pantsbuild.org/docs/python-third-party-dependencies#multiple-lockfiles