# pex
b
I have created a pex for my EMR Serverless job. When I unzip the pex I can see the module inside .deps/jsonschema.., but the script that imports it is not able to find this module. I provide main.py and the pex to EMR with boto3 from a Lambda. I created the `pex_binary` in my BUILD file. Has anyone faced similar issues?
b
The dependencies in a pex are only automatically available if the pex is executed via a "pexy" entrypoint, like `python ./path/to/pex` (if it's a zipapp) or `python ./path/to/pex/__main__.py` (if it's "loose" or "packed"). If it isn't executed like this, then the special `import __pex__` import at the very start of the file, before any import of a bundled dependency, is what's required. This sets up the dependencies for later import statements. I don't think this is well documented; https://github.com/pex-tool/lambdex/blob/main/MIGRATING.md hints at some of this.
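A minimal sketch of both options, assuming Python 3.9+ and a pex that sits on the local filesystem as either a zipapp file or a loose/packed directory. `pexy_command` and `activate_pex` are hypothetical helper names (not part of pex, Pants, or EMR). `activate_pex` also assumes the pex was built with a pex version that includes the `__pex__` hook, and that the hook is found through a normal `sys.path` entry pointing at the pex; how the pex actually reaches the EMR Serverless driver and executors is not covered here.

```python
import sys
from pathlib import Path


def pexy_command(pex_path: str) -> list[str]:
    """Build a command that runs a pex through its own entry point.

    A zipapp pex is a single file and is run directly; a "loose" or
    "packed" pex is a directory, so its __main__.py is run instead.
    """
    path = Path(pex_path)
    if path.is_dir():
        return [sys.executable, str(path / "__main__.py")]
    return [sys.executable, str(path)]


def activate_pex(pex_path: str) -> bool:
    """Make a pex's bundled dependencies importable from a plain script.

    Assumption: the pex ships the `__pex__` hook. The pex must be on
    sys.path before `import __pex__`, and that import must run before
    importing anything bundled in the pex. Returns False if the hook
    cannot be found (e.g. a wrong path).
    """
    if pex_path not in sys.path:
        sys.path.insert(0, pex_path)
    try:
        import __pex__  # noqa: F401  (imported only for its side effect)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical main.py usage: activate first, then import bundled deps.
    if activate_pex("app.pex"):
        import jsonschema  # only importable once the pex is activated

        print("jsonschema loaded from", jsonschema.__file__)
    else:
        print("could not activate app.pex")
```

When you control the interpreter's environment (for example, the pex is already on `PYTHONPATH`), the helper is unnecessary and a bare `import __pex__` as the first import in `main.py` is enough; the helper only makes the ordering requirement explicit.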
Searching the Slack history here for "pyspark emr" also turns up this blog article: https://towardsdatascience.com/pex-the-secret-sauce-for-the-perfect-pyspark-deployment-of-aws-emr-workloads-9aef0d8fa3a5 Is that potentially relevant? (NB: I know nothing about the specifics of PySpark or EMR.)