# general
This is not a direct Pants question, so please let me know if there's a different channel I should post in, but based on Slack searches it looks like others with a similar use case are using Pants, and I'd like their feedback if possible! I'm trying to use Pants to generate a PEX that can be used as a packaged PySpark environment, specifically to run on GCP Dataproc. I posted my question to Stack Overflow yesterday, but in the meantime: has anyone had luck with this? Maybe @freezing-vegetable-92896 or @dazzling-diamond-4749, who I see commented about Spark a few months ago?
I haven’t had a chance to try very hard with PEX yet; if you learn anything, I’d be curious to follow along.
!! PEX + Pants + Spark = ❤️ The "hack" that has worked for us on Databricks is to unpack the PEX into a venv and point
What we have been doing so far is a bit of a hack: generating a `files` target with the Python sources and then uploading it as a sources zip.
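The sources-zip workaround can be approximated in plain Python. This is a minimal sketch, assuming the first-party sources live under a single root directory (`src_root` and `zip_python_sources` are hypothetical names) and that the zip is then shipped to Spark the usual way, e.g. with `spark-submit --py-files`; it is not what the Pants `files` target does internally.

```python
import zipfile
from pathlib import Path


def zip_python_sources(src_root: str, zip_path: str) -> list:
    """Zip every .py file under src_root, keeping package-relative paths.

    Paths are stored relative to src_root so that `import my_pkg.jobs`
    works once the zip is on sys.path (which is what --py-files arranges).
    """
    root = Path(src_root)
    written = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.py")):
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            written.append(arcname)
    return written
```

Only `.py` files are collected here; data files or third-party dependencies would need to be added separately, which is the gap a PEX is meant to close.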
You can do the unpacking in an init script
Also, running the PEX directly using … doesn't work 😞
Interesting, Rob. I’m not sure how well that would work on our cluster; the standard workflow for us is to just upload a zip of sources, and the cluster machinery handles connecting it.
another hack we had to use is
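```shell
# PySpark reads the interpreter to use from PYSPARK_PYTHON;
# spark-submit only treats the primary resource as a Python app if it ends in .py
PYSPARK_PYTHON=unpacked_venv/bin/python spark-submit entry_point.py
```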
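The same launch can be driven from Python, e.g. from a job wrapper. This is a sketch under assumptions: `spark_submit_with_venv` is a hypothetical helper, the unpacked venv is assumed to exist at the same path on every node, and it relies on the documented Spark behavior that `PYSPARK_PYTHON` sets the interpreter for driver and executors while `PYSPARK_DRIVER_PYTHON` (unset here) would override it for the driver only.

```python
import os
import subprocess


def spark_submit_with_venv(venv_dir, app, app_args=(), dry_run=False):
    """Run spark-submit so PySpark uses the interpreter from an unpacked PEX venv."""
    python = os.path.join(venv_dir, "bin", "python")
    # Copy the current environment and point PySpark at the venv interpreter.
    env = dict(os.environ, PYSPARK_PYTHON=python)
    cmd = ["spark-submit", app, *app_args]
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env
```

With `dry_run=True` nothing is executed, which makes it easy to inspect the exact command and environment before trying it on a cluster.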
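```python
import importlib
import sys

if __name__ == "__main__":
    argv = sys.argv
    if len(argv) < 3:
        raise RuntimeError("Usage: <module> <function_name> [-- <pipeline_args> ...]")
    # e.g. `entry_point.py my_pkg.jobs main -- <pipeline args>` (names are placeholders)
    module = importlib.import_module(argv[1])
    entrypoint = getattr(module, argv[2])
    if len(argv) > 3:
        # explicit check instead of assert, which is stripped under `python -O`
        if argv[3] != "--":
            raise RuntimeError("Please use -- to separate variables passed to entry point")
        # trim out driver args; "--" takes the place of argv[0] for the entry point
        sys.argv = argv[3:]
    # hand control to the resolved pipeline function
    entrypoint()
```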
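To make the dispatch convention of the entry point script explicit, and testable without a cluster, its argument handling can be pulled into a pure function. `resolve_entrypoint` is a hypothetical name; it mirrors the script's handling, including keeping the `--` separator as the new `argv[0]`.

```python
import importlib
from typing import Callable, List, Tuple


def resolve_entrypoint(argv: List[str]) -> Tuple[Callable, List[str]]:
    """Map [script, module, function, "--", *pipeline_args] to (callable, argv)."""
    if len(argv) < 3:
        raise RuntimeError("Usage: <module> <function_name> [-- <pipeline_args> ...]")
    module = importlib.import_module(argv[1])
    entrypoint = getattr(module, argv[2])
    if len(argv) == 3:
        # No pipeline args: the script leaves sys.argv untouched in this case.
        return entrypoint, argv
    if argv[3] != "--":
        raise RuntimeError("Please use -- to separate variables passed to entry point")
    return entrypoint, argv[3:]
```

For example, `resolve_entrypoint(["entry_point.py", "os.path", "basename", "--", "a/b.txt"])` returns `os.path.basename` together with `["--", "a/b.txt"]`.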
Though I’ve been interested in seeing whether it works to create a PEX and repackage it into a zip with everything at the root level, as our cluster expects.
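One way to try the repackaging idea is to rewrite the PEX (itself a zip file) into a flat zip. This sketch assumes the default zipapp layout, where third-party distributions sit under `.deps/<distribution>/` and the Pex runtime under `.bootstrap/`; verify that with `unzip -l my.pex` for your Pex version before relying on it. `flatten_pex` and the skip lists are hypothetical. Note that compiled extension modules generally cannot be imported straight from a zip, so this is only promising for pure-Python dependencies.

```python
import zipfile

# Assumed zipapp PEX layout; check against `unzip -l my.pex`.
DEPS_PREFIX = ".deps/"
SKIP_PREFIXES = (".bootstrap/", "__pex__/")
SKIP_FILES = {"__main__.py", "PEX-INFO"}


def flatten_pex(pex_path: str, out_zip: str) -> list:
    """Copy a PEX's sources and dependencies into one zip, all at the root."""
    written = set()
    with zipfile.ZipFile(pex_path) as src, zipfile.ZipFile(
        out_zip, "w", compression=zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            name = info.filename
            if info.is_dir() or name in SKIP_FILES or name.startswith(SKIP_PREFIXES):
                continue  # drop the Pex bootstrap machinery
            if name.startswith(DEPS_PREFIX):
                # ".deps/<dist>/pkg/mod.py" -> "pkg/mod.py"
                parts = name[len(DEPS_PREFIX):].split("/", 1)
                if len(parts) < 2:
                    continue
                name = parts[1]
            if name in written:
                continue  # first copy wins if two distributions collide
            dst.writestr(name, src.read(info))
            written.add(name)
    return sorted(written)
```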
And make the init script pull your target PEX via an env var.
We pull the PEX dynamically; at least Databricks lets us.
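The "pull the target PEX by env var" step of an init script might look like the sketch below. The variable name `TARGET_PEX_URL` and the helper `fetch_target_pex` are hypothetical; you would set the variable yourself in the cluster or job configuration. The standard-library `urllib` handles `http(s)://` and `file://` URLs (among a few others) but not `gs://` or `dbfs:/`, which would need `gsutil`, a cloud SDK, or a mounted filesystem instead.

```python
import os
import shutil
import urllib.request

# Hypothetical variable name; neither Dataproc nor Databricks defines it for you.
PEX_URL_VAR = "TARGET_PEX_URL"


def fetch_target_pex(dest_path, env=None):
    """Download the PEX whose URL is given by PEX_URL_VAR to dest_path."""
    env = os.environ if env is None else env
    url = env.get(PEX_URL_VAR)
    if not url:
        raise RuntimeError(f"{PEX_URL_VAR} is not set")
    # Stream the download to disk rather than reading it all into memory.
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```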
@freezing-vegetable-92896 that SO post points directly at the permission bits on the PEX file not being preserved by `gsutil cp`. I have ~0 familiarity with … Has that possibility been ruled out?
I don’t use Google Cloud (we’ve got a custom setup based around Apache Livy), so I doubt that’s our issue.
Ah, sorry, I thought that SO post was yours.
Oh, oops!
No, I think Megan just noticed that I had mentioned Spark somewhere in the Slack history 🙂
@quiet-evening-25363 was that SO post yours? If so, it points directly at the permission bits on the PEX file not being preserved by `gsutil cp`. I have ~0 familiarity with … Has that possibility been ruled out?
Pants / Pex plop out PEXes with 755 perms so that they can be executed directly. If the execute bits are lost on a bucket copy / download, then you'd expect the error in the SO post.
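A quick way to confirm or rule out the lost-permissions theory after a bucket copy is to inspect the file mode and, if needed, restore 755. A minimal standard-library sketch; `has_exec_bits` and `restore_pex_perms` are hypothetical helper names, and the shell equivalent of the fix is simply `chmod 755 my.pex`.

```python
import os
import stat


def has_exec_bits(path: str) -> bool:
    """True if owner, group, and other all have execute permission."""
    mode = os.stat(path).st_mode
    return all(mode & bit for bit in (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH))


def restore_pex_perms(path: str) -> None:
    """Reset a downloaded PEX to 0o755 (rwxr-xr-x), the mode Pants / Pex write."""
    os.chmod(path, 0o755)
```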
Yup, the Stack Overflow post is mine. I'm looking into the permissions error at the moment, which seems like a good possibility, but I wasn't sure whether this method should work in general. This is great info, thank you! @dazzling-diamond-4749 when you say unpack the PEX into a venv, do you mean create a venv, unzip the PEX, and then use the Python interpreter in the venv?
You can include PEX tools, and the PEX will have an …
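In Pants, the tools are baked in through a field on the `pex_binary` target. A minimal BUILD sketch: the target name and entry point are hypothetical, while `include_tools` is the Pants field that makes the built PEX respond to `PEX_TOOLS=1`.

```python
# BUILD (hypothetical target name and entry point; adjust to your repo)
pex_binary(
    name="spark_env",
    entry_point="entry_point.py",
    include_tools=True,  # enables `PEX_TOOLS=1 python spark_env.pex venv ...`
)
```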
Ah OK, great, thank you! Doing this in an init script seems to make sense; I'll try it out...
I responded on SO. One way to work around perms is `python my.pex`: you can always run a PEX that way, using an explicit interpreter. The benefit in your case is that the PEX zipapp does not need to be executable, just readable, when run that way.
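The "readable is enough" point is easy to check with the standard-library `zipapp` module, since a PEX is a zipapp too. This sketch builds a tiny stand-in archive (not a real PEX; the helper name is hypothetical), strips its execute bits as a lossy copy might, and still runs it through an explicit interpreter.

```python
import os
import subprocess
import sys
import zipapp
from pathlib import Path


def run_zipapp_without_exec_bit(workdir: str) -> str:
    """Build a tiny zipapp, remove its execute bits, and run it via the interpreter."""
    src = Path(workdir, "app_src")
    src.mkdir()
    (src / "__main__.py").write_text("print('hello from a zipapp')\n")
    target = Path(workdir, "my.pex")  # stand-in archive named like a PEX
    zipapp.create_archive(str(src), str(target))
    os.chmod(target, 0o644)  # readable but not executable
    result = subprocess.run(
        [sys.executable, str(target)], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```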
hmm ok interesting, thanks, I'll keep that in mind as a workaround if there's something bigger with permissions going on
So, for the tools thing:
```shell
PEX_TOOLS=1 python ./my.pex venv --rm all --compile create/venv/right/here
```
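For an init script written in Python rather than shell, the same `PEX_TOOLS=1 ... venv` invocation can be wrapped with `subprocess`. A sketch under assumptions: the helper names are hypothetical, the flags are exactly the ones in the tools command (`--rm all`, `--compile`; see `PEX_TOOLS=1 python my.pex venv --help` for their exact semantics), and the resulting venv is assumed to have a POSIX `bin/python`. Invoking the PEX through `sys.executable` also avoids depending on its execute bit.

```python
import os
import subprocess
import sys


def build_venv_command(pex_path, venv_dir, remove="all", compile_pyc=True):
    """Return (argv, env) for `PEX_TOOLS=1 python <pex> venv ... <venv_dir>`."""
    argv = [sys.executable, pex_path, "venv"]
    if remove:
        argv += ["--rm", remove]
    if compile_pyc:
        argv.append("--compile")
    argv.append(venv_dir)
    env = dict(os.environ, PEX_TOOLS="1")  # turn on the embedded Pex tools
    return argv, env


def create_venv_from_pex(pex_path, venv_dir):
    """Unpack the PEX into venv_dir and return the venv's interpreter path."""
    argv, env = build_venv_command(pex_path, venv_dir)
    subprocess.run(argv, env=env, check=True)
    return os.path.join(venv_dir, "bin", "python")
```

The returned interpreter path is what would then be handed to Spark as `PYSPARK_PYTHON`.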
Belatedly, I just want to confirm: yep, this is an excellent and on-topic place to ask questions about Pex. The Pants team (mainly @enough-analyst-54434) maintains it.