#general

quiet-evening-25363

01/20/2022, 9:22 PM
this is not a direct pants question, so please let me know if there's a different channel I should post in, but based on slack searches it looks like there may be others with a similar use case using pants that I'd like feedback from if possible! I'm trying to use pants to generate a PEX that's compatible to be used as a packaged pyspark environment, specifically to be run on gcp dataproc. I posted my question to stackoverflow yesterday, but in the meantime - has anyone had luck with this? maybe @freezing-vegetable-92896 or @dazzling-diamond-4749, who I see had comments referring to spark a few months ago

freezing-vegetable-92896

01/20/2022, 9:23 PM
I haven’t gotten a chance to try very hard with pex yet, if you learn anything I’d be curious to follow along

dazzling-diamond-4749

01/20/2022, 9:23 PM
!! PEX + Pants + Spark = ❤️ The "hack" that has worked for us on Databricks is to unpack the PEX into a venv and point SPARK_PYTHON to <you>/<venv>/bin/python

freezing-vegetable-92896

01/20/2022, 9:23 PM
what we have been doing so far is a bit of a hack: generating a files target with the python sources and then uploading this as a sources zip
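A minimal sketch of the "sources zip" idea, assuming a plain directory of Python sources. The helper name `zip_sources` and the example paths are hypothetical; this uses only the standard library and is not how Pants itself produces the archive.

```python
import zipfile
from pathlib import Path


def zip_sources(source_root, zip_path):
    """Zip every .py file under source_root, with archive paths relative to it.

    Returns the archive member names in the order they were written.
    """
    root = Path(source_root)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.py")):
            # Keep package paths relative to the root so imports resolve
            # once the zip is placed on sys.path.
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(zip_sources("src", "sources.zip"))
```

Writing members relative to the source root keeps packages at the top level of the archive, which is the layout the cluster workflow described in this thread expects.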

dazzling-diamond-4749

01/20/2022, 9:23 PM
You can do the unpacking in an init script
Also, running PEX directly using spark-submit doesn't work 😞

freezing-vegetable-92896

01/20/2022, 9:25 PM
Interesting Rob. I’m not sure how well that would work on our cluster, the standard workflow for us is to just upload a zip of sources and then the cluster machinery handles connecting it

dazzling-diamond-4749

01/20/2022, 9:26 PM
another hack we had to use is
SPARK_PYTHON=unpacked_venv/bin/python spark-submit driver.py path.to.my.module entry_point
the driver.py
import importlib
import sys

if __name__ == "__main__":
    argv = sys.argv
    # argv[1] is the dotted module path, argv[2] the function to call.
    if len(argv) < 3:
        raise RuntimeError("Usage: driver.py <module> <function_name> [-- <pipeline_args> ...]")
    module = importlib.import_module(argv[1])
    entrypoint = getattr(module, argv[2])
    if len(argv) > 3:
        # Raise instead of assert so the check survives python -O.
        if argv[3] != "--":
            raise RuntimeError("Please use -- to separate variables passed to entry point")
        # Trim out driver args; "--" is left in the sys.argv[0] slot.
        sys.argv = argv[3:]
    entrypoint()
👀 1
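A small sketch of what an entry point sees after driver.py trims the arguments. The names `trimmed_argv` and `entry_point` and the `--date` option are hypothetical and only for illustration; `trimmed_argv` mirrors the trimming logic in driver.py above rather than importing it.

```python
import argparse
import sys


def trimmed_argv(argv):
    """Mirror driver.py: drop the driver's own args, keeping "--" as argv[0]."""
    if len(argv) > 3:
        if argv[3] != "--":
            raise RuntimeError("Please use -- to separate variables passed to entry point")
        return argv[3:]
    return argv


def entry_point():
    """Hypothetical pipeline entry point that parses its own options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--date", default="unset")
    # parse_args() reads sys.argv[1:], so the "--" placeholder in slot 0 is skipped.
    args = parser.parse_args()
    return args.date


if __name__ == "__main__":
    sys.argv = trimmed_argv(
        ["driver.py", "path.to.my.module", "entry_point", "--", "--date", "2022-01-20"]
    )
    print(entry_point())
```

Because the separator ends up in the `sys.argv[0]` position, ordinary argument parsers in the entry point work unchanged on everything after it.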

freezing-vegetable-92896

01/20/2022, 9:26 PM
Though I’ve been interested in seeing if it works to create a pex and re-package it into a zip with everything at the root level like our cluster expects

dazzling-diamond-4749

01/20/2022, 9:27 PM
And make the init script pull your target pex by env var
We pull pex dynamically, at least DB lets us

enough-analyst-54434

01/20/2022, 9:32 PM
@freezing-vegetable-92896 that SO post points directly at the permission bits on the PEX file not being preserved by gsutil cp. I have ~0 familiarity with gsutil. Has that possibility been ruled out?

freezing-vegetable-92896

01/20/2022, 9:33 PM
I don’t use Google Cloud (we’ve got a custom setup based around Apache Livy) so I doubt that’s our issue

enough-analyst-54434

01/20/2022, 9:34 PM
Ah, sorry I thought that SO post was yours.
Oh, oops!

freezing-vegetable-92896

01/20/2022, 9:35 PM
No, I think Megan just noticed that I had mentioned spark somewhere in the slack history 🙂
👍 1

enough-analyst-54434

01/20/2022, 9:35 PM
@quiet-evening-25363 was that SO post yours? If so that SO post points directly at the permission bits on the PEX file not being preserved by gsutil cp. I have ~0 familiarity with gsutil. Has that possibility been ruled out?
Pants / Pex plop out PEXes with 755 perms so that they can be directly executed. If those perms lose the execute bits on a bucket copy / download, then you'd expect the error in the SO post.
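A minimal sketch of checking for lost execute bits after a download and restoring them, assuming a POSIX filesystem. The helper name `ensure_executable` and the file name are hypothetical; only the standard library is used.

```python
import os
import stat


def ensure_executable(path):
    """Add an execute bit wherever the matching read bit is set.

    Returns True if the file mode had to be changed, e.g. 644 -> 755.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode
    for read_bit, exec_bit in (
        (stat.S_IRUSR, stat.S_IXUSR),
        (stat.S_IRGRP, stat.S_IXGRP),
        (stat.S_IROTH, stat.S_IXOTH),
    ):
        if mode & read_bit:
            new_mode |= exec_bit
    if new_mode != mode:
        os.chmod(path, new_mode)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    changed = ensure_executable("my.pex")
    print("restored execute bits" if changed else "already executable")
```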

quiet-evening-25363

01/20/2022, 9:38 PM
yup the stackoverflow is mine, I'm looking into the permissions error at the moment, which seems like a good possibility... but I was not sure if this method should work in general. but this is great info, thank you! @dazzling-diamond-4749 when you say unpack the pex into venv - you mean create a venv, unzip the pex, and then use the python interpreter in the venv?

dazzling-diamond-4749

01/20/2022, 9:40 PM
https://pex.readthedocs.io/en/latest/recipes.html?highlight=venv You can include pex tools and the pex will have a venv command

quiet-evening-25363

01/20/2022, 9:42 PM
ah ok great, thank you! doing this in an init script seems to make sense, I'll try it out...

enough-analyst-54434

01/20/2022, 9:44 PM
I responded on SO. One way to work around perms is python my.pex - you can always run a PEX that way using an explicit interpreter. The benefit in your case is the PEX zipapp does not need to be executable, just readable, when run that way.
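A minimal sketch of launching a PEX through an explicit interpreter from Python, so that only read permission on the file is needed. The helper names `pex_command` and `run_pex` are hypothetical; the PEX path and arguments are placeholders.

```python
import subprocess
import sys


def pex_command(pex_path, *args, interpreter=sys.executable):
    """Build the argv for `<interpreter> <pex> [args...]`."""
    return [interpreter, pex_path, *args]


def run_pex(pex_path, *args):
    """Run the PEX with the current interpreter; raises if it exits non-zero."""
    return subprocess.run(pex_command(pex_path, *args), check=True)


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(pex_command("my.pex"))
```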

quiet-evening-25363

01/20/2022, 9:47 PM
hmm ok interesting, thanks, I'll keep that in mind as a workaround if there's something bigger with permissions going on

enough-analyst-54434

01/20/2022, 9:47 PM
So, for the tools thing:
PEX_TOOLS=1 python ./my.pex venv --rm all --compile create/venv/right/here
👀 1
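The command above could be wrapped for an init script roughly as follows. This is a sketch under assumptions: the PEX was built with its tools included (as mentioned earlier in the thread), the flags are copied verbatim from the message above, and the helper names `venv_invocation` and `create_venv` plus the example paths are hypothetical.

```python
import os
import subprocess
import sys


def venv_invocation(pex_path, venv_dir, interpreter=sys.executable):
    """Return (env, argv) equivalent to:

    PEX_TOOLS=1 python ./my.pex venv --rm all --compile <venv_dir>
    """
    env = dict(os.environ, PEX_TOOLS="1")
    argv = [interpreter, pex_path, "venv", "--rm", "all", "--compile", venv_dir]
    return env, argv


def create_venv(pex_path, venv_dir):
    """Unpack the PEX into a venv and return the path to the venv's python."""
    env, argv = venv_invocation(pex_path, venv_dir)
    subprocess.run(argv, env=env, check=True)
    return os.path.join(venv_dir, "bin", "python")


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    _, argv = venv_invocation("./my.pex", "create/venv/right/here")
    print(" ".join(argv))
```

The path returned by `create_venv` is the interpreter the thread suggests pointing Spark at. Note that Spark's own documentation names that environment variable PYSPARK_PYTHON.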

busy-vase-39202

01/21/2022, 5:38 PM
Belatedly I just want to add confirmation that yep this is an excellent and on-topic place to ask questions about pex. The pants team (mainly @enough-analyst-54434) maintains it.
👍 1