# general
s
I have a naive question. I'm trying to run some code on a ray cluster. In order to get this to work I need to tell ray a little bit about the packages I use: essentially, I need to pass a dict like `{"pip": [pip requirements], "env_vars": {...}, "working_dir": "."}` to `ray.init(runtime_env=...)`.
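Concretely, something of this shape (the pins and paths here are just placeholders, not my real dependencies):

```python
import ray

# Rough sketch of the call I want to end up with; the requirement pins,
# env vars, and working_dir below are placeholders.
ray.init(
    runtime_env={
        "pip": ["db-dtypes~=1.0.4", "gcsfs~=2022.10.0"],
        "env_vars": {"PYTHONPATH": "archipelago/src:capstan/src"},
        "working_dir": ".",
    }
)
```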
I have an "entrypoint", `item-rank/src/item_rank/publish.py`, so I can do something like `./pants dependencies --transitive item-rank/src/item_rank/publish.py`. If I keep all of my requirements in a single `requirements.txt` file, I can parse the output and get something like
```
"db-dtypes~=1.0.4",
"gcsfs~=2022.10.0",
"google-cloud-bigquery-storage==2.16.0",
```
from
```
//:reqs#db-dtypes
//:reqs#gcsfs
//:reqs#google-cloud-bigquery
```
with a little bit of bash golf. How might I translate
```
archipelago/src/archipelago/foo.py
capstan/src/capstan/bar.py
```
into env vars like `PYTHONPATH=$PYTHONPATH:archipelago/src:capstan/src`, i.e. the relevant source roots? (I suppose this is just `./pants roots`.) I've done this manually (rough sketch below) and it seems to work. I'm running into permission issues, so there are still errors that will take until Monday to resolve, but it looks like a viable route.
Edit: I never really asked a question; this comment is really more that I'm curious how other folks have approached integrating with ray. I feel like I'm traveling down the wrong path here, yet I should be able to query pants for everything I need to construct an environment for my ray workers. A better solution would be for ray to understand pex, but that doesn't appear to be a supported feature yet.
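The manual version of that step is just glue around `./pants roots`; roughly (a sketch, not verbatim what I ran):

```python
import subprocess

# `./pants roots` prints one source root per line (e.g. archipelago/src,
# capstan/src); join them into a PYTHONPATH-style string for runtime_env.
roots = subprocess.run(
    ["./pants", "roots"], capture_output=True, text=True, check=True
).stdout.split()
pythonpath = ":".join(roots)  # -> "archipelago/src:capstan/src:..."
```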
h
As you said, `pants roots` is what you want for the PYTHONPATH, and you can get the reqs with something like
```
pants dependencies --transitive path/to/file.py | \
  xargs pants list --filter-target-type=python_requirement | \
  xargs pants peek | \
  jq .[].requirements
```
The `peek` goal gives you detailed info about each input target.
s
Thanks
This is awesome. I added a little bit to reduce everything into a single array:
```
./pants dependencies --transitive item-rank/src/item_rank/publish.py | \
  xargs ./pants list --filter-target-type=python_requirement | \
  xargs ./pants peek | jq '[.[] | .requirements[]] | reduce .[] as $item ([]; . + [$item])'
```
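Consuming that from Python is then just a subprocess call plus `json.loads`. A rough sketch, with the entrypoint hard-coded:

```python
import json
import subprocess

# Run the pipeline above and parse the JSON array it prints; the result
# goes straight into runtime_env["pip"].
PIPELINE = (
    "./pants dependencies --transitive item-rank/src/item_rank/publish.py | "
    "xargs ./pants list --filter-target-type=python_requirement | "
    "xargs ./pants peek | "
    "jq '[.[] | .requirements[]] | reduce .[] as $item ([]; . + [$item])'"
)
result = subprocess.run(
    PIPELINE, shell=True, capture_output=True, text=True, check=True
)
requirements = json.loads(result.stdout)  # e.g. ["db-dtypes~=1.0.4", ...]
```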
I wonder if it would make sense to wrap this up as a command and have an `experimental_shell_command` dependency.
h
Neat
You could if this is something you'll need regularly
it would be nice if you could deploy a single Pex to ray instead of having to tell ray how to build your code...
s
There's https://github.com/ray-project/ray/issues/15518 but I don't think I have the skill / time to contribute to that project
h
The fact that ray, and databricks, and aws lambda, and gcp cloud functions all want you to provide raw requirement and entry point metadata to deploy python tells you how immature the python deployment story still is
Hah, yes
Exactly
s
To databricks' credit, pyspark supports pex. Unfortunately my team has a huge aversion to spark / pyspark
h
Pyspark does but databricks doesn't, for some reason
g
Yusuf, thanks for sharing; we also use ray and databricks. We don't have a solution for ray yet, but we do publish pex files to use with databricks. We use some custom init scripts to inject some code into the python site-packages to init the pex environment.
s
I did a little digging on this. Ray allows you to configure runtime environments via `RuntimeEnvPlugin`. They use this internally for implementing `pip` (https://github.com/ray-project/ray/blob/master/python/ray/_private/runtime_env/pip.py#L387), `conda` (https://github.com/ray-project/ray/blob/master/python/ray/_private/runtime_env/conda.py#L257), etc. I haven't dug into the code except at a superficial level, but it seems like a `PexPlugin` utilizing the same strategy might be possible.
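Purely speculative, but the shape might be something like this. Ray's plugin interface is internal and unstable, so the import path, method names, and signatures below are assumptions modeled loosely on pip.py, not something I've run:

```python
# SPECULATIVE sketch: the plugin API is ray-internal and version-dependent;
# the import path, method names, and signatures are assumptions based on
# skimming pip.py, not a working implementation.
from ray._private.runtime_env.plugin import RuntimeEnvPlugin


class PexPlugin(RuntimeEnvPlugin):
    name = "pex"

    async def create(self, uri, runtime_env, context, logger=None):
        # Hypothetically: fetch/stage the .pex referenced in the runtime_env
        # on the worker node and report how many bytes it uses.
        ...

    def modify_context(self, uris, runtime_env, context, logger=None):
        # Hypothetically: point the worker at the staged pex, e.g. by
        # adjusting the python executable or the worker command prefix.
        ...
```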