# general
h
Hi, I have a Python test that uses Spark, which depends on having a JVM. Is there a way to set a dependency on a JVM with a specific version? (for instance, openjdk 11)
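For context, the test is roughly along these lines (simplified sketch, names made up):

```python
# test_spark_job.py -- simplified sketch of the kind of test in question
from pyspark.sql import SparkSession


def test_word_count():
    # Building a local SparkSession launches a JVM under the hood, so this
    # fails in any sandbox that doesn't have a JDK available.
    spark = SparkSession.builder.master("local[1]").appName("test").getOrCreate()
    try:
        df = spark.createDataFrame([("a",), ("b",), ("a",)], ["word"])
        counts = {row["word"]: row["count"] for row in df.groupBy("word").count().collect()}
        assert counts == {"a": 2, "b": 1}
    finally:
        spark.stop()
```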
r
h
Thanks, will give it a go.
I need it to be able to install a JVM, cache it, and package it when sending it for remote execution. Will all of that be possible with this feature?
That is, I want it to be handled similarly to other Python dependencies.
r
I haven’t used it myself. I mostly saw this being recommended here for similar scenarios. This is still in beta.
b
Yeah, if everything gets structured correctly, the output of your tool will be cached. And we're cranking up the performance of that caching in 2.17.x
h
👍 Thanks for confirming.
b
Don't take our word for it though, try it out! We have Spark code in our codebase and right now I just turn a blind eye, so I'd love to hear how this works for you
h
Ack. Will probably take a few days before I have a result though.
b
This has been silently broken in our repo for months. Take your time 😉
h
Someone with more JVM knowledge than me should chime in (@witty-crayon-22786, @ancient-vegetable-10556) but I don't think adhoc_tool is pertinent. I'm pretty sure the JVM backends already support this?
a
Let me take a loooksie
There isn’t a very good way to make JVMs present in Python tests. This is a good use case for making runnable dependencies a universal thing
h
Oh, NM, I missed that this was a python test, my bad
a
The best thing you could do is write an experimental_test_shell_command that executes the pytest runner, with the relevant JVM dependencies as a runnable_dependency, but it’s not a workflow I have an example of off the top of my head, and it’s likely to be exceeeeeeeeeedingly awkward.
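Very roughly something like this — completely untested, field names are from memory, and the JDK runnable target is made up (finding or building a target that actually provides the JVM is the hard part):

```python
# BUILD -- untested sketch; names may not match your Pants version exactly
experimental_test_shell_command(
    name="spark_tests",
    command="pytest test_spark_job.py",
    tools=["pytest"],  # assumes a system-installed pytest; a packaged runner would be more hermetic
    # sources the test needs in the sandbox
    execution_dependencies=[":test_sources"],
    # hypothetical target that materializes a JDK and puts `java` on the PATH
    runnable_dependencies=["//3rdparty/jvm:jdk"],
)
```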
h
I see. Well, I'll think of it for a bit. I imagine maybe making a new jvm target that the python_tests and/or python_sources could depend on, and thus have a JVM available in the relevant contexts (testing, maybe somewhere else?). Any leads on that? (or maybe save me some time by telling me it's not such a great path to go down? 😛)
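Something like this is what I have in mind (entirely imaginary — neither the jvm target nor this use of it exists today):

```python
# BUILD -- imagined API, not something Pants provides
jvm(
    name="openjdk11",
    version="openjdk:11",
)

python_tests(
    name="tests",
    # the idea: depending on the jvm target would provision a JDK (and set
    # JAVA_HOME) in the test sandbox, like any other dependency
    dependencies=[":openjdk11"],
)
```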
a
The infrastructure is all there in the form of runnable_dependencies, in adhoc_tool and shell_command, but it’s limited to working with those targets at the moment. We need to work on a more systematic way to add runnable_dependencies that other targets can rely on.
b
One odd thing here too is that in order for this to work, pyspark needs to find Java. Meaning the target that represents "Java" should carry not just the files on disk, but an associated env var (JAVA_HOME). I have other (non-Java) use-cases for having the output of something also carry env vars for similar reasons. So something we need to figure out is how to make a target not just a digest, but a digest plus env vars 😅
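Concretely, what pyspark needs at runtime is roughly this — the JDK path below is made up, but the env var is the real requirement:

```python
import os
from pyspark.sql import SparkSession

# pyspark locates the JVM via JAVA_HOME (or a `java` on PATH), so shipping
# the JDK files into the sandbox isn't enough by itself.
os.environ["JAVA_HOME"] = "/tmp/sandbox/jdk-11"  # made-up location of a provisioned JDK
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin" + os.pathsep + os.environ.get("PATH", "")

spark = SparkSession.builder.master("local[1]").getOrCreate()
```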
a
That’s runnable_dependencies 🙂
b
I don't follow (also sorry @high-magician-46188 we might be derailing a bit)
a
@bitter-ability-32190 moving your tangent to #development