# general
f
Hey everyone! I have a question regarding parallelization of generated pytest tests. Let's say I generate tests like this:
```python
import pytest


def pytest_generate_tests(metafunc):
    data = zip(range(5), range(5))  # imagine this to be very complex

    metafunc.parametrize("x, y", [pytest.param(x, y, id=f"test_{x}_{y}") for x, y in data])


def test_my_stuff(x: int, y: int):
    assert x <= y  # each such test takes about 1 minute to complete
```
If I run this with `pants`, it will run all generated tests serially in a single process. Can I somehow convince `pants` to run the generated test cases in parallel?
e
A few options (I haven't tried these myself):
1. https://www.pantsbuild.org/prerelease/reference/targets/python_tests#xdist_concurrency — it still runs in a single process, but parallelizes across your CPUs (it's a pytest plugin, so this is more pytest than pants). See the sketch after this list.
2. Can your complex generation be broken into multiple files? I.e. something like this?
```python
# test_file_A.py
def pytest_generate_tests(metafunc):
    data = zip(range(2), range(5))
    ...
```
```python
# test_file_B.py
def pytest_generate_tests(metafunc):
    data = zip(range(2, 5), range(2, 5))
    ...
```
etc. This would get pants to run each file as a separate process (you get some separation, but not so fine-grained that you lose performance due to excessive sandbox setup).
3. Test sharding. Usually this is used more for splitting your tests across CI machines, so I'm not sure if you can use it to parallelize on one machine, but a starting point is: https://www.pantsbuild.org/prerelease/reference/goals/test#shard
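For option 1, my reading of the linked docs (again, untried) is that you turn on xdist support via `xdist_enabled = true` under `[pytest]` in pants.toml and then cap the per-target concurrency in the BUILD file; for option 3, the starting point would be something like `pants test --shard=0/4 ::` per machine.
```python
# BUILD -- sketch of option 1, assuming [pytest].xdist_enabled = true in pants.toml
python_tests(
    name="tests",
    # let pytest-xdist fan the generated test cases of each file out over 4 workers
    xdist_concurrency=4,
)
```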
f
Hi Luke. Thanks for the answer!
1. It seems to me that we lose the granular pants caching this way. And also, I'm wary of nesting parallelization mechanisms.
2. We already considered that, and I'm asking this question here to see if something else is possible.
3. The docs say "Useful for splitting large numbers of test files across multiple machines". It seems to me sharding still deals with test files and not with individual test cases.
How about writing a custom plugin via pants' Python API? So instead of `python_tests`, we'd have some `custom_python_tests` thing, which would take care of that? Would that be possible?
e
I actually thought of another idea. It's a variant on 2: use pants' `parametrize` feature with the `extra_env_vars` field on your `python_tests`, and provide a list of different env var values to be used with the test file. Then instead of having multiple files, you can have one file whose `pytest_generate_tests` mechanism is controlled by the env var values. Each parametrized variant will get cached independently, and you have full control over how you break the vars down into the individual tests.
The one downside would be that their caches will all be invalidated together, because they all have the same dependency tree, but given the problem, that seems unavoidable.
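Something like this is what I have in mind (`PARAM_CHUNK` is just a made-up name; each chunk becomes its own independently cached target):
```python
# BUILD -- one python_test variant per chunk of the generated data
python_test(
    source="test_file.py",
    extra_env_vars=parametrize(**{f"chunk_{i}": [f"PARAM_CHUNK={i}"] for i in range(5)}),
)
```
```python
# test_file.py -- the env var picks the slice of generated params this variant runs
import os

import pytest


def pytest_generate_tests(metafunc):
    if {"x", "y"} <= set(metafunc.fixturenames):
        chunk = int(os.environ["PARAM_CHUNK"])  # set via extra_env_vars
        data = [(x, y) for x, y in zip(range(5), range(5)) if x == chunk]
        metafunc.parametrize(
            "x, y", [pytest.param(x, y, id=f"test_{x}_{y}") for x, y in data]
        )


def test_my_stuff(x: int, y: int):
    assert x <= y
```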
f
Hm, how about the custom plugin approach I mentioned? I've never written one, so I'm not sure what is possible and what isn't.
e
I think a plugin for this is probably overkill, though it may depend on specific details of the problem. Ultimately, it seems like you want to have a number of different `python_test` targets, so that they can be cached independently. I would guess something like this should work well:
```python
# BUILD
test_options = {
    f"params_{x}_{y}": [f"PARAM_X={x}", f"PARAM_Y={y}"]
    for x in range(5)
    for y in range(5)
}
python_test(
    source="test_file.py",
    extra_env_vars=parametrize(**test_options),
)

# test_file.py
import os


def test_my_stuff():
    # env vars arrive as strings, so convert before comparing
    x = int(os.getenv("PARAM_X"))
    y = int(os.getenv("PARAM_Y"))
    assert x <= y
```
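If I have the address syntax right, each variant gets its own target address, something like `test_file.py@parametrize=params_0_0` (the exact form depends on your target's name), so `pants test ::` runs them as separate, separately cached processes, and you can rerun a single combination with e.g. `pants test path/to:tests@parametrize=params_0_0`.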
I don't have a ton of experience with writing plugins, but my general impression is that you would use a plugin more for introducing another type of tool or behavior. For something where you want a more convenient use of something that already exists, you'll want a macro (e.g. I use a `python_image` macro to create a `docker_image` target that pre-fills some of the arguments, so that I can make all my images use the same base image). Your case is essentially wanting to have a lot of `python_test` targets that have some common behavior (e.g. the same source file), so my first thought was a macro. In your particular case though, `parametrize` is a tool already made for covering this case, when you don't need to do a lot of customization to your targets.
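To give a rough picture of the macro approach (the base image and defaults here are made up; macros live in a file registered via `[GLOBAL].build_file_prelude_globs` in pants.toml), mine looks approximately like:
```python
# macros.py -- rough sketch of the python_image macro mentioned above
def python_image(name, **kwargs):
    docker_image(
        name=name,
        # pre-fill the build arg so every image shares the same base image
        extra_build_args=kwargs.pop("extra_build_args", ["BASE_IMAGE=python:3.11-slim"]),
        **kwargs,
    )
```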
For reference, I do exactly this (parametrize the `extra_env_vars` field on my tests) to run all of my tests in multiple configurations of feature flags.
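I.e. roughly this (the flag name is made up):
```python
# BUILD -- every test file runs (and caches) once per feature-flag configuration
python_tests(
    name="tests",
    extra_env_vars=parametrize(
        flag_on=["MY_FEATURE_FLAG=1"],
        flag_off=["MY_FEATURE_FLAG=0"],
    ),
)
```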
f
Thanks for the hints! Will try them out and see how it goes.
e
Good luck!