# general
f
Hey everyone! I have a question regarding parallelization of generated pytest tests. Let's say I generate tests like this:
```python
import pytest


def pytest_generate_tests(metafunc):
    data = zip(range(5), range(5))  # imagine this to be very complex

    metafunc.parametrize("x, y", [pytest.param(x, y, id=f"test_{x}_{y}") for x, y in data])


def test_my_stuff(x: int, y: int):
    assert x <= y  # each such test takes about 1 minute to complete
```
If I run this with `pants`, it will run all generated tests serially in a single process. Can I somehow convince `pants` to run the generated test cases in parallel?
e
A few options (I haven't tried these myself):
1. https://www.pantsbuild.org/prerelease/reference/targets/python_tests#xdist_concurrency — it still runs in a single process, but parallelizes across your CPUs (it's a pytest plugin, so this is more pytest than pants). See the sketch after this list.
2. Can your complex generation be broken into multiple files? I.e. something like this?
```python
# test_file_A.py
def pytest_generate_tests(metafunc):
    data = zip(range(2), range(5))
    ...
```
```python
# test_file_B.py
def pytest_generate_tests(metafunc):
    data = zip(range(2, 5), range(2, 5))
    ...
```
etc. This would get pants to run each file as a separate process (you get some separation, but not so fine-grained that you lose performance due to excessive sandbox setup).
3. Test sharding. Usually this is used more for splitting your tests across CI machines, so I'm not sure if you can use it to parallelize on one machine, but a starting point is: https://www.pantsbuild.org/prerelease/reference/goals/test#shard
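For option 1, my reading of the linked docs (again, untried) is that you turn on xdist support via `xdist_enabled = true` under `[pytest]` in pants.toml and then cap the per-target concurrency in the BUILD file; for option 3, the starting point would be something like `pants test --shard=0/4 ::` per machine.
```python
# BUILD -- sketch of option 1, assuming [pytest].xdist_enabled = true in pants.toml
python_tests(
    name="tests",
    # let pytest-xdist fan the generated test cases of each file out over 4 workers
    xdist_concurrency=4,
)
```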
f
Hi Luke. Thanks for the answer!
1. It seems to me that we lose the granular pants caching this way. And also, I'm wary of nesting parallelization mechanisms.
2. We already considered that, and I'm asking this question here to see if something else is possible.
3. The docs say "Useful for splitting large numbers of test files across multiple machines". It seems to me sharding still deals with test files and not with individual test cases.
How about writing a custom plugin via pants' Python API? So instead of `python_tests`, we'd have some `custom_python_tests` thing, which would take care of that? Would that be possible?
e
I actually thought of another idea. It's a variant on 2: use pants' `parametrize` feature with the `extra_env_vars` field on your `python_tests`, and provide a list of different env var values to be used with the test file. Then instead of having multiple files, you can have one file whose `pytest_generate_tests` mechanism is controlled by the env var values. Each parametrized variant will get cached independently, and you have full control over how you break the vars down into the individual tests.
The one downside would be that their caches will all be invalidated together, because they all have the same dependency tree, but given the problem, that seems unavoidable.
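Something like this is what I have in mind (`PARAM_CHUNK` is just a made-up name; each chunk becomes its own independently cached target):
```python
# BUILD -- one python_test variant per chunk of the generated data
python_test(
    source="test_file.py",
    extra_env_vars=parametrize(**{f"chunk_{i}": [f"PARAM_CHUNK={i}"] for i in range(5)}),
)
```
```python
# test_file.py -- the env var picks the slice of generated params this variant runs
import os

import pytest


def pytest_generate_tests(metafunc):
    if {"x", "y"} <= set(metafunc.fixturenames):
        chunk = int(os.environ["PARAM_CHUNK"])  # set via extra_env_vars
        data = [(x, y) for x, y in zip(range(5), range(5)) if x == chunk]
        metafunc.parametrize(
            "x, y", [pytest.param(x, y, id=f"test_{x}_{y}") for x, y in data]
        )


def test_my_stuff(x: int, y: int):
    assert x <= y
```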
f
Hm, how about the custom plugin approach I mentioned? I've never written one, so I'm not sure what is possible and what isn't.
e
I think a plugin for this is probably overkill, though it may depend on specific details of the problem. Ultimately, it seems like you want to have a number of different `python_test` targets, so that they can be cached independently. I would guess something like this should work well:
```python
# BUILD
test_options = {
    f"params_{x}_{y}": [f"PARAM_X={x}", f"PARAM_Y={y}"]
    for x in range(5)
    for y in range(5)
}
python_test(
    source="test_file.py",
    extra_env_vars=parametrize(**test_options),
)

# test_file.py
import os


def test_my_stuff():
    # env vars arrive as strings, so convert before comparing
    x = int(os.getenv("PARAM_X"))
    y = int(os.getenv("PARAM_Y"))
    assert x <= y
```
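If I have the address syntax right, each variant gets its own target address, something like `test_file.py@parametrize=params_0_0` (the exact form depends on your target's name), so `pants test ::` runs them as separate, separately cached processes, and you can rerun a single combination with e.g. `pants test path/to:tests@parametrize=params_0_0`.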
I don't have a ton of experience with writing plugins, but my general impression is that you would use a plugin more for introducing another type of tool or behavior. For something where you want a more convenient use of something that already exists, you'll want a macro (e.g. I use a `python_image` macro to create a `docker_image` target that pre-fills some of the arguments, so that I can make all my images use the same base image). Your case is essentially wanting to have a lot of `python_test` targets that have some common behavior (e.g. the same source file), so my first thought was a macro. In your particular case though, `parametrize` is a tool already made for covering this case, when you don't need to do a lot of customization to your targets.
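To give a rough picture of the macro approach (the base image and defaults here are made up; macros live in a file registered via `[GLOBAL].build_file_prelude_globs` in pants.toml), mine looks approximately like:
```python
# macros.py -- rough sketch of the python_image macro mentioned above
def python_image(name, **kwargs):
    docker_image(
        name=name,
        # pre-fill the build arg so every image shares the same base image
        extra_build_args=kwargs.pop("extra_build_args", ["BASE_IMAGE=python:3.11-slim"]),
        **kwargs,
    )
```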
For reference, I do exactly this (parametrize the `extra_env_vars` field on my tests) to run all of my tests in multiple configurations of feature flags.
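I.e. roughly this (the flag name is made up):
```python
# BUILD -- every test file runs (and caches) once per feature-flag configuration
python_tests(
    name="tests",
    extra_env_vars=parametrize(
        flag_on=["MY_FEATURE_FLAG=1"],
        flag_off=["MY_FEATURE_FLAG=0"],
    ),
)
```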
f
Thanks for the hints! Will try them out and see how it goes.
e
Good luck!