# development
s
question about batched-pytest behavior 🧵
so, each `python_test` in the same batch must have identical:
• `batch_id`
• `resolve`
• `extra_env_vars`
• `xdist_concurrency`
• environment
I’m trying to think of the best UX for cases where two targets have the same `batch_id` but different values in one of the other fields. the options are:
• error out with a message telling the user to set different batch IDs where `<conflicting field>` is different
• create sub-batches under-the-hood to get groupings with identical values for those fields
erroring out would be maximally explicit, but probably annoying. i.e. if you have a `__defaults__` setting a `batch_id` near the root of your code to say “batch everything possible”, and then you set a custom `extra_env_vars` on one of your tests, it feels like noise/toil to also have to set a separate `batch_id` on that target (or directory of targets)
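(For concreteness, a minimal sketch of that scenario. The `batch_id` field name is the one under discussion here, not a final API, and the `__defaults__` usage is illustrative:)

```python
# BUILD at the repo root: "batch everything possible" by default.
__defaults__({python_test: dict(batch_id="repo-default")})

# BUILD in a subdirectory: one test needs a custom environment variable.
python_test(
    name="special",
    source="test_special.py",
    extra_env_vars=["MY_SPECIAL_SETTING=1"],
    # Same inherited batch_id as everything else, but different
    # extra_env_vars: should Pants error, or silently sub-batch?
)
```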
creating sub-batches would make more use-cases Just Work, but it’ll lead to confusing (and therefore annoying) situations where people are really trying to tune their perf and wondering “why the heck will this test not run in a batch???”
idk how best to weigh the trade-offs between the two - any advice?
b
Erroring now and switching the impl later is always a possibility
👍 2
h
I'm in favor of erroring. I think the sub-batching is too magical, and could have issues if it's important to the user that the tests must all run together
Related, I remain skeptical of Pants splitting up partitions into batches
b
> Related, I remain skeptical of Pants splitting up partitions into batches
Oof. Can you elaborate 🙂
h
Specifically for `test`. It's good for `lint` and `fmt`. My concern is that test behavior can change depending on what other tests are run together. It's not always safe to split up into smaller batches. It also would not be a transparent implementation for users - how do they change which batch something belongs to? If people want smaller partitions, then they should use more batch IDs
👀 1
s
> If people want smaller partitions, then they should use more batch IDs
IMO this will be a bad UX - I would find this incredibly frustrating to maintain as a user in our repo
I think it would be much better if there was an enforced `batch_size` that could be tweaked as a config toggle. I agree it might be tough for the generic `test` logic to implement that reliably, so I’m currently adding it to the `pytest` subsystem.
my ideal UX as a user is to set a `batch_id` in a `__defaults__` at the root of the repo, and only need to specify other IDs for “weird” individual cases / subfolders lower in the repo
👀 1
personally we don’t use separate resolves / `extra_env_vars` / `xdist_concurrency` right now so if we ended up having to mark a separate batch ID everywhere those parameters vary it would not be the end of the world for us 🙂 but I suspect other users will be in different situations…maybe
I feel good about starting with erroring and then introducing more magic later if there’s a lot of demand
👍 1
h
> test behavior can change depending on what other tests are run together.
The whole endeavor of batching tests implicitly assumes this is not the case for your tests…
If so, what is even correct?
Running all tests in one go? Running them each in a separate process?
What even defines correctness?
b
the thinker
h
From our docs:
> Try running `./pants test ::` to see if any tests fail. Sometimes, your tests will fail with Pants even if they pass with your normal setup because tests are more isolated than when running Pytest/unittest directly:
> • Test files are isolated from each other. If your tests depended on running in a certain order, they may now fail. This requires rewriting your tests to remove the shared global state.
Right now, we force you to do one process per file. This new feature allows you to instead back out of that, and even run all tests in a single process, like when you use Pytest directly w/o Pants
h
Yes, I understand the context. But allowing that (which we should! people care about performance) assumes that running tests in batches is equally “correct” as running them one at a time
We have no strong reason to assume that our current “one process per file” strategy is more correct, particularly when that’s not how tests are typically run outside of Pants
> Sometimes, your tests will fail with Pants even if they pass with your normal setup
Or vice versa! We are not necessarily stricter, just different.
w
> We have no strong reason to assume that our current “one process per file” strategy is more correct, particularly when that’s not how tests are typically run outside of Pants
it absolutely is in the context of caching… or, at least: you need to be consistent about whether you’re running a small set or a large set in the presence of caching. it sounds like @sparse-lifeguard-95737’s use case may not require batching within the partitions (yet?), so it feels like a logical step not to implement it in a first version
👍 1
h
The correctness I’m referring to is “quality control correctness”, i.e., is the test achieving its purpose. Or “how much information about the quality of my code can I glean from the fact that the test passed”. Caching is another issue.
If a test can pass when run batched one way and fail when batched another way, then it’s not clear which of those is “the truth”, and certainly not clear which was the “right way” to batch. The test itself is, at that point, suspect.
s
I think we’re mixing two different questions when we talk about breaking down partitions into sub-batches (or at least they are mixing in my head as I read through this thread):
1. If two targets have the same `batch_id` but conflict in some other field, do we error out? Or auto-resolve the conflict by putting them in different batches under-the-hood despite the `batch_id`?
2. If `N` targets have the same `batch_id`, do we always run all `N` in the same batch? or do we allow for a global size-limit toggle so we end up with `M` sub-batches of `N`/`M`?
1 was my original question in the thread. I’ve been assuming that the answer to 2 is “yes” but it seems there are differing opinions 🙂
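(To make question 2 concrete, here is a minimal, hypothetical sketch of what a global size-limit toggle would mean for one compatible group of `N` targets. This is illustrative Python, not Pants internals:)

```python
# Hypothetical: split one compatible group into size-capped sub-batches.
def sub_batches(targets: list[str], batch_size: int) -> list[list[str]]:
    return [targets[i : i + batch_size] for i in range(0, len(targets), batch_size)]

# 10 targets sharing a batch_id with a size limit of 4 would yield
# 3 sub-batches of sizes 4, 4, and 2.
print(sub_batches([f"tests/test_{n}.py" for n in range(10)], batch_size=4))
```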
h
yeah, 1 seems clear to me to error. 2 is where I'm skeptical, but could be convinced
s
as a user I wouldn’t want to have to move/create new `batch_id`s as more tests are added to the repo to keep total batch sizes down, so if there is some practical max limit to the number of tests that can go in one process (i.e. max length of a shell command? is that a thing?) then I think having the size-limit toggle would be needed
but if there is no practical limit then I am less opinionated - everything in one batch is how CI currently works so I’d expect it to continue working
I wonder if there is an alternative name to `batch_id` that better conveys “these are safe to run together if possible” vs. “these are definitely going to run together”
h
bad field name, but something like a `batch_can_be_subpartitioned: bool` field?
b
Relatedly we want the opposite field for `fmt` for Terraform. E.g. “don’t you dare split this up”. But that’s not a user’s field, but a Partition one
👍 1
w
> Relatedly we want the opposite field for `fmt` for Terraform. E.g. “don’t you dare split this up”.
> But that’s not a user’s field, but a Partition one
that isn’t really a field though: that’s semantics that the rule implementation needs to apply
yea, coke
b
In the end the user field for `test` can get plumbed through to the same code that terrafmt can use to toggle sub-partition-ability
s
> something like a `batch_can_be_subpartitioned: bool` field?
I was thinking an alternate name for `batch_id` instead of a second field. otherwise you have to enforce that everything with the same `batch_id` has the same `batch_can_be_subpartitioned` value…
w
agreed.
b
So not to backtrack here, but is `batch_id` the best term? Given `fmt`/`lint` have `batch_size`, if `test` follows, the field would technically be called `batch_can_be_batched` 😬
s
what I want is some field with a long-form description of something like:
```
If set, tests in files covered by this target _may_ be tested in the same `pytest` process as other files with the same `<field-name>` value.
```
I think `batch_id` is not the best at conveying this and am open to alternate suggestions 🙂
h
`group_id`? lol
h
heh
`clump_id`
h
Yeah, I get the confusion and agree that the questions of “these tests are safe to run together” and “these tests should actually be run together, for optimal performance” are separate
At the limit, the entire repo might be one “batch_id” for the purpose of being safe to run together, and we’d still want to run, say, 20 tests per process, to balance caching and concurrency vs process overhead
So “batch_id” is indeed a slightly misleading name
s
exactly
I am more worried about the `_id` piece of it than the `batch_` piece - giving something an “ID” feels like you are definitively marking the final grouping
w
adding `ing` might suggest that you might not get the batch verbatim: `batching_id` or `batching_key`
s
`batching_tag` 🤔
w
yea.
s
PS I am sorry everyone I had no intention of this devolving into bike-shedding 😂
b
I'm happy that we're all in on consistent and intentional terminology
This makes me think of `group_by`. So `group_tag`? `group_key`?
E.g. you could say in English "I want Pants to group these tests together"
🤔 1
Then there's additionally `group_can_be_subpartitioned` or `group_can_be_batched`. Honestly bummer they didn't name `batch_size` something else 🙂
s
I would like to avoid adding ^^^ and instead take the stance that groups might always be batched, while exposing a `batch_size` toggle that users can play with to make their own trade-offs (similar to how `lint` and `fmt` currently work)
if users really want to force the groups they’ve selected, they can set `batch_size=<big number>`. but that way Pants has some built-in guardrails against batches so huge they clog up / break some piece of the system
i.e. the original ticket for batching linters/formatters was filed because Ryan hit an “argument list too long” error: https://github.com/pantsbuild/pants/issues/13462
b
I also raised a stink about lint perf
and batching fixed that 🙂 (shoutout to Stu ❤️)
s
`compatibility_tag`? With help text something like “assign `python_test` targets the same `compatibility_tag` to show they are safe to be tested in the same `pytest` process. Pants will attempt to run compatible tests together as long as insert description of caveats _here_”
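(A rough sketch of how that proposal could look in a BUILD file. The field name was still being bikeshedded at this point, so treat both the name and the `__defaults__` usage as hypothetical:)

```python
# Root BUILD: mark the whole repo as safe to batch together (hypothetical field).
__defaults__({python_test: dict(compatibility_tag="default")})

# A test with shared global state opts out via a different tag, so it is
# never put in the same pytest process as the "default" group.
python_test(
    name="stateful",
    source="test_stateful.py",
    compatibility_tag="isolated",
)
```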
b
LGTM
h
yeah, I like `compatibility` as the concept
or `mutuality` - which is made up, but therefore has the advantage that we can make it mean whatever we want