# development
s
question about batched-pytest behavior 🧵
so, each `python_test` in the same batch must have identical:
• `batch_id`
• `resolve`
• `extra_env_vars`
• `xdist_concurrency`
• environment
I’m trying to think of the best UX for cases where two targets have the same `batch_id` but different values in one of the other fields. the options are:
• error out with a message telling the user to set different batch IDs where `<conflicting field>` is different
• create sub-batches under-the-hood to get groupings with identical values for those fields
erroring out would be maximally explicit, but probably annoying. i.e. if you have a `__defaults__` setting a `batch_id` near the root of your code to say “batch everything possible”, and then you set a custom `extra_env_vars` on one of your tests, it feels like noise/toil to also have to set a separate `batch_id` on that target (or directory of targets)
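(For concreteness, a minimal sketch of that scenario. The `batch_id` field name is the one under discussion here, not a final API, and the `__defaults__` usage is illustrative:)

```python
# BUILD at the repo root: "batch everything possible" by default.
__defaults__({python_test: dict(batch_id="repo-default")})

# BUILD in a subdirectory: one test needs a custom environment variable.
python_test(
    name="special",
    source="test_special.py",
    extra_env_vars=["MY_SPECIAL_SETTING=1"],
    # Same inherited batch_id as everything else, but different
    # extra_env_vars: should Pants error, or silently sub-batch?
)
```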
creating sub-batches would make more use-cases Just Work, but it’ll lead to confusing (and therefore annoying) situations where people are really trying to tune their perf and wondering “why the heck will this test not run in a batch???”
idk how best to weigh the trade-offs between the two - any advice?
b
Erroring now and switching the impl later is always a possibility
👍 2
h
I'm in favor of erroring. I think the sub-batching is too magical, and could have issues if it's important to the user that the tests must all run together
Related, I remain skeptical of Pants splitting up partitions into batches
b
> Related, I remain skeptical of Pants splitting up partitions into batches
Oof. Can you elaborate 🙂
h
Specifically for `test`. It's good for `lint` and `fmt`. My concern is that test behavior can change depending on what other tests are run together. It's not always safe to split up into smaller batches. It also would not be a transparent implementation for users - how do they change which batch something belongs to? If people want smaller partitions, then they should use more batch IDs
👀 1
s
> If people want smaller partitions, then they should use more batch IDs
IMO this will be a bad UX - I would find this incredibly frustrating to maintain as a user in our repo
I think it would be much better if there was an enforced `batch_size` that could be tweaked as a config toggle. I agree it might be tough for the generic `test` logic to implement that reliably, so I’m currently adding it to the `pytest` subsystem.
my ideal UX as a user is to set a `batch_id` in a `__defaults__` at the root of the repo, and only need to specify other IDs for “weird” individual cases / subfolders lower in the repo
👀 1
personally we don’t use separate resolves / `extra_env_vars` / `xdist_concurrency` right now so if we ended up having to mark a separate batch ID everywhere those parameters vary it would not be the end of the world for us 🙂 but I suspect other users will be in different situations…maybe
I feel good about starting with erroring and then introducing more magic later if there’s a lot of demand
👍 1
h
> test behavior can change depending on what other tests are run together.
The whole endeavor of batching tests implicitly assumes this is not the case for your tests…
If so, what is even correct?
Running all tests in one go? Running them each in a separate process?
What even defines correctness?
b
the thinker
h
From our docs:
> Try running `./pants test ::` to see if any tests fail. Sometimes, your tests will fail with Pants even if they pass with your normal setup because tests are more isolated than when running Pytest/unittest directly:
> • Test files are isolated from each other. If your tests depended on running in a certain order, they may now fail. This requires rewriting your tests to remove the shared global state.
Right now, we force you to do one process per file. This new feature allows you to instead back out of that, and even run all tests in a single process, like when you use Pytest directly w/o Pants
h
Yes, I understand the context. But allowing that (which we should! people care about performance) assumes that running tests in batches is equally “correct” as running them one at a time
We have no strong reason to assume that our current “one process per file” strategy is more correct, particularly when that’s not how tests are typically run outside of Pants
> Sometimes, your tests will fail with Pants even if they pass with your normal setup
Or vice versa! We are not necessarily stricter, just different.
w
> We have no strong reason to assume that our current “one process per file” strategy is more correct, particularly when that’s not how tests are typically run outside of Pants
it absolutely is in the context of caching… or, at least: you need to be consistent about whether you’re running a small set or a large set in the presence of caching. it sounds like @sparse-lifeguard-95737’s use case may not require batching within the partitions (yet?), so it feels like a logical step not to implement it in a first version
👍 1
h
The correctness I’m referring to is “quality control correctness”, i.e., is the test achieving its purpose. Or “how much information about the quality of my code can I glean from the fact that the test passed”. Caching is another issue.
If a test can pass when run batched one way and fail when batched another way, then it’s not clear which of those is “the truth”, and certainly not clear which was the “right way” to batch. The test itself is, at that point, suspect.
s
I think we’re mixing two different questions when we talk about breaking down partitions into sub-batches (or at least they are mixing in my head as I read through this thread):
1. If two targets have the same `batch_id` but conflict in some other field, do we error out? Or auto-resolve the conflict by putting them in different batches under-the-hood despite the `batch_id`?
2. If `N` targets have the same `batch_id`, do we always run all `N` in the same batch? or do we allow for a global size-limit toggle so we end up with `M` sub-batches of `N`/`M`?
1 was my original question in the thread. I’ve been assuming that the answer to 2 is “yes” but it seems there are differing opinions 🙂
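(To make question 2 concrete, here is a minimal, hypothetical sketch of what a global size-limit toggle would mean for one compatible group of `N` targets. This is illustrative Python, not Pants internals:)

```python
# Hypothetical: split one compatible group into size-capped sub-batches.
def sub_batches(targets: list[str], batch_size: int) -> list[list[str]]:
    return [targets[i : i + batch_size] for i in range(0, len(targets), batch_size)]

# 10 targets sharing a batch_id with a size limit of 4 would yield
# 3 sub-batches of sizes 4, 4, and 2.
print(sub_batches([f"tests/test_{n}.py" for n in range(10)], batch_size=4))
```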
h
yeah, 1 seems clear to me to error. 2 is where I'm skeptical, but could be convinced
s
as a user I wouldn’t want to have to move/create new `batch_id`s as more tests are added to the repo to keep total batch sizes down, so if there is some practical max limit to the number of tests that can go in one process (i.e. max length of a shell command? is that a thing?) then I think having the size-limit toggle would be needed
but if there is no practical limit then I am less opinionated - everything in one batch is how CI currently works so I’d expect it to continue working
I wonder if there is an alternative name to `batch_id` that better conveys “these are safe to run together if possible” vs. “these are definitely going to run together”
h
bad field name, but something like a `batch_can_be_subpartitioned: bool` field?
b
Relatedly we want the opposite field for `fmt` for Terraform. E.g. “don’t you dare split this up”. But that’s not a user’s field, but a Partition one
👍 1
w
> Relatedly we want the opposite field for `fmt` for Terraform. E.g. “don’t you dare split this up”.
> But that’s not a user’s field, but a Partition one
that isn’t really a field though: that’s semantics that the rule implementation needs to apply
yea, coke
b
In the end the user field for `test` can get plumbed through to the same code that terrafmt can use to toggle sub-partition-ability
s
> something like a `batch_can_be_subpartitioned: bool` field?
I was thinking an alternate name for `batch_id` instead of a second field. otherwise you have to enforce that everything with the same `batch_id` has the same `batch_can_be_subpartitioned` value…
w
agreed.
b
So not to backtrack here, but is `batch_id` the best term? Given `fmt`/`lint` have `batch_size`, if `test` follows, the field would technically be called `batch_can_be_batched` 😬
s
what I want is some field with a long-form description of something like:
```
If set, tests in files covered by this target _may_ be tested in the same `pytest` process as other files with the same `<field-name>` value.
```
I think `batch_id` is not the best at conveying this and am open to alternate suggestions 🙂
h
`group_id`? lol
h
heh
`clump_id`
h
Yeah, I get the confusion and agree that the questions of “these tests are safe to run together” and “these tests should actually be run together, for optimal performance” are separate
At the limit, the entire repo might be one “batch_id” for the purpose of being safe to run together, and we’d still want to run, say, 20 tests per process, to balance caching and concurrency vs process overhead
So “batch_id” is indeed a slightly misleading name
s
exactly
I am more worried about the `_id` piece of it than the `batch_` piece - giving something an “ID” feels like you are definitively marking the final grouping
w
adding `ing` might suggest that you might not get the batch verbatim: `batching_id` or `batching_key`
s
`batching_tag` 🤔
w
yea.
s
PS I am sorry everyone I had no intention of this devolving into bike-shedding 😂
b
I'm happy that we're all in on consistent and intentional terminology
This makes me think of `group_by`. So `group_tag`? `group_key`?
E.g. you could say in English "I want Pants to group these tests together"
🤔 1
Then there's additionally `group_can_be_subpartitioned` or `group_can_be_batched`. Honestly bummer they didn't name `batch_size` something else 🙂
s
I would like to avoid adding ^^^ and instead take the stance that groups might always be batched, while exposing a `batch_size` toggle that users can play with to make their own trade-offs (similar to how `lint` and `fmt` currently work)
if users really want to force the groups they’ve selected, they can set `batch_size=<big number>`. but that way Pants has some built-in guardrails against batches so huge they clog up / break some piece of the system
i.e. the original ticket for batching linters/formatters was filed because Ryan hit an “argument list too long” error: https://github.com/pantsbuild/pants/issues/13462
b
I also raised a stink about lint perf
and batching fixed that 🙂 (shoutout to Stu ❤️)
s
`compatibility_tag`? With help text something like “assign `python_test` targets the same `compatibility_tag` to show they are safe to be tested in the same `pytest` process. Pants will attempt to run compatible tests together as long as insert description of caveats _here_”
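(A rough sketch of how that proposal could look in a BUILD file. The field name was still being bikeshedded at this point, so treat both the name and the `__defaults__` usage as hypothetical:)

```python
# Root BUILD: mark the whole repo as safe to batch together (hypothetical field).
__defaults__({python_test: dict(compatibility_tag="default")})

# A test with shared global state opts out via a different tag, so it is
# never put in the same pytest process as the "default" group.
python_test(
    name="stateful",
    source="test_stateful.py",
    compatibility_tag="isolated",
)
```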
b
LGTM
h
yeah, I like `compatibility` as the concept
or `mutuality` - which is made up, but therefore has the advantage that we can make it mean whatever we want