sparse-lifeguard-95737
09/20/2022, 7:08 PMpytest
calls you asked for on Friday: https://docs.google.com/document/d/1U0Q43bRod_EeVP4eQpcN36NMlxZ4CpRzTl2gMQ5HHvg/edit?usp=sharingwitty-crayon-22786
09/20/2022, 8:21 PMsparse-lifeguard-95737
09/21/2022, 1:52 PMhundreds-father-404
09/21/2022, 4:22 PMpython_test_batch
with a sources
field. That avoids all the awkwardness with merging extra_env_vars
, summing timeout
I think that's pretty easy to implement alsosparse-lifeguard-95737
09/21/2022, 4:23 PMhundreds-father-404
09/21/2022, 4:23 PMsparse-lifeguard-95737
09/21/2022, 4:24 PMhundreds-father-404
09/21/2022, 4:24 PMthe batch would need to span more than 1 dirās worth of testsThat's fine,
sources
works with subdirs too, including **/*_test.py
sparse-lifeguard-95737
09/21/2022, 4:25 PM./pants tailor
will set up for you, AIUI - and not the convention we typically tell people to adopt šhundreds-father-404
09/21/2022, 4:26 PMyeah but itās not what ./pants tailor will set up for you, AIUI - and not the convention we typically tell people to adoptYeah, we would still be defaulting to 1 target per file because we can't infer intent of what you want your batches to be. But you can manually create a
python_test_batch
target instead, which will cause tailor to not try to add python_tests
for those "owned" sourcessparse-lifeguard-95737
09/21/2022, 4:26 PM--changed-since
only chooses the actually-changed files within the batch for execution (vs. returning the entire batch when any one has changed)1. `./pants tailor` creates a bunch of `python_test` targets
2. You have to manually delete all of those and replace them with a top-level `python_test_batch`
vs.
1. `./pants tailor` creates a bunch of `python_test` targets
2. You manually add a `__defaults__` at a top-level `BUILD` file, but leave the existing targets alone
hundreds-father-404
09/21/2022, 4:28 PMitās also important that --changed-since only chooses the actually-changed files within the batch for execution (vs. returning the entire batch when any one has changed)Yeah, that's the biggest downside of
python_batch_target
imo. We used to have code to for SpecsWithOrigin
, which meant we preserved the specific files users specified to us. That code got deleted when we added file-level targets, python_test
vs python_tests
We would likely to need to add back something like that, which is a bummersparse-lifeguard-95737
09/21/2022, 4:30 PMpython_test_batch
is dealing with hierarchy. if you have python_test_batch(sources=["**/test_*.py"])
at the ~root of your repo, but then you end up wanting to exclude specific subtrees / specific files, you have to both edit the root sources
list to exclude files and add new targest. vs. __defaults__
which (I think/hope) will Just Work with hierarchical settings__defaults__
to set the default batch ID, then a __defaults__
in subdirs that need different batches__defaults__
workshundreds-father-404
09/21/2022, 4:32 PMat the ~root of your repo, but then you end up wanting to exclude specific subtrees / specific files, you have to both edit the root sources list to exclude files and add new targest.Agreed. This right here is precisely why we added
overrides
š it used to be the case that you either changed metadata for every generated target, or had to create a new target and modify the existing one to exclude those old sources__defaults__
overrides things from ancestor dirs. and then targets can override from there> itās also important that --changed-since only chooses the actually-changed files within the batch for execution (vs. returning the entire batch when any one has changed)
We used to have code to for SpecsWithOrigin,ohhhh wait I don't think we need that. We have
SpecsPaths
, which preserves the literal file paths the user specified. I think we could leverage that to filter out which files in the batch target should be used
I can demo this technique for Go, which currently suffers from the same issue. https://github.com/pantsbuild/pants/issues/16945Presumably, tailor would still use python_testsāalthough, it would help onboarding if it used python_test_batch instead.This is an interesting thought experiment. If our focus is on getting things working asap for new Pants users, it would be desirable to use batching from the start with
tailor
. And then over time possibly optimize to not-batchedwitty-crayon-22786
09/21/2022, 5:03 PMbatch_id
might still be further subdivided into batches: labeling files might make it clearer that this is about which files can be batched, rather than about which will definitely be batchedhappy-kitchen-89482
09/21/2022, 5:05 PMsparse-lifeguard-95737
09/22/2022, 2:34 PMbatch_id
+ __defaults__
is the approach. I am eager to get started š Iāll poke around in the code to get some ideas on the impl - @witty-crayon-22786 if you already have some thoughts, can you send them my way?hundreds-father-404
09/22/2022, 4:16 PMpytest_runner.py
to have new types and a new rule that generifies running Pytest. Whereas right now the logic lives inside the TestFieldSet
rule, instead create this new rule that takes args like the file names to specify. It will roughly be a copy-and-paste of the body of the current rule. Then, the TestFieldSet
rule will call ittest.py
is one-field-set to one rule invocation. Ideally we don't need to change core/goals/test.py
here.
What we can do is leverage that the engine deduplicates identical Gets and only runs one of them. If we have 5 individual FieldSets all making the same Process request, then the engine will only run it once and share amongst everything. More precisely, they will be making the same Get
call for your helper rule in the prefactorhappy-kitchen-89482
09/22/2022, 4:26 PMsparse-lifeguard-95737
09/22/2022, 4:27 PMbatch_id
will be a python_test
specific thing or a generic field for all *_test
targets (with some mechanism to exclude it for things like shunit2_test
where it doesnāt make sense)happy-kitchen-89482
09/22/2022, 4:28 PMsparse-lifeguard-95737
09/22/2022, 4:28 PMbatch_id
is generic, I think weāll need a per-target batching strategy - for example xdist_concurrency
will need to be the same for all tests in a pytest batch, but that field doesnāt exist for other test typeshappy-kitchen-89482
09/22/2022, 4:29 PMhundreds-father-404
09/22/2022, 4:31 PMsparse-lifeguard-95737
09/22/2022, 4:33 PMhundreds-father-404
09/22/2022, 4:35 PMcore/goals/test.py
will be a lot harder to pull off, vs the strategy I outlined above. But maybe necessary depending on what we want to do with test sharding
Although I suspect it will be possible to only change core/goals/test.py
to consider batch IDs when sharding, without needing to change anything else in the file. And then all the rest of the changes are in pytest_runner.py
How does this interact with lockfiles? I assume batches also cannot span multiple lockfiles?@sparse-lifeguard-95737 eager error if this is violated? I think that probably makes more sense than silently ignoring the batch and un-batching them
sparse-lifeguard-95737
09/22/2022, 4:39 PMbatch_id
field to Python for now. high-level options that occur to me:
1. change all the existing types in the existing test
flow to use batches. subsystems that donāt use batching will just use single-element collections everywhere
2. add a parallel type tree + goals for batched test requests / responses. subsystems register rules for either the single-file or batched types depending on their capabilities. the top-level core/goals/test.py
sends requests for both single-element and batched field sets
3. isolate the changes to the python
subsystem, somehow reconstruct batches from the single-element requests currently send by core/goals/test.py
I am not sure if 3 is feasible (were you saying you have an idea for how it could work @hundreds-father-404?). 1 is nice in theory but potentially a pain for subsystem developers if it turns out most things arenāt batchable. 2 is complex but (hopefully?) makes it easy for subsystems to only implement what they needhundreds-father-404
09/22/2022, 4:40 PMwitty-crayon-22786
09/22/2022, 4:47 PM@witty-crayon-22786 if you already have some thoughts, can you send them my way?will do by EOD: thanks!
hundreds-father-404
09/22/2022, 5:04 PMpython_test
target for batched runs, those targets are going to match against the current PythonTestFieldSet
we have, but we need to run those with the batched runner and not individaul runner
I think the rough implementation would look like:
1. Keep our current PythonTestFieldSet and the TargetRootsToFieldSets
line in test.py
2. But then filter the results into two sets: the individual runner vs batch runner
3. call the two distinct APIs, Get(TestResult, TestFieldSet)
and Get(TestResult, TestBatch)
(or return TestBatchResult
maybe)witty-crayon-22786
09/22/2022, 5:08 PMlint
/ fmt
: iād love for it to be declarative.sparse-lifeguard-95737
09/22/2022, 5:11 PMbitter-ability-32190
09/22/2022, 5:11 PMsparse-lifeguard-95737
09/22/2022, 5:12 PMbitter-ability-32190
09/22/2022, 5:13 PMwitty-crayon-22786
09/22/2022, 5:14 PMwill do by EOD: thanks!correction: Eric will do this on whichever timeline you agree to! thanks.