# general
Hi everyone, I’m evaluating migrating a monorepo to Pants. So far I’ve managed to get everything building and it’s looking great. A lot of services were written under the assumption that a single DB connection would be kept/shared amongst all the tests for that service (I’m sure this is not uncommon in Python webdev), i.e.
Will re-use the same DB. Since Pants runs each pytest in a separate process, I had to create a unique DB name for each and run any initial migrations xN times. This overhead unfortunately makes the tests slower than they were before, even with parallel execution. Is there any mechanism to force pytest to run tests in a single process? Passing
still appears to run the DB setup/teardown once per test file, whereas ideally I’d like it to run once for the whole suite.
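For context, the shared-connection pattern I mean looks roughly like this (a minimal sketch; sqlite3 stands in for our real Postgres driver, and all names here are illustrative):

```python
# Sketch of the shared-connection assumption: every test asks for the
# connection via get_connection(), so the DB is created and "migrated"
# once per process. sqlite3 is a stand-in for the real Postgres driver.
import sqlite3

_conn = None

def get_connection():
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(":memory:")
        # "Migrations" run only on first use within the process.
        _conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY)")
    return _conn
```

When each test file runs in its own process, this lazy setup fires once per file instead of once for the whole suite.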
I think this is exactly what I’m looking for ☝️
@dazzling-elephant-33766 that scenario is the exact use-case that motivated the “pytest batching” feature added in Pants 2.15: https://www.pantsbuild.org/v2.15/docs/python-test-goal#batching-tests
2.15 is still in RC but my company has been using it for months without issues FWIW. there were a lot of plugin API changes that might make upgrading painful, though
PS You can have the best of both worlds by running multiple concurrent batches, and using execution_slot_var (https://www.pantsbuild.org/docs/reference-pytest#section-execution-slot-var) to name a database that is unique to the concurrency slot
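For example (a sketch, assuming `execution_slot_var = "PANTS_EXECUTION_SLOT"` has been set in `pants.toml` — the variable name and the `test_db_` prefix are my own choices, not anything Pants mandates):

```python
# Sketch: derive a per-slot database name from the env var that Pants
# populates when [pytest].execution_slot_var = "PANTS_EXECUTION_SLOT".
import os

def db_name_for_slot(prefix="test_db_"):
    # Falls back to slot "0" when run outside Pants (e.g. plain pytest).
    slot = os.environ.get("PANTS_EXECUTION_SLOT", "0")
    return f"{prefix}{slot}"
```

Each concurrency slot then creates/migrates its own database once, and concurrent batches never fight over the same DB name.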
I upgraded to 2.15 and started using a unique batch_compatibility_tag for tests under a directory, but I still have test failures from DB table conflicts when running
./pants test ::
but each test passes when I run them individually like
./pants test file/to/tests/foo.py
Have not tried execution_slot_var yet.
I guess it’s not clear to me whether tests running in the same pytest process is good or not. Initially this field is not set for any of my tests, so according to the docs “they are run in a dedicated process”. Does that mean when I run
./pants test ::
since they run in dedicated processes they will be parallelized and result in DB conflicts? So basically the best-case scenario is to set the same batch_compatibility_tag for all my tests, but even then it isn’t guaranteed they run in the same process.
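For reference, the per-target form of the tag looks something like this in a BUILD file (the tag value is an arbitrary string of my choosing; tests also need to agree on other metadata, e.g. resolve, to actually batch together):

```python
# BUILD — illustrative: give all tests in this dir the same tag so Pants
# may batch them into one pytest process (up to the configured batch size).
python_tests(
    name="tests",
    batch_compatibility_tag="shared-db",
)
```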
A combination of batching + execution_slot_var is almost certainly what I’m looking for here. So I can run the DB init/migration setup once per service, but test each service against a different DB. For now I’d just like to get the batching working, and worry about execution slots + different DBs later. I’m on
and I’ve added the following to the
file in the same dir as my
as detailed in https://www.pantsbuild.org/v2.15/docs/python-test-goal#batching-tests

__defaults__({(python_test, python_tests): dict(batch_compatibility_tag="your-tag-here"),})
Are there additional steps required here? Pants still appears to be running tests in multiple processes, because half the suite is failing (as the db already exists)
psycopg2.errors.DuplicateDatabase: database "blah" already exists
I’m a bit wary of this snippet:
Compatible tests may not end up in the same batch if:
• There are “too many” tests with the same batch_compatibility_tag, as determined by the batch_size option
• Compatible tests have some incompatibility in Pants metadata (i.e. different …)
perhaps I’m hitting some sort of batch limit, but ideally I’d like this to be unbounded.
@plain-night-51324 that is exactly what execution_slot_var is designed to solve...
By default Pants runs one pytest process per test file, and those processes will run concurrently. If you use the test batching feature then each pytest process will run multiple test files, but the batches will still run concurrently.
And usually this is what you want, for performance.
execution_slot_var is what lets you do that without the tests colliding on database access
I think you can also set the batch size to be huge enough that in practice you get a single process, but that will harm your performance. You can claw some of it back by using
as the concurrency mechanism, but you'd still get no caching.
@dazzling-elephant-33766 So you can set the batch_size arbitrarily high I guess, and get literally a single pytest process for your entire repo? With the caveats above about performance
Adding this in my
in the root of my repo hasn’t changed the behaviour. I’m still seeing what appears to be separate pytest processes for each test file. I’m invoking the tests with
./pants test services/a/tests:: -- -s
and observing pytest trying to create the same test DB over and over.
batch_size = 1000
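To be explicit, what I mean in pants.toml (assuming I’ve got the option’s scope right — the docs describe batch sizing under the test goal, so I’ve put it under [test]):

```toml
[test]
batch_size = 1000
```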
I’m traveling most of today but I can help debug tomorrow if you’re still hitting problems. At first glance I don’t see an obvious issue
Many thanks for the offer Dan, I’ll continue playing around, no time pressure at all.
So it looks like I needed to put the tag in
instead of
(which is where I have my
) files. Documentation confused me a tad. My test suite for that service is all passing now