# general
d
Hi everyone, I’m evaluating migrating a monorepo to Pants. So far I’ve managed to get everything building and it’s looking great. A lot of services were written under the assumption that a single DB connection would be kept/shared amongst all the tests for that service (I’m sure this is not uncommon in Python webdev), i.e.
service/a/test/test_a.py
service/a/test/test_b.py 
service/a/test/test_c.py
will re-use the same DB. Since Pants runs each pytest in a separate process, I had to create a unique DB name for each and run any initial migrations N times. This overhead unfortunately makes the tests slower than they were before, even with parallel execution. Is there any mechanism to force pytest to run tests in a single process? Passing `--debug` still appears to run the DB setup/teardown once per test file, whereas ideally I’d like it to run once for the whole suite.
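(For context on why the setup repeats: pytest’s `session` fixture scope is per *process*, so under a one-process-per-file model every file pays the full setup cost. A minimal sketch of the pattern being described — the `create_database`/migration calls are placeholders, not from this thread:)

```python
import uuid
import pytest

def unique_db_name(prefix: str = "test") -> str:
    # Each pytest process gets its own database name, so concurrent
    # processes never collide -- at the cost of re-running migrations
    # once per process.
    return f"{prefix}_{uuid.uuid4().hex[:8]}"

@pytest.fixture(scope="session")
def database():
    # "session" scope means once per pytest *process*. With Pants'
    # default of one process per test file, this runs once per file.
    name = unique_db_name()
    # create_database(name); run_migrations(name)   # placeholders
    yield name
    # drop_database(name)                           # placeholder
```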
w
d
I think this is exactly what I’m looking for ☝️
w
🙂
s
@dazzling-elephant-33766 that scenario is the exact use-case that motivated the “pytest batching” feature added in Pants 2.15: https://www.pantsbuild.org/v2.15/docs/python-test-goal#batching-tests
2.15 is still in RC, but my company has been using it for months without issues, FWIW. There were a lot of plugin API changes that might make upgrading painful, though.
h
PS You can have the best of both worlds by running multiple concurrent batches, and using `execution_slot_var` (https://www.pantsbuild.org/docs/reference-pytest#section-execution-slot-var) to name a database that is unique to the concurrency slot.
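(To make that concrete: per the linked docs, `execution_slot_var` under `[pytest]` names an environment variable that Pants populates with the concurrency slot number. The variable name `PANTS_EXECUTION_SLOT`, the `myapp_test` base name, and the helper below are illustrative, not from this thread — a conftest could derive the DB name like so:)

```python
import os

def db_name_for_slot(base: str, slot: str) -> str:
    # One database per Pants concurrency slot: batches running in
    # slot 0 and slot 1 simultaneously hit different databases, and
    # a later batch reusing slot 0 reuses the already-migrated DB.
    return f"{base}_{slot}"

# Assumes pants.toml sets:  [pytest] execution_slot_var = "PANTS_EXECUTION_SLOT"
slot = os.environ.get("PANTS_EXECUTION_SLOT", "0")
DB_NAME = db_name_for_slot("myapp_test", slot)
```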
p
i upgraded to 2.15 and started using a unique `batch_compatibility_tag` for tests under a directory, but I still have test failures for DB table conflicts when running `./pants test ::`, whereas each test passes when I run it individually like `./pants test file/to/tests/foo.py`. Haven’t tried `execution_slot_var` yet.
I guess it’s not clear to me whether tests running in the same pytest process is good or not. Initially this field is not set for any tests, so according to the docs “they are run in a dedicated `pytest` process”. Does that mean that when I run `./pants test ::`, since they run in dedicated processes, they will be parallelized and result in DB conflicts? So basically the best-case scenario is that I set the same `batch_compatibility_tag` for all my tests, but even then it isn’t guaranteed they run in the same pytest process.
d
A combination of batching + `execution_slot_var` is almost certainly what I’m looking for here, so I can run the DB init/migration setup once per service, but test each service against a different DB. For now I’d just like to get the batching working, and worry about execution slots + different DBs later. I’m on `2.15.0rc1` and I’ve added the following to the `BUILD` file in the same dir as my `tests/conftest.py`, as detailed in https://www.pantsbuild.org/v2.15/docs/python-test-goal#batching-tests
python_test_utils(
    name="test_utils",
)

__defaults__({(python_test, python_tests): dict(batch_compatibility_tag="your-tag-here"),})
Are there additional steps required here? Pants still appears to be running tests in multiple processes, because half the suite is failing (as the db already exists)
psycopg2.errors.DuplicateDatabase: database "blah" already exists
I’m a bit wary of this snippet:
Compatible tests may not end up in the same `pytest` batch if:
• There are “too many” tests with the same `batch_compatibility_tag`, as determined by the `[test].batch_size` setting.
• Compatible tests have some incompatibility in Pants metadata (i.e. different `resolve` or `extra_env_vars`).
Perhaps I’m hitting some sort of batch limit, but ideally I’d like this to be unbounded.
h
@plain-night-51324 that is exactly what execution_slot_var is designed to solve...
By default Pants runs one `pytest` process per test file, and those processes will run concurrently. If you use the test batching feature then each `pytest` process will run multiple test files, but they will still run concurrently.
And usually this is what you want, for performance. `execution_slot_var` is what lets you do that without the tests colliding on database access.
I think you can also set the batch size to be huge enough that in practice you get a single process, but that will harm your performance. You can claw some of it back by using `pytest-xdist` as the concurrency mechanism, but you’d still get no caching.
@dazzling-elephant-33766 So you can set the batch_size arbitrarily high I guess, and get literally a single pytest process for your entire repo? With the caveats above about performance
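(As an aside on the xdist point: if I remember right, 2.15 also added a `[pytest].xdist_enabled` option so Pants passes the parallelism flags for you. Something like the following in `pants.toml` — treat this as a sketch and check the option names against your version’s docs:)

```toml
[pytest]
# Let pytest-xdist parallelize *within* a batch, clawing back some of
# the performance lost to large batches.
xdist_enabled = true

[test]
# Allow more files to share one pytest process.
batch_size = 128
```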
d
Adding this in my `pants.toml` in the root of my repo hasn’t changed the behaviour. I’m still seeing what appears to be separate pytest processes for each test file. I’m invoking the tests with `./pants test services/a/tests:: -- -s` and observing pytest trying to create the same test DB over and over.
[test]
batch_size = 1000
s
I’m traveling most of today but I can help debug tomorrow if you’re still hitting problems. At first glance I don’t see an obvious issue
d
Many thanks for the offer Dan, I’ll continue playing around, no time pressure at all.
So it looks like I needed to put the tag in `service/a/BUILD` instead of `service/a/test/BUILD` (which is where I have my `conftest.py` + `test_abc.py` files). The documentation confused me a tad. My test suite for that service is all passing now.
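(For future readers, the likely explanation: `__defaults__` applies to targets in the declaring `BUILD` file’s directory and everything below it, so declaring it one level up covers the test subdirectory. A BUILD-file sketch — the tag value is illustrative:)

```python
# service/a/BUILD
# Defaults declared here apply to this directory and all subdirectories,
# including the python_tests targets under service/a/test/.
__defaults__(
    {(python_test, python_tests): dict(batch_compatibility_tag="service-a")},
)
```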