# general
b
Hello. I have a question about the suitability of Pants for running a suite of many very small unit tests.
I have a Python project that contains many small test files, each of which only takes a few seconds to run. I was previously caching these tests using testmon (https://github.com/tarpas/pytest-testmon), which determines fine-grained dependencies using coverage.py. Since my unit tests have no non-Python dependencies, this works well. I am trying Pants, and have noticed a significant slowdown in my test suite. One possible cause may be Pants's approach of calling pytest separately for each test file, meaning that the overhead of starting pytest must be paid for every small test file. I am pleased to see the recent PR https://github.com/pantsbuild/pants/pull/17385, which allows Pants to run multiple test files with a single call to pytest. However, my understanding is that enabling this feature will also make the Pants cache less fine-grained and so less useful. Is this correct? If so, are there any plans to add the ability to retain fine-grained test caching while still making a single call to pytest? Thanks!
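For context, my current non-Pants invocation is roughly the following (testmon is enabled via its --testmon flag, as I understand the plugin's docs):
```
# Run only the tests whose coverage-derived dependencies have changed,
# reusing pytest-testmon's local database from previous runs.
python -m pytest --testmon test/unit
```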
h
Hi! Good question. Your understanding is correct - currently, the test batching feature gives you a knob to trade off cache granularity for overhead. Pants caching currently works by caching process executions. The advantage of this is that it's straightforward to reason about correctness, because all process inputs are mixed into the cache key, and the process execution is sandboxed.
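To make that knob concrete, here is a rough sketch of how I'd expect it to be used once that PR lands; the batch_compatibility_tag field and the test goal's batch-size option reflect my reading of the in-flight work, so treat the exact names as assumptions rather than a finished API:
```
# In test/unit/BUILD (BUILD files use Python syntax), opt the tests into a shared batch:
#
#   python_tests(
#       name="unit",
#       batch_compatibility_tag="small-unit-tests",  # tests sharing a tag may share one pytest process
#   )
#
# Then cap how many files land in each pytest invocation,
# trading cache granularity for startup overhead:
./pants test --batch-size=32 test/unit::
```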
Splitting a single process result into multiple cache entries is controversial. I am cautiously more in favor of it than most, but it is inherently riskier and would have to be done with great care.
It would mean decoupling cache entries from processes. I think it is worth doing for the use case you mention, but it is not as easy as it may seem.
That said, there are several other mechanisms that help here. One is to only run tests that are affected by changes: https://www.pantsbuild.org/docs/advanced-target-selection#running-over-changed-files-with---changed-since
(the observation being that if the tests were affected by changes they wouldn't be cached anyway...)
And you can run the remaining tests in a few batches, rather than one giant batch, so you get the concurrency benefits.
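Concretely, that workflow could look something like this (flag names taken from the docs page above; origin/main is just a placeholder for whatever base ref you diff against):
```
# Select only targets affected by changes since the base branch, including
# transitive dependees, and let Pants run the resulting tests concurrently,
# pulling anything unchanged from the cache.
./pants --changed-since=origin/main --changed-dependees=transitive test
```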
What are the numbers in your case: how many tests are there, and how long do they take with each approach?
b
I've got 115 unit tests right now, so perhaps not that many compared to some. But there still seems to be a pretty large time gap. Running with python -m pytest test/unit takes 58.3 seconds. Running with ./pants test --debug test/unit:: takes 4 minutes and 1 second. I realize that using the --debug option disables caching and parallelism, but I thought it was a good way to get a fair comparison. After all, we could get caching and parallelism outside of Pants by using the pytest-testmon and pytest-xdist plugins.
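(For reference, the non-Pants baseline I have in mind is roughly the following; whether testmon and xdist compose cleanly in practice is something I'd still need to verify:)
```
# Parallel run via pytest-xdist (-n auto), skipping tests whose dependencies
# are unchanged according to pytest-testmon's database.
python -m pytest --testmon -n auto test/unit
```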
Thanks for the information on the safety concerns around splitting process runs into multiple cache entries. I would certainly be in favour of allowing the user to declare that the tests in a certain directory have no side effects, so that it would be safe to split a single process result into multiple cache entries when running them. I guess another option would be to add the ability to declare that certain tests should be run with the pytest-testmon plugin. Maybe then the testmon cache could be leveraged in the same way that the mypy cache is leveraged in this PR: https://github.com/pantsbuild/pants/pull/16276.
h
What do the numbers look like without --debug?
b
If I run ./pants test --force test/unit:: it takes 1 minute 20 seconds. So concurrency can significantly, but not completely, mitigate the overhead.