# general
h
I'm working on a new section of our codebase and I'm seeing test targets take a really long time to complete. When I run them individually, they're taking around 5 seconds for Pants to do its thing, and running concurrently I'm seeing them take upwards of 15 to 20 seconds. The actual tests that run are really simple and only take 0.5 seconds at most (not accounting for some initial setup that runs in conftest and takes around 1.5 seconds). Any ideas why I might be seeing such a large discrepancy?
This is a separate repo that I'm fitting with Pants before merging into our monorepo. The major differences I've seen are that we had
--cov --no-cov-on-fail
in our pytest.ini
addopts
section in this repo and that we make heavy use of
unittest.mock
with its create_autospec method (this is critical to help us mock away the hardware dependencies this repo normally drives).
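For context, a minimal sketch of the kind of autospec'd hardware mock I mean (WheelController here is a made-up stand-in for our real hardware interface):
from unittest.mock import create_autospec


class WheelController:
    """Made-up hardware interface; the real one talks to a physical device."""

    def set_speed(self, rpm: float) -> None:
        ...


# create_autospec builds a mock that enforces the real class's signatures,
# so a test that calls the mocked hardware API incorrectly still fails.
wheel = create_autospec(WheelController, instance=True)
wheel.set_speed(100.0)
wheel.set_speed.assert_called_once_with(100.0)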
Here's a good example of the discrepancy
(ops-venv) ~/devel/operations (pants)$ ./pants test --force ops/utils/wheel_utils_test.py 
19:22:38.71 [INFO] Completed: Run Pytest - ops/utils/wheel_utils_test.py:tests succeeded.

✓ ops/utils/wheel_utils_test.py:tests succeeded in 4.55s.

Wrote test XML to `dist/test_results`
(ops-venv) ~/devel/operations (pants)$ pytest ops/utils/wheel_utils_test.py 
================================================================================================================================================ test session starts ================================================================================================================================================
platform linux -- Python 3.8.10, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
rootdir: /home/nathanael/devel/operations, inifile: pytest.ini
plugins: typeguard-2.12.1, xdist-1.34.0, timeout-2.1.0, forked-1.3.0, cov-3.0.0
timeout: 1.0s
timeout method: signal
timeout func_only: False
collected 3 items                                                                                                                                                                                                                                                                                                   

ops/utils/wheel_utils_test.py ...                                                                                                                                                                                                                                                                             [100%]

================================================================================================================================================= 3 passed in 0.32s =================================================================================================================================================
10x the execution time definitely seems like something is up
h
And this is consistent if you re-run a few times with
--force
?
h
I wonder if you're experiencing contention. Might be helpful to play with these settings: https://www.pantsbuild.org/docs/using-pants-in-ci#tuning-resource-consumption-advanced
h
Yeah, seems pretty consistent.
Contention seems surprising when I'm running one thing in isolation
h
What's the perf like when using
test --debug
, which runs in the foreground as an interactive process?
Contention seems surprising when I'm running one thing in isolation
Yeah, wouldn't explain the 5 seconds vs. 0.5 seconds. But might explain the 5 seconds vs 15-20
h
Okay, actually, the initial numbers were misleading. Here's a better example (I was doing some tricks in
conftest.py
to make the times pytest reports lower than they actually are).
(ops-venv) ~/devel/operations (pants)$ ./pants test --force --output=all ops/utils/wheel_utils_test.py --pytest-args="--durations=3"
20:05:05.79 [INFO] Initializing scheduler...
20:05:05.90 [INFO] Scheduler initialized.
20:05:12.61 [INFO] Completed: Run Pytest - ops/utils/wheel_utils_test.py:tests succeeded.
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /tmp/process-executioneNcGLY, configfile: pytest.ini
plugins: typeguard-2.12.1, timeout-2.1.0, cov-3.0.0
timeout: 4.0s
timeout method: signal
timeout func_only: False
collected 3 items

ops/utils/wheel_utils_test.py ...                                        [100%]

=============================== warnings summary ===============================
../../home/nathanael/.cache/pants/named_caches/pex_root/venvs/s/08108cc5/venv/lib/python3.8/site-packages/marshmallow/__init__.py:17
  /home/nathanael/.cache/pants/named_caches/pex_root/venvs/s/08108cc5/venv/lib/python3.8/site-packages/marshmallow/__init__.py:17: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    __version_info__ = tuple(LooseVersion(__version__).version)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
- generated xml file: /tmp/process-executioneNcGLY/ops.utils.wheel_utils_test.py.tests.xml -
============================= slowest 3 durations ==============================
1.61s call     ops/utils/wheel_utils_test.py::CommandSpeedsFromList::test_commanding_speeds_from_list
0.08s call     ops/utils/wheel_utils_test.py::SetSpeedAndStabilizeTests::test_setting_speed_and_stabilize
0.08s call     ops/utils/wheel_utils_test.py::WaitForWheelSpeedStabilized::test_waiting_to_stabilize
========================= 3 passed, 1 warning in 3.74s =========================



✓ ops/utils/wheel_utils_test.py:tests succeeded in 4.51s.

Wrote test XML to `dist/test_results`
(ops-venv) ~/devel/operations (pants)$ pytest ops/utils/wheel_utils_test.py --durations=3
================================================================================================================================================ test session starts ================================================================================================================================================
platform linux -- Python 3.8.10, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
rootdir: /home/nathanael/devel/operations, inifile: pytest.ini
plugins: typeguard-2.12.1, xdist-1.34.0, timeout-2.1.0, forked-1.3.0, cov-3.0.0
timeout: 4.0s
timeout method: signal
timeout func_only: False
collected 3 items                                                                                                                                                                                                                                                                                                   

ops/utils/wheel_utils_test.py ...                                                                                                                                                                                                                                                                             [100%]

============================================================================================================================================= slowest 3 test durations ==============================================================================================================================================
1.49s call     ops/utils/wheel_utils_test.py::CommandSpeedsFromList::test_commanding_speeds_from_list
0.09s call     ops/utils/wheel_utils_test.py::WaitForWheelSpeedStabilized::test_waiting_to_stabilize
0.07s call     ops/utils/wheel_utils_test.py::SetSpeedAndStabilizeTests::test_setting_speed_and_stabilize
================================================================================================================================================= 3 passed in 1.71s =================================================================================================================================================
So what's odd is that it seems like the test run time is actually quite similar. I'm just seeing the actual pants goal take around 5 seconds to complete.
I'm noticing that when I do
dependencies --transitive
, I see my other repo that's pulled in as a dependency as just
//:astranis-python
. So maybe it's just taking a really long time to bundle/package up the test?
Because we're actually pulling in a really large set of code (it's an entire portion of our monorepo). If that's the case, the good news is that this goes away when we merge this repo into our monorepo.
h
Are you pulling it in as a third-party req, it sounds like? Any other particularly large third-party deps, like TensorFlow or NumPy? That's actually something we should indeed try to slip into Pants 2.10 @witty-crayon-22786 - we recently gained the infrastructure (thanks to @fast-nail-55400) to safely symlink immutable files like requirements PEXes, rather than copying them. It has a non-trivial performance impact for the JVM because it makes sandbox creation faster, but we haven't had the time to hook it up to Python yet. Stu or Tom, would either of you have time to hook up PEX to immutable caches before the release? Reminder: we decided not to block that change on the refactor to use the
immutable_paths
API
h
Even if that's an issue, it should all be cached
If you're running the same test repeatedly with
--force
and seeing significant overhead every time, that's something we'd want to look into.
The dep on
//:astranis-python
looks like it's on a wheel you've published from that other repo?
Are you seeing that
20:05:05.79 [INFO] Initializing scheduler...
20:05:05.90 [INFO] Scheduler initialized.
on every run?
If the scheduler has to restart then the overhead makes sense (although I'd still like to reduce it even in that case). But ideally the scheduler would not restart very often.
h
It's pulled in directly from VCS in
requirements.txt
.
And nope, no scheduler stuff each time. Just
Run Pytest
And agreed on the cache. I see
20:41:43.97 [DEBUG] Completed: Hit: Local cache lookup: Building requirements.pex with 1 requirement: astranis-python@ git+ssh://git@git.corp.astranis.com/source/monorepo.git@6980a7de52f8
with
-ldebug
It seems the modules from my
astranis-python
dependency are taking a long time to import. When pants is running the test, the import sections take ~1.8 seconds. When I run it normally with
pytest
, it takes 0.002 seconds.
So, I think it's pretty safe to say that this problem will 🌈 go away 🌈 when I merge the two repositories. But I wouldn't be surprised if this comes up again. If there are no quick ideas, this one seems worthy of a bug ticket (though I'm not too sure how best to reproduce it without our specific repo).
h
A ticket with as many details as possible, including whatever you see repeated in
-ldebug
output across multiple runs, would be really appreciated!
Ah I see, this is a VCS requirement, so pulling it in is expensive
but again, should be cached
It seems the modules from my astranis-python dependency are taking a long time to import. When pants is running the test, the import sections take ~1.8 seconds. When I run it normally with pytest, it takes 0.002 seconds.
This seems key
I wonder if it has to do with compilation time
So it takes 2 ms if you run directly with pytest?
h
Yeah, nothing heavy is running on import so it must be some really inefficient path resolution for each import.
h
What happens if you nuke all the *.pyc files before running pytest directly (without Pants) ?
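(If it helps, something like this clears the bytecode cache before a pristine run — illustrative only; any equivalent approach works:)
import pathlib
import shutil

root = pathlib.Path(".")
# Drop cached bytecode directories, then any stray legacy .pyc files,
# so the next pytest run has to compile everything from scratch.
for pycache in list(root.rglob("__pycache__")):
    shutil.rmtree(pycache, ignore_errors=True)
for pyc in list(root.rglob("*.pyc")):
    pyc.unlink(missing_ok=True)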
And also, just to clarify what you're saying, the imports are happening inside the pytest run? So in addition to the overhead, the pytest run itself is slower?
h
No change with
.pyc
files deleted.
I'm doing a silly experiment by running my test file with pytest and pants. I've modified the file to wrap my imports like this
import time

t1 = time.monotonic()

<here's where all my from astranis.whatever import thing goes>
...
<end of astranis import>

raise RuntimeError(time.monotonic() - t1)
When I run with just
pytest
(whether pyc files have been nuked or not) I see those imports taking around 2 ms.
ops/utils/wheel_utils_test.py:16: in <module>
    raise RuntimeError(time.monotonic() - t1)
E   RuntimeError: 0.002227158984169364
and with no changes, when pants runs it, I see around 2 seconds
21:02:42.57 [ERROR] Completed: Run Pytest - ops/utils/wheel_utils_test.py:tests failed (exit code 2).
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /tmp/process-executionXr6TSG, configfile: pytest.ini
plugins: typeguard-2.12.1, timeout-2.1.0, cov-3.0.0
timeout: 4.0s
timeout method: signal
timeout func_only: False
collected 0 items / 1 error

==================================== ERRORS ====================================
________________ ERROR collecting ops/utils/wheel_utils_test.py ________________
ops/utils/wheel_utils_test.py:16: in <module>
    raise RuntimeError(time.monotonic() - t1)
E   RuntimeError: 1.5522927539423108
and it's very clearly proportional to the number of
from astranis import something
like statements I have
Actually, I take that back. I found something interesting. I walked through the imports and found that one of the modules I was importing took an exceptionally long time (a file from our astranis-python repo called
environment.py
). There's nothing heavy running on import (just normal class and function definitions). The only outlier is that the file needs to import
psycopg2
. I know that one is special in how it's downloaded, so maybe it takes a while to find it since it was bundled with astranis-python?
Yeah, I feel pretty confident that psycopg2 is the problem. Sorry for the wild ride.
c
Sorry for the wild ride.
Not at all, these cases are always interesting, because they show an area that can be improved in terms of insight/feedback/transparency into what's going on, etc. 🙂 (at least that's my personal view)
💯 2
h
So, I feel pretty confident that Pants isn't really introducing a problem here. Or maybe it is, I'll let y'all weigh in on that. What I've found is that one of our modules in particular is very heavy to import. Not because it's doing a lot, but because it has a lot of transitive dependencies. I think in our venv world we haven't noticed it because, as was pointed out earlier, things start hitting the pycache and helping us out. Without that caching available in Pants, because everything is run independently, we have a very heavy import step for a lot of tests (because they all fundamentally rely on this module with a lot of transitive dependencies). Without knowing more about the system, I'm not sure what Pants could do about this.
👍 1
Assuming the count command I did worked right, this module has 513 transitive dependencies. So I think we're simply seeing uncached import time stack up.
h
https://github.com/pantsbuild/pants/issues/11339 is probably not helping. That's why Benjy asked you to delete
.pyc
files. But it's curious that you said it didn't make a difference. Dang, I wish I'd remembered to put that in the 2022 prioritization survey
e
The only code that should suffer recompilation is 1st party.
👍 1
The motivating case was the Django repo, with large quantities of 1st-party code showing O(1s) compile times, IIRC.
h
I was deleting the wrong pyc files originally. When I move everything to my monorepo, it's all first party. My problem is that this repo I'm bringing over has a bunch of really small scripts that all use this common, huge library. So executing all their tests forces this recompilation to happen for every test, because each one gets its own memory space in a separate process.
I think that ticket is exactly what I'm getting hit by.
In general, I think we can take a hard look at how we were doing things, now that Pants helps us see how unwieldy it got. I do think it's totally valid to not have to recompile library code for every Pants run, but we should also be smarter about our source code and not blow up transitive dependencies.
b
This would make an interesting blog post. The journey of figuring out what's going on, and how it ultimately led you to this conclusion. Migrating often leads to interesting reveals and ultimately new thinking. Fascinating to hear those lessons.
h
The smoking gun would be to see how long those imports take outside Pants but in a completely pristine environment with no pycache
Because I don't think importing
psycopg2
should take 2 seconds even in that situation
Hmmm, not so far off:
(venv) statler:[/tmp/venv]$ time python -c "import psycopg2"

real	0m0.757s
user	0m0.044s
sys	0m0.018s
(venv) statler:[/tmp/venv]$ time python -c "import psycopg2"

real	0m0.062s
user	0m0.043s
sys	0m0.016s
h
Right, that's exactly what I'm seeing. I used tuna to inspect the imports and realized they're really heavy without the caching.
psycopg2
was kind of a false positive, it seems. The real issue is just the number of imports and the lack of caching, so each test (which lives in a different process) has to do the same heavy imports from scratch every time.
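For anyone curious, a rough sketch of one way to get an import profile that tuna can render (the module name is just an example; -X importtime is CPython's built-in per-module import-time report):
import subprocess
import sys

# -X importtime makes CPython print a per-module import-time breakdown to
# stderr; saving it to a file lets tuna render it graphically.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import astranis.service.handler"],
    capture_output=True,
    text=True,
)
with open("importtime.log", "w") as log:
    log.write(result.stderr)
# Then: tuna importtime.log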
h
Hmm, I think that first number was an outlier. It's only 70ms even if I nuke the pycache
Yeah
That makes sense
But now I'm not sure that moving your code into the repo would help
You'll still do all those imports on every test
I don't think we optimize for preserving or sharing
__pycache__
(and I'm not sure we can easily, since it's not append-only)
h
That was exactly my observation when I did a dry run of merging the repos. Things are getting rebuilt each time.
h
So compiling your first-party code has a measurable cost. That is an important use-case to know about and address.
h
Yeah, I think it's a tough ask. We're starting to get really close to just asking for a compiled form of the library and properly linking it. I think we're getting close to asking Python to be more than it is.
So one of my suggestions to the team was that, as convenient as the existing layout makes things for folks, it's exploded our transitive dependencies and made for really heavy applications. And pycache has been silently bailing us out for a while.
h
If you could open a ticket with all the info that would be excellent! So we can track this
h
yup, will link here when I get to it. Really appreciate the thorough look at what turned into a scratchpad of a thread.
❤️ 1
h
One thing that will make this better: we're working on support for "grouping" processes into a single invocation. Today a process such as pytest or pylint basically either runs on a single source file or on all of them, with the obvious tradeoff between cache utilization and concurrency vs. execution overhead. We're working on supporting intermediate batch sizes.
So basically you could amortize that compilation time over an entire batch of tests
😍 1
That may be scant consolation if you're iterating on a single test, but still
h
I like the batching idea. I could see collecting a group of tests into the same process and maintaining speed by running them in parallel internal to the Pants process with
pytest-xdist
(rough sketch below). You still only need to collect/batch what has changed according to the dependency graph. Feels like "advanced option" territory, but maybe more tractable than handling the general problem. It's definitely fair to suggest that we don't architect code that has such a heavy dependency tree.
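Something like this, say (the second path is made up, and -n is pytest-xdist's worker-count flag):
import sys

import pytest

# Hypothetical batch: the test files whose dependencies changed, grouped
# into one pytest invocation so the heavy imports and bytecode compilation
# are paid once, then spread across workers by pytest-xdist.
changed_tests = [
    "ops/utils/wheel_utils_test.py",
    "ops/utils/other_wheel_utils_test.py",  # made-up path
]
sys.exit(pytest.main(["-n", "4", *changed_tests]))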
Tried to capture what I believe to be the current state of things here: https://github.com/pantsbuild/pants/issues/14354.
b
yup, will link here when I get to it.
Just an aside: since we're on Slack's free tier, messages disappear from here relatively quickly. So it's best to recap conversation when opening a Pants issue, rather than just linking.
h
Ah, good to know. I linked the thread in the description as additional context, so I should probably remove that. The context in the ticket should be self-sufficient.
h
Thanks!
e
One big missing piece of context is the Pants version. Much of what has been theorized about pyc compilation & caching is invalid depending on the Pants version and its configuration.
h
Thanks for the reminder, I'll go update.
e
I'll have to look into things more deeply, but
venv_use_symlinks
here: https://www.pantsbuild.org/docs/reference-pex#advanced-options should make 3rd-party (psycopg2?) compilation fully shared, leaving only 1st-party code, which Pants's pytest setup handles internally via
PEX_EXTRA_SYS_PATH
, as the suspect. IOW, your 1st-party code compilation time would have to be large, assuming it is still added to
sys.path
for pytest via
PEX_EXTRA_SYS_PATH
.
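(For anyone unfamiliar, roughly what that means at runtime — an illustration of the mechanism, not the actual PEX implementation — PEX_EXTRA_SYS_PATH is a colon-separated list of extra directories prepended to sys.path:)
import os
import sys

# Directories listed in PEX_EXTRA_SYS_PATH become importable inside the
# test sandbox; this is how first-party sources reach pytest, and why
# their bytecode gets compiled fresh in each sandboxed process.
for entry in filter(None, os.environ.get("PEX_EXTRA_SYS_PATH", "").split(":")):
    if entry not in sys.path:
        sys.path.insert(0, entry)
print(sys.path[:5])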
h
Yes, sorry, the psycopg2 comment earlier seemed promising, but I actually think it's our first-party code (specifically one of our modules that ends up having a huge number of transitive dependencies).
(mono-venv) ~/devel/monorepo (better-bootstrap)$ ipython
Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import time

In [2]: def test():
   ...:     t1 = time.monotonic()
   ...:     from astranis.service import handler
   ...:     print(time.monotonic() - t1)
   ...:     

In [3]: test()
1.0756074539385736
to give a sense of the time involved.