# development
f
I'm looking at a performance bottleneck in our js monorepo and trying to figure out what can be done about it. It appears `infer_js_source_dependencies` is taking ~30 seconds to run when I'm running a single test (no pantsd). I think `hydrate_sources` is the bottleneck but I'm having a hard time actually verifying that. My py-spy profile is mostly empty for some reason, and profiling in OSX's Instruments shows time spent in the python interpreter, but I don't know how to get Python stack frames out of it. Looking for advice on:
• How should I approach profiling this? The number of async rules being invoked makes it a bit tricky
• Is `hydrate_sources` necessary? Does the python dependency inference have the same issue?
Some size info: there are ~1k `infer_js_source_dependencies` calls (I think that's about the size of the transitive sources for the command I ran; total sources in repo is larger) https://github.com/pantsbuild/pants/blob/1aeddf653e99f4b5404a0627ed3e135574210044/[…]c/python/pants/backend/javascript/dependency_inference/rules.py
w
That's interesting, it runs that EVERY time?
Like, none of it gets cached? (Ignoring how long it takes, just for the moment)
f
Pantsd caches some of this, but I'm looking at the cold case for our infrequent users/ci runs. The local disk cache doesn't appear to be doing much afaict
w
🤯 I noticed some performance oddities the other day while trying to debug something, I assumed it was a "my machine" thing.
Assuming that parse_javascript_deps is doing what I think it's doing, it should be running tree-sitter on that code
If that's the case, I wonder if we're doing too much, or if there is some indirection we don't need. Tree sitter is absurdly fast, so seeing even 70-80ms feels like a lot for a file
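(For reference, here's a minimal standalone sketch of what the raw parse step costs outside of Pants entirely. It assumes the `tree-sitter` and `tree-sitter-javascript` pip packages and a recent py-tree-sitter API; it is not the code path Pants uses, just a way to ballpark bare tree-sitter import extraction on a single file.)

```python
# Standalone sketch (assumes: pip install tree-sitter tree-sitter-javascript, py-tree-sitter 0.23+).
# Not the Pants implementation - just bare tree-sitter pulling import sources out of one file.
import time

import tree_sitter_javascript as tsjs
from tree_sitter import Language, Parser

parser = Parser(Language(tsjs.language()))

source = b"""
import { foo } from "./foo";
import bar from "some-package/bar";
export { baz } from "./baz";
"""

start = time.perf_counter()
tree = parser.parse(source)
imports = []
for node in tree.root_node.children:
    # import_statement / export_statement carry their module path in the "source" field.
    if node.type in ("import_statement", "export_statement"):
        src = node.child_by_field_name("source")
        if src is not None:
            imports.append(src.text.decode().strip("\"'"))
elapsed_ms = (time.perf_counter() - start) * 1000

print(imports)                  # ['./foo', 'some-package/bar', './baz']
print(f"{elapsed_ms:.2f} ms")   # parse + walk time for this one file
```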
f
Yeah I believe that's right, there's a bunch of python rules that get invoked before tree-sitter though
w
Still, feels slow just off the cuff. I'd have to dig through the code. The `parse_javascript_deps` part seems pretty straight through to rust though - with the python indirection. So I would have thought (without any benchmarking or debugging) that the time shown there is pretty much just parse time
Interestingly, if we really wanted to - we could write custom grammars for only the stuff we want - and cut down that time even more. But that's not really the big problem here (though, it's surprisingly a lot)
The 10:1 is the big problem šŸ¤¦ā€ā™‚ļø
f
From my trace (first image) it's not parse_javascript_dependencies that's taking the most time though. It's hydrate sources
w
Yeah, that's what I mean. While I think the straight parsing is still kinda slow, that's not the bulk of the problem. It's pulling the file contents into memory and passing that to the rust parser(s) (I think docker, python, and javascript all work the same way). I'm surprised by that, as I thought that would have been handled in the rust code - in the event we already have cached contents, or in the event we wanted to get contents as soon as it runs.
How big are these JS files?
And if you have some reproduction repo or some code in the codebase where you can replicate the problem you see (even in the tests of our JS backend) - then I can take a look at seeing what's going on with that hydration (e.g. it makes sense that we need to hydrate, but whether we're doing it optimally - that's the question). As for your profiling question: I use mac, and trying to use Instruments with anything other than Swift/ObjC apps is such a pain - I hate it
f
I don't think they're abnormally large, but I'll look through them when I'm back at my machine to see if there are some outliers
šŸ‘ 1
f
> How should I approach profiling this? The number of async rules being invoked makes it a bit tricky
You can use https://github.com/shoalsoft/shoalsoft-pants-opentelemetry-plugin to send workunit tracing data to any OpenTelemetry-compatible system including a local instance of Jaeger.
The plugin also has an option to write traces to a jsonl file if you want to do scripting with the trace data.
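(If you go the jsonl route, even a tiny ad-hoc script can answer the "where does the time go" question. A rough sketch below - the exact field names in the plugin's jsonl output are an assumption on my part, so inspect one line of your actual file and adjust.)

```python
# Rough sketch: aggregate span durations by name from a jsonl trace dump.
# Field names ("name", "start_time_unix_nano", "end_time_unix_nano") are assumed
# OTel-style keys, not verified against the plugin's actual output.
import json
import sys
from collections import defaultdict

totals = defaultdict(lambda: [0, 0.0])  # span name -> [count, total_seconds]

with open(sys.argv[1]) as f:
    for line in f:
        span = json.loads(line)
        name = span.get("name", "<unknown>")
        start = span.get("start_time_unix_nano")
        end = span.get("end_time_unix_nano")
        if start is None or end is None:
            continue
        totals[name][0] += 1
        totals[name][1] += (int(end) - int(start)) / 1e9

# Top 20 span names by total time.
for name, (count, seconds) in sorted(totals.items(), key=lambda kv: -kv[1][1])[:20]:
    print(f"{seconds:8.2f}s  x{count:<6}  {name}")
```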
f
Thanks @fast-nail-55400 My otel setup is ~equivalent, but since yours is open source I can embed it in the reproduction more easily šŸ™ fyi I think your readme needs to be updated for the local Jaeger configuration, it was trying to send to the wrong port without this
[shoalsoft-opentelemetry]
enabled = true
exporter_traces_endpoint = "http://localhost:4318/v1/traces"
Here's a reproduction: 1000 js files takes ~17 seconds to infer deps. This is the jaeger view of the trace, similar to what I was showing in honeycomb but with fewer workunits here (I'm using a release version of pants instead of a local one I added more workunits to) https://github.com/chris-smith-zocdoc/pants-slow-js-deps-demo
🤯 1
If you set `--streaming-workunits-level=trace` there are ~28k `generator` spans, which seems like it could be related to the problem
Modifying the test script to generate the same structure with python, I see 800ms in transitive_targets, significantly better. So it appears to be something specific to the js rules
w
Yep, just ran it - it was rough
I made some changes and it still runs pretty slow - I can try to dig into this one a bit during the week.
:gratitude-thank-you: 1
Thanks for the repros - I found a problem, but I don't know if it's a JS thing - or more general... To hydrate the python files takes a couple hundred ms (max) which makes some sense. The JS files exhibit a problem, where we start off fast, progressively slow down to 5 seconds per hydrate, then "something" happens, and we go back to fast, ramp up to 5s, snap back to 0, ramp up, etc.
I'll need to take a look into how the hydration works, as it's been years - and then where the JS files are stored - to see if it's any different to python for some reason. Either that, or I might be seeing a symptom of something else - like overworking some part of engine
Yeah, might be a symptom - basically everything is getting clogged up, and all the awaits are backed up on something in the engine - should be a pretty direct thing to look into and I have some ideas
I've been absolutely nerd sniped by this problem. I love it (as much as you hate it). The test repo is chef's kiss also. We need a good performance-centric, dependency-inference generator for all backends (kinda like this repo, and what https://github.com/jriddy/clam-diggers was). Super useful in debugging something else I've been cooking up on the side.
Alright, think I've narrowed down the problem. Feels like a thundering herd-ish problem, but might be something simpler than that in the end. Fundamentally, the problem seems to be `_determine_import_from_candidates` https://github.com/pantsbuild/pants/blob/c45acc27e3d6886e2628be3905e400134fa69f40/[…]c/python/pants/backend/javascript/dependency_inference/rules.py
There is A LOT of stuff that happens in that function, including duplicate work, when you start jumping into those awaited calls. And in the contrived example, it's hit 1000 times (thus triggering who knows how many more native tasks internally). Trying to find out where our task/rule stuff is to make it easier to check whether it's related to the sheer number of tasks, or whether each of the tasks is doing more than it should need to do because of the nested dependency calls causing the dep tree to spread out a lot. Either way, I was able to see this was the problem by basically forcing the correct set of Addresses to be emitted on each of those calls (bypassing all of the rules below it), and I went from 18 seconds down to about 1-2
f
One interesting thing to note about that function: I wasn't able to annotate it with `@rule` because of the last argument, `file_extensions`. So I feel like it's bypassing caching/memoization when maybe it needs it?
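(Not the actual Pants types - just a plain-Python illustration of the point: if that extra argument is folded into a frozen, hashable request object, the whole call has a stable cache key. `functools.lru_cache` stands in for the engine's rule memoization here; the invented names are hypothetical.)

```python
# Illustration only: a frozen, hashable "request" makes the call memoizable.
# ImportCandidatesKey and resolve_candidates are invented names, not Pants APIs.
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ImportCandidatesKey:
    file_imports: tuple[str, ...]      # tuples, not lists, so the key stays hashable
    file_extensions: tuple[str, ...]   # the argument that currently blocks @rule


@lru_cache(maxsize=None)               # stand-in for engine-level rule memoization
def resolve_candidates(key: ImportCandidatesKey) -> tuple[str, ...]:
    # Pretend this is the expensive path-glob + owners lookup.
    return tuple(f"{imp}{ext}" for imp in key.file_imports for ext in key.file_extensions)


key = ImportCandidatesKey(file_imports=("./foo", "./bar"), file_extensions=(".js", ".ts"))
resolve_candidates(key)   # computed once
resolve_candidates(key)   # identical request -> served from cache
```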
w
It's possible, but it's being called once per import - we have 5 imports per file, for the first 200 files. I don't know if it is a caching issue necessarily, because it runs the same number of times as the imports. Whether it should do that is a question mark.
Maybe there is some downstream cache being missed, but (and I can't verify this without being able to get more insight into the engine in realtime) we might just be creating a ton of tasks. If caching isn't working as we think, then yeah, 100% it'll be a problem, but we might just not need to do so much.
One other thing I was thinking about is that for each backend's dependency inference, I think it should be more monolithically completed in the engine rust code - rather than trampolining data back and forth - as there just isn't much of a need for that. I would expect to call "infer_deps" with some input(s) and then get the fully formed results back, not anything I would need to call back in for
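(Purely to make that shape concrete - the names below are invented, not an existing Pants or engine API. The point is just "one request in, finished addresses out", with no callbacks into Python mid-inference.)

```python
# Hypothetical one-round-trip interface sketch; all names here are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class InferJSDepsRequest:
    file_path: str
    file_content: bytes                    # already-hydrated source, handed over once


@dataclass(frozen=True)
class InferJSDepsResult:
    dependency_addresses: tuple[str, ...]  # fully resolved - nothing left to ask Python for


def infer_deps(request: InferJSDepsRequest) -> InferJSDepsResult:
    # In the idea being described, everything behind this call would live engine-side
    # (parse, candidate expansion, owner lookup) in a single pass.
    raise NotImplementedError
```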
As a comparison, `deno info index.js` takes about 60ms
p
> I would expect to call "infer_deps" with some input(s) and then get the fully formed results back, not anything I would need to call back in for
I think that's how the python dep inference works. A per-file request goes to the engine, that gets all of the imports and strings that could be imports or resources, and sends the whole thing back to python. There's a little bit of filtering of those possible matches on the python side based on settings (like number of dots in the string), but the extraction happens in one pass through the engine. Does the JS code not do the same thing?
w
> Does the JS code not do the same thing?
Nope. But personally, I would even think to send all the file requests in at the same time, since there's likely overlap - but that's an optimization I can't really say would be good without stats
@proud-dentist-22844 Do you know if we can insert our own dep inference via a union rule or similar? it'd be a bit greasy, but could be nice, especially when debugging. Edit: I specifically meant override the existing inference
f
We have a way to override rules in our internal plugin, I can find that code if you want. I think there's also a thread on it from me a while back
w
@fresh-mechanic-68429 If you could please - I'm doing something wrong, but I'm not sure what. My override is not running
Ah, via patching?
f
This is the thread, that should give you enough to get it working. If not I'll pull our code https://pantsbuild.slack.com/archives/C0D7TNJHL/p1750993686068529?thread_ts=1750911821.901069&cid=C0D7TNJHL
w
Ah yes, okay, I remember this one. I think this was caused because of call by name and no good way to override rules
Thanks!
f
Yeah
w
šŸ‘
p
Yeah. The ability to override rules was a casualty of the call-by-name migration (but the speed boost is worth it!).
šŸ‘ 1
w
Oh, I forgot to mention re: this problem, the bulk of the time is spent in `find_owners` - kinda just noting that for myself. Looking for a quick workaround to make this less painful in the short term
f
I took a deeper look at this and how it differs from the python implementation. My rough understanding is that the js one is slow because of the repeated calls to disk for files which may or may not actually exist
paths = await path_globs_to_paths(
    _add_extensions(
        candidates.file_imports,
        file_extensions,
    )
)
local_owners = await find_owners(OwnersRequest(paths.files), **implicitly())
The main difference in the python impl is that it builds a hashmap first, AllPythonTargets -> FirstPartyPythonModuleMapping, allowing all the lookups to be done in memory rather than sequential checks to disk. If this sounds correct, would a suitable solution here be to implement the same approach on the js side? That can be done in just python, no rust changes required. I've got a very rough poc going that's ~10x faster (2s instead of 22s) for that demo repo
šŸ‘€ 1
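(For the record, here's a sketch of that shape in plain Python - invented names and a simplified resolution scheme, not the actual PoC or the Pants rule API. The point is building the "who owns which JS file" map once, then resolving every import candidate against it in memory instead of issuing per-candidate glob/owners requests.)

```python
# Sketch of the idea only: build the path -> owner map once, resolve imports in memory.
import posixpath


def build_first_party_js_mapping(all_js_targets: dict[str, str]) -> dict[str, str]:
    """all_js_targets: source file path -> owning target address (built once, up front)."""
    return dict(all_js_targets)


def resolve_import(
    mapping: dict[str, str],
    importing_file_dir: str,
    import_path: str,
    file_extensions: tuple[str, ...] = (".js", ".jsx", ".ts", ".tsx"),
) -> str | None:
    """Pure in-memory lookup: try each candidate extension against the prebuilt map."""
    base = posixpath.normpath(posixpath.join(importing_file_dir, import_path))
    for candidate in (base, *(base + ext for ext in file_extensions)):
        if candidate in mapping:
            return mapping[candidate]
    return None


mapping = build_first_party_js_mapping(
    {"src/foo.js": "src:foo", "src/bar/index.ts": "src/bar:index"}
)
print(resolve_import(mapping, "src", "./foo"))      # -> "src:foo"
print(resolve_import(mapping, "src", "./missing"))  # -> None, and no disk I/O either way
```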
w
It sounds like a place to start, especially if it can be done on the Python side first. I'm working on some experiments to really drop that time - but I'm also looking at all of the dep inference backends to see if they can use unified dep inference code - with a drop-in per-backend set of rules. That way, we'd be able to have a more consistent inference time overall