# general
a
Are there any simple tricks to get Pants to be better at resolving transitive dependencies? I told it to run some tests; it's been running for 10 minutes and it currently looks like this:
```
1917702 cris.bi+  20   0  813.5g   6.6g 108692 S 103.7  21.2  12:40.82 pantsd [/home/c
```
I have 32GB of memory, so it's not a huge issue, but our CI doesn't approve of this. 😞
r
Are you using lockfiles? That helps in speeding up things a bit.
a
We have parametrised interpreter constraints, so it's not really easy to move to lockfiles
f
there are others who face performance issues (time / memory usage) when querying large dependency graphs. I'm afraid troubleshooting performance would require careful log collection and interpretation, so you may want to file a GitHub issue with more details: the command you run, the size of your repo, the memory consumption, anything that would help us troubleshoot. Apart from that, I think the only way to achieve decent performance for a large repo is to have a cache, either a local or a remote one, to avoid unnecessary computations. Is this an option? Please see https://www.pantsbuild.org/docs/using-pants-in-ci It doesn't have to be anything fancy immediately; even a directory shared between builds that take place on the same node should have a profound effect on performance (if you can't have a shared remote cache, e.g. https://www.pantsbuild.org/docs/remote-caching-execution#server-compatibility)
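For reference, a persistent local cache mostly comes down to pointing Pants's store directories at a path that survives between builds. A minimal sketch (the paths are illustrative, not taken from this thread):
```
# pants.toml (or pants.ci.toml) - sketch only; local_store_dir and named_caches_dir
# are the standard [GLOBAL] cache locations, the paths are made-up examples
[GLOBAL]
local_store_dir = "/var/cache/pants/lmdb_store"
named_caches_dir = "/var/cache/pants/named_caches"
```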
a
We do have the cache shared, but this happens locally even without me changing any code, 10 minutes of pain and misery.
f
gotcha, thanks for clarifying. It's likely that you are suffering from the same issue as the one I've shared above.
a
Yeah, I'm reading it now
πŸ‘ 1
Yeah, we have 6.5k python files, so I'm guessing the growth is not linear. πŸ™‚
I think I might just bite the bullet and look into lockfiles with all the magic macros that we need to use.
f
I am not sure lockfiles would resolve your pantsd memory / performance issues
a
Oh, then I won't πŸ™‚
This did get much worse when we went from 2.6 to 2.14 (in one go, so I can't really tell you exactly when), and not by 20% but by 300-400%
And I'm not sure whether it's just been getting slower because we have more crap in our repo, or because of changes in Pants, but it's definitely way worse lately.
f
πŸ˜„ if you let it run on a fresh machine, say it takes 10 mins. If you re-run it, say, 100 times, does it consistently take only a few seconds to fetch from memory (as the results will be memoized)? You may also need to tweak your pantsd memory, see https://www.pantsbuild.org/docs/reference-global#pantsd_max_memory_usage
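Raising that limit is a single setting; the 8GiB below is only an illustration, not a recommendation from this thread:
```
# pants.toml - sketch; pantsd_max_memory_usage is the option linked above,
# the value chosen here is just an example
[GLOBAL]
pantsd_max_memory_usage = "8GiB"
```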
a
well, if it takes 7GB of memory, it'll get killed (as it's a looot, regardless of what I said initially about 32GB making it okay)
but, yes, for smaller runs, it's instant the second time
f
oh I see, so it does grow beyond the 4GiB used by default, gotcha
what if you disable using pantsd at all?
`--no-pantsd`
and only use local cache?
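On the command line that would look roughly like this; the test goal and the `::` target spec are placeholders rather than the actual command from this thread:
```
# assumes the repo's usual launcher script; disables the daemon for this run only
./pants --no-pantsd test ::
```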
a
let me give it a go, it might actually be faster, I was looking at the CI and it's NOT that slow
f
I am not intimately familiar with how the pantsd daemon works, but if it gets killed by the OS, it may then take a lot of time to get started / schedule work, etc.
a
Oh, it doesn't get killed by the OS, it dies after it's done with the command, because of that option you mentioned.
βœ… 1
f
ah, so you left it at the default 4GiB; I saw it taking ~7GiB, so I thought you had increased it but then the OS interferes and kills it
a
It's at least not 5 times faster, it's been running for 2 minutes now πŸ™‚
Nah, it doesn't get killed, but it does need to swap a bit. It doesn't seem to affect Pants, though; I closed PyCharm and Chrome, no extra swapping, and it still took >10m
βœ… 1
Okay, so it's much faster and uses less memory without pantsd.
It uses 4GB of memory and it was done in 4-5 minutes
Or, hm, there's still a 'resolve transitive deps' step in there, but it said it's running tests
f
this is something. What I would also like to try is https://www.pantsbuild.org/v2.17/docs/reference-python-infer#use_rust_parser in 2.17; if you have lots of files, maybe you are hitting a bottleneck there
I saw some folks reporting significant dep inference performance improvements
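For completeness, enabling it is a one-line config change (this assumes a 2.17 Pants, per the link above):
```
# pants.toml - sketch; [python-infer].use_rust_parser is the 2.17 option linked above
[python-infer]
use_rust_parser = true
```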
a
Let's see (but, it's gonna take a bit, need to take care of some personal stuff before I try to upgrade pants)
f
no need to do the full Pants upgrade, you can just use the new version locally to experiment πŸ™‚
a
I mean, I need to see if our plugins work πŸ™‚
f
When I experiment with a new version, I just disable them in the backends if there are any changes in the plugin API (during the experiment) πŸ™‚
a
True, could do that.
βœ… 1
Btw, we don't use dependency inference. Just the `__init__.py` one, and that causes me headaches every so often.
f
> Btw, we don't use dependency inference
oh, this is the first time I've been exposed to a repo that doesn't have it enabled. How do you declare dependencies between targets?
a
Just manually.
f
wow
a
we're nothing if not hard working πŸ™‚
πŸ˜‚ 1
Are lockfiles mandatory in 2.17?
f
> Just manually.
there must be a good reason why you do that, I'd love to learn more! If you'd like to explore automatic build target generation, feel free to look at https://www.pantsbuild.org/v2.17/docs/reference-tailor. I know that, for example, in Bazel you have to manually declare dependencies, but you would still take advantage of tooling such as Gazelle to generate them for you. You do of course get into an awkward situation when you mix human- and machine-generated dependencies, which is suboptimal.
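For reference, the tailor goal from that link is run like this; `::` just means the whole repo:
```
# generates missing BUILD targets across the repo; it does not infer dependencies itself
./pants tailor ::
```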
> Are lockfiles mandatory in 2.17?
no they are not
a
okay, it was our plugin that was screwing it up, I guess
and, well, I'm not sure why, except this is how we've always done it.
f
imho, lockfiles are optional and require careful planning, as you may end up with lots of complications. I had worked in a Pants monorepo with constraints files generated with `pip-compile`, and https://www.pantsbuild.org/v2.17/docs/python-third-party-dependencies#constraints-files worked lovely. So I wouldn't go for them just for the sake of having them. In a large monorepo with complicated tooling, lockfiles may make your life worse if not carefully researched first, imho.
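A rough sketch of that constraints-file setup, with made-up file names; [python].requirement_constraints is the option the linked doc describes:
```
# pants.toml - sketch only; constraints.txt would be generated separately,
# e.g. with `pip-compile requirements.in -o constraints.txt`
[python]
requirement_constraints = "constraints.txt"
```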
a
We do want them, even if it will make life more complicated; our 3rd party deps are a mess currently, only part of them are in constraints... Anyway, we need to figure out how to do parametrised interpreter constraints in a way that doesn't make people sad
Definitely not a huge improvement in speed with the rust inference parser.
f
that's because you don't read the `import` statements in Python files, do you
the file parsing and AST construction is what was sped up
a
it still does `__init__` files, and quite a few of those have a million imports.
f
gotcha, so some parsing is still done, I see
a
I just now read that it gets worse if you turn inference off, I wonder...
Well, at least it failed fast.
```
pants.engine.target.InvalidFieldException: The target defender/defender/tasks/populate_defender_monitoring_stats.py:../../lib has the `interpreter_constraints` ('CPython~=3.7.4', 'CPython~=3.10.9'), which are not a subset of the `interpreter_constraints` of some of its dependencies:

  * ('CPython~=3.10.9',): insights/insights/defender_stats/entities.py:../../lib
  * ('CPython~=3.10.9',): insights/insights/defender_stats/stat_types.py:../../lib
```
Anyway, I'll try to look into this more. The problem is that I'm trying to reduce the CI time, and I managed to get all tests to run in parallel, but now, if we don't split into multiple workers, the dependencies bit is slowing things down πŸ™‚
h
To clarify, what do you mean by "resolving transitive dependencies" ? Which processes are taking a long time?
Is it a pip resolve?
Forcing Pants to use a newer pip can speed those up, since newer pips have better backtracking heuristics
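If pip resolution is indeed the slow part, recent Pants releases expose a [python].pip_version option; the value below is only an example and the set of allowed version strings depends on the release you're on:
```
# pants.toml - sketch; assumes a Pants release that has [python].pip_version,
# check your release's docs for the versions it accepts
[python]
pip_version = "23.1.2"
```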
a
Sorry, I misspoke. It says
```
323.05s	Resolve transitive targets
```
h
That is shockingly long. That rule isn't running external processes, it's just doing graph traversal in memory.
How big is your build graph?
(how many source files in the repo is a good approximation)
a
About 6.4k python files
h
That's a small number, so hmm...
Something is very off here
a
If there's anything I can share to look into this, I'm more than willing πŸ™‚
I just don't really know where to begin 😞
h
I assume this code is proprietary and you can't share it in a public github repo?
a
okay, short of that πŸ˜„
yeah, it is
h
Well, confidential channel support (including NDA) is potentially available to project sponsors now! See the new sponsorship page: https://www.pantsbuild.org/docs/sponsorship
a
Not because of that, but I was trying to get us to sponsor pants, since it's such a big part of our workflow πŸ™‚
❀️ 2
f
Are you using any `UnionRule`s with `InferDependencyRequest` by chance?
I have 8k targets and I am struggling with similar issues
a
Our only custom plugin just does the releases, and it's really simple, other than that it's just a huge mess of python sources that depend on each other too much. πŸ™‚
f
> just a huge mess of python sources that depend on each other too much
Perhaps that's the commonality here πŸ˜„
πŸ˜„ 1
a
Yeah, I still think it's excessive. I might try to get the powers that be to approve me spending a few days on profiling this, but I'm not sure I can figure it out; my only exposure to Rust was reading parts of their books while commuting, so yeah. 😞
f
We use dep inference, and I did notice in experiments turning it off that it got slower, not faster. "Not using dep inference" isn't a well-tested use case in Pants, because it's not really expected. I'm going through what I think is the relevant code here https://github.com/pantsbuild/pants/blob/7c3270f3631d6540fd822deac0236703696f59af/src/python/pants/engine/internals/graph.py#L1307
a
We tried turning it on, but we got some weird errors from doing it.
f
(i think that's a link to 2.17.0rc2 's code, but you get the picture)
f
I'd also advocate for trying to chop off a part of the repo and run Pants again. I wonder whether it's the size that matters (it certainly may make things slower), or whether it's something about the runtime environment. For example, I get pantsd occasionally crashing on my hobby repo as well
That is, keep the config in place, just try a smaller subset
f
George, what mechanism do you use to turn off dependency inference?
a
Okay, we have it on for `__init__.py` files, but we didn't turn it on for anything else:
```
[python-infer]
imports = false
```
f
hmmm, I don't think that actually makes dependency inference not run
a
That's a bit silly. πŸ™‚
I, personally, am not a fan of this feature, but I never looked too much into it, since we cannot turn it on anyway.
f
oh... i'm lying
it might depend on which version you're using
a
We're on 2.16.0
a
To be fair, I'm sure that even parsing those, it shouldn't be this slow.
f
yeah it's not the parsing that's slow
or that's taking up so much memory
I'm just trying to nail down what's actually happening
a
It seems to me that it's just something that doesn't scale linearly, things are getting way worse as we're adding files
f
Yeah, I think I'm gonna go through at some point and start commenting things out in the source or replacing bits with no-ops and see where I get a speed-up
there's something really funky going on
a
But, yeah, it's getting slightly ridiculous for us, for some of our apps, running tests is basically 5 minutes of pants doing shit, then 3-4 minutes of running tests 😞
f
I've submitted some speedscope profiles before and it's led to a few improvements, but there's something weird happening here and I think it's gonna take a more aggressive approach
Oh we have hours of tests to run, so we barely even feel it
I'm only kinda joking
We feel it more when we try to put it in dev-facing tooling
a
We split our tests by app; it actually does get better if you ask Pants to do this for fewer targets.
Actually, we even split apps into multiple 'shards' (not sure what CircleCI calls them; that's what we called them before we moved there), and the time it takes to start running tests depends on how many tests we want to run
f
I wish that were an option for us. We use `dependents`, which requires that Pants calculate all dependencies
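For context, that's the goal that answers "what depends on X"; the target path below is made up:
```
# lists everything that transitively depends on the given file; the path is a placeholder
./pants dependents --transitive src/python/mylib/util.py
```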
But I'm going to dive into this. I think hacking and slashing the graph and dep inference code might at least let me find where a bottleneck is in the code
a
we have a step at the beginning that gets the dependencies between this PR and master, then splits them into, erm, we call them components (sometimes more than one app/lib), and each of these runs tests/lint/typechecking separately.
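A rough sketch of how that kind of change-based selection can look with Pants's built-in flags; this isn't necessarily what the pipeline above actually does (older releases spell the flag --changed-dependees):
```
# run goals only for targets changed relative to master, plus everything that depends on them
./pants --changed-since=origin/master --changed-dependents=transitive test lint check
```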
f
This is hard for much of the Pants team to work on because it's not clear how to create reproducers for these issues
I'm a maintainer and I have access to a repo that can reproduce this, so I guess it should fall on me πŸ˜…
a
we used to run it all together, it did take more than one hour, but this was on an ancient version of pants
f
> we have a step at the beginning that gets the dependencies between this PR and master, then splits them into, erm, we call them components (sometimes more than one app/lib), and each of these runs tests/lint/typechecking separately
This is where I want to get to as well but it will take time
a
well, if you have any ideas, I can try them on our side, and I definitely am willing to commiserate πŸ™‚
f
If you have some time to hack, https://www.pantsbuild.org/docs/running-pants-from-sources will show you how to build and run your own version of pants locally
(This is what I'm going to do... tear bits out of the code I linked you and then run it on my codebase)
a
oh, right, I thought you meant removing parts of your repo
which is... really hard, hehe
f
nah I need those bits
we're a monolith masquerading as a monorepo, it's rough
a
I don't need 90% of it, but it's very hard to separate them πŸ˜„
btw, are you using lockfiles?
f
heh... yes but sparingly
a
'cause, in the past, when I complained about this, I was told that lockfiles should make this better
or, well, that they should make things better, in general
f
depends on what your issue is
we use fake non-existent dependencies for python third party deps and run our tests outside of Pants because our actual code is provided by RPM packages, not PyPI
a
My condolences... I thought our setup was screwed up, but yeah... πŸ˜„
h
To clarify @ancient-france-42909’s issue, FWICT the time consuming thing is graph traversal, not dep inference? So I wonder if the hairball-ness is the issue.
f
Maybe so, but we need to find ways to isolate where these bottlenecks actually are in the code
And speedscope profiles haven't really shown anything that stands out as a primary cause here. So I feel like hacking at this the old-fashioned way and just removing and replacing bits of code with no-ops until I figure out what's going on ☺️
h
πŸ™‚
@witty-crayon-22786 might know a way of creating a flame graph of rules
πŸ™ 1