Hey everyone once again im having performance issues with th Pants #general

salmon-barista-63163

12/18/2020, 4:28 PM

Hey everyone, once again I'm having performance issues with the v2 engine and am looking for suggestions. Here is my setup:

circleci
8 CPU
16 RAM

I am trying to run tests via pytest or behave. In this environment we start up our servers and then run the tests: we have about 8 webservers running (with pants) when we kick off the tests. I am getting out-of-memory errors and killed processes when running. If you have been following what I have been posting, I am having all sorts of performance issues. I am running pants 2.0.1rc4 (with a twist from a release that was cut for me yesterday). There is no way the performance of v2 can be this much worse than v1 as a whole. At this point I'm thinking I'm missing or overlooking something. Any suggestions?

hundreds-father-404

12/18/2020, 5:33 PM

I haven’t been following the past threads closely - apologies if you’ve already answered. Do you know where the costs are mostly coming from? Some candidates:
* Cache upload and download
* Resolving requirements
* Pants overhead when setting up tests
* The test processes themselves, that they’re slower than they were before

Running with

--no-dynamic-ui

may be instructive, as it will output the start time and end time for things

witty-crayon-22786

12/18/2020, 5:38 PM

https://pantsbuild.slack.com/archives/C046T6T9U/p1608149983219400 is a previous thread on this

witty-crayon-22786

12/18/2020, 5:39 PM

my understanding is that you are spawning N

./pants run

“clients”, which are hitting M

./pants run

“servers” on the same box

👍 1

witty-crayon-22786

12/18/2020, 5:40 PM

is that right?

salmon-barista-63163

12/18/2020, 5:43 PM

yes. @witty-crayon-22786 but that part has been resolved. now throwing one more pants process to run tests or execute pytests on top of that really makes things go crazy

salmon-barista-63163

12/18/2020, 5:45 PM

little more context:
• cci pants cache is loaded no issue.
• the dynamic ui is off
• pantsd is off

Here are the things that are for sure slower with pants v2 than v1.
• pants startup processes
• pants building a pex file
• running pytest
• running our behave tests

witty-crayon-22786

12/18/2020, 5:45 PM

so, during the portion of

./pants run

where the process is spawned,

pants

itself should be using effectively no CPU at all, but it is definitely using memory. this was why john and i were curious about the OOM killer

witty-crayon-22786

12/18/2020, 5:47 PM

Here are the things that are for sure slower with pants v2 than v1.

when

pantsd

is off, it is not surprising that the first item is slower:

pantsd

is intended to cut off the startup time.

witty-crayon-22786

12/18/2020, 5:47 PM

and all other tasks will be lightly affected as well.

witty-crayon-22786

12/18/2020, 5:48 PM

we’re working on optimizing

pytest

runs, because there is some extra overhead there that we know about.

👍 1

witty-crayon-22786

12/18/2020, 5:49 PM

but i think that that might all be moot. if your processes are getting killed, i am only aware of the OOM killer being able to do that in linux. i do not know of a facility to kill things based on CPU usage

salmon-barista-63163

12/18/2020, 5:49 PM

I wasn't aware of anything that would kill processes for CPU either, but if I watch the available memory there is plenty there when it happens

witty-crayon-22786

12/18/2020, 5:52 PM

are the clients running

pytest

? or some other code?

salmon-barista-63163

12/18/2020, 5:53 PM

its pytest

witty-crayon-22786

12/18/2020, 5:54 PM

and this can’t be phrased as a single client running a single run of pants running multiple tests concurrently?

salmon-barista-63163

12/18/2020, 5:54 PM

i get a lot of this error in my pytest

"Error reading file File { path: \".coverage\", is_executable: false }: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }"

which typically happens when there is no memory left to finish writing the coverage report

salmon-barista-63163

12/18/2020, 5:54 PM

i'm about to turn off the coverage features and see if this helps

👍 1

salmon-barista-63163

12/18/2020, 5:56 PM

If i bump the CircleCI executor size up by one, things start to smooth out… but this isn't something i can do long term, as the larger executors are very expensive

witty-crayon-22786

12/18/2020, 5:57 PM

at a fundamental level, changing your client to be a single run of

./pants test $multiple_targets

with parallelism enabled is what would be ideal, as that would use dramatically less memory.
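A minimal sketch of what "a single run" could look like when the clients are currently launched as separate `./pants test` commands. The helper name `merge_test_invocations` and the target addresses are hypothetical; only the `./pants test $multiple_targets` shape comes from the thread.

```python
from typing import List, Sequence


def merge_test_invocations(invocations: Sequence[Sequence[str]]) -> List[str]:
    """Collapse several `./pants test <targets>` argv lists into one.

    A single Pants process then schedules every test itself, instead of
    one Pants process per client each holding its own memory.
    """
    targets: List[str] = []
    for argv in invocations:
        # Only plain `./pants test ...` invocations are understood here.
        if list(argv[:2]) != ["./pants", "test"]:
            raise ValueError(f"not a plain `./pants test` invocation: {argv!r}")
        for target in argv[2:]:
            # Keep first-seen order and drop duplicates.
            if target not in targets:
                targets.append(target)
    if not targets:
        raise ValueError("no targets to test")
    return ["./pants", "test", *targets]
```

For example, two hypothetical client commands `./pants test src/a::` and `./pants test src/b:: src/a::` would merge into `./pants test src/a:: src/b::`.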

witty-crayon-22786

12/18/2020, 5:57 PM

having tons of independent instances of pants is just not the intended usage right now. we’ll keep working on https://github.com/pantsbuild/pants/issues/7654 , but it’ll be at least a few weeks.

salmon-barista-63163

12/18/2020, 5:58 PM

so for the tests its one

./pants test

execution

salmon-barista-63163

12/18/2020, 5:59 PM

im struggling with performance impacts everywhere we run pants in our repo.

witty-crayon-22786

12/18/2020, 5:59 PM

is that after the

./pants run

clients have completed? or concurrently?

salmon-barista-63163

12/18/2020, 6:00 PM

so when tests run, it's going to call into our webservers that are running with pants. to me this looks like too many running pants processes at once, where with v1 we didn't run into this

salmon-barista-63163

12/18/2020, 6:00 PM

it is after the ./pants run clients are up and running though

witty-crayon-22786

12/18/2020, 6:01 PM

“up and running” or “done and exited”?

salmon-barista-63163

12/18/2020, 6:01 PM

“up and running”

witty-crayon-22786

12/18/2020, 6:01 PM

ok, so you have ~17 pants processes at once?

salmon-barista-63163

12/18/2020, 6:02 PM

we could have upwards of 20+ yes

witty-crayon-22786

12/18/2020, 6:02 PM

v2 uses more memory… to a degree, that is intended and expected. it’s keeping work warm for further runs

witty-crayon-22786

12/18/2020, 6:03 PM

…which you can only take advantage of with

pantsd

hundreds-father-404

12/18/2020, 6:04 PM

I think you mentioned trying running

./pants package

, followed by

dist/app.pex

- rather than

./pants run

. Iirc, you stopped that approach because it was still too much contention when building the PEXes. Now that you have the building of PEXes part fixed, might it be worth trying that approach again? Then, you have ~20 Pex processes running and only 1 Pants process for Pytest
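A rough sketch of that approach: one Pants run packages every server, then each server is started directly from its PEX so that no Pants process stays resident per server. The function name `start_servers` is hypothetical, and the mapping from target address to PEX path under `dist/` is an assumption the caller has to supply for their own repo.

```python
import subprocess
from typing import Callable, List, Mapping


def start_servers(
    servers: Mapping[str, str],
    run: Callable[..., object] = subprocess.run,
    spawn: Callable[..., object] = subprocess.Popen,
) -> List[object]:
    """Package all targets in one Pants run, then launch each PEX.

    `servers` maps a Pants target address to the PEX path it is assumed
    to produce. `run` and `spawn` are injectable so the flow can be
    exercised without Pants installed.
    """
    # One Pants process builds every PEX, then exits.
    run(["./pants", "package", *servers.keys()], check=True)
    # The long-lived processes are plain PEX executables, not Pants.
    return [spawn([pex_path]) for pex_path in servers.values()]
```

With ~20 servers this leaves ~20 PEX processes plus the single Pants process used for Pytest, which is the layout described above.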

witty-crayon-22786

12/18/2020, 6:04 PM

so, as discussed yesterday, i think you either need to 1) switch to

./pants package && dist/app.pex

or 2) wait for https://github.com/pantsbuild/pants/issues/7654, and we could try to prioritize it.

salmon-barista-63163

12/18/2020, 6:05 PM

I am running

./pants package && dist/app.pex

for the one webserver that would spin up tasks (pants processes) on demand. We pre-package ~30 pexes and then execute each one by just running the pex. It is the rest of our stack that is causing issues now. It's just too many things running in pants, i guess

hundreds-father-404

12/18/2020, 6:07 PM

It sounds like you’ve been playing with this already, but another thing you could continue tweaking is

--process-execution-local-parallelism

. Note that you can use it as a CLI option, not only a config file value. So, you can set it to a higher value when building PEXes, then lower it dramatically when running tests. In v1, it was effectively set to 1. Tests ran sequentially.
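As an illustration of setting the option per invocation, here is a small sketch that builds the argv for a goal with an explicit parallelism value. The helper name `pants_command` and the values 8 and 1 are placeholders, not recommendations; the flag itself is the one named above, passed as a global option before the goal.

```python
from typing import List, Sequence


def pants_command(goal: str, targets: Sequence[str], parallelism: int) -> List[str]:
    """Build a Pants argv with a per-invocation parallelism setting."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return [
        "./pants",
        f"--process-execution-local-parallelism={parallelism}",
        goal,
        *targets,
    ]


# Placeholder values: more concurrency while building PEXes, less for tests.
build_cmd = pants_command("package", ["src/app:bin"], parallelism=8)
test_cmd = pants_command("test", ["::"], parallelism=1)
```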

hundreds-father-404

12/18/2020, 6:07 PM

Hm, possibly worth trying to see the impact:

./pants test --debug ::

emulates the v1 behavior. It runs each test sequentially in the foreground

witty-crayon-22786

12/18/2020, 6:08 PM

Screen Shot 2020-12-18 at 10.07.57.png

witty-crayon-22786

12/18/2020, 6:08 PM

https://discuss.circleci.com/t/logging-memory-usage-in-the-builds/442

witty-crayon-22786

12/18/2020, 6:10 PM

basically, it looks like circle have a memory usage killer other than the kernel
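If the container's limit is enforced through cgroups, the memory that looks "available" from inside the container may not reflect the limit that actually applies. A hedged sketch for logging the container-level numbers during a job, assuming a cgroup v1 layout mounted at `/sys/fs/cgroup/memory` (whether a given CircleCI executor exposes these files at this path is an assumption):

```python
from pathlib import Path
from typing import Optional

# Assumption: cgroup v1 memory controller mounted at the usual location.
CGROUP_V1_MEMORY = Path("/sys/fs/cgroup/memory")


def read_bytes_value(path: Path) -> Optional[int]:
    """Return the integer stored in a cgroup file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def memory_report(root: Path = CGROUP_V1_MEMORY) -> str:
    """Summarize current usage, peak usage and limit for this container."""

    def mib(value: Optional[int]) -> str:
        return "n/a" if value is None else f"{value / 2**20:.0f} MiB"

    usage = read_bytes_value(root / "memory.usage_in_bytes")
    peak = read_bytes_value(root / "memory.max_usage_in_bytes")
    limit = read_bytes_value(root / "memory.limit_in_bytes")
    return f"usage={mib(usage)} peak={mib(peak)} limit={mib(limit)}"
```

Printing `memory_report()` periodically while the servers and tests are running would show how close the job gets to the limit before a process is killed.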

salmon-barista-63163

12/18/2020, 6:12 PM

hmmm

salmon-barista-63163

12/18/2020, 6:12 PM

i have never seen that file

witty-crayon-22786

12/18/2020, 6:13 PM

it’s an older post… newer posts about memory limits have more information

salmon-barista-63163

12/18/2020, 6:13 PM

I have tried running tests sequentially like in v1 by setting --process-execution-local-parallelism to 1. same issues

salmon-barista-63163

12/18/2020, 6:13 PM

i'll check into some of the CircleCI memory posts

witty-crayon-22786

12/18/2020, 6:15 PM

https://support.circleci.com/hc/en-us/articles/115014359648

witty-crayon-22786

12/18/2020, 6:16 PM

(but, basically: if you try to search for “CPU usage killer” for circleci, you only find information about memory usage)

salmon-barista-63163

12/18/2020, 6:19 PM

yeah i know the exit code 137 oom one. I literally get a message from my linux container that says

Killed: <pid> Out of memory

that's what i am fighting in some test jobs here; in others it just all of a sudden exits with exit code 1
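For reference, the two failure modes can be told apart from the exit status alone. By shell convention a status above 128 means "terminated by signal (status - 128)", so 137 is signal 9, SIGKILL, which is what the OOM killer sends; a status of 1 is an ordinary error exit. A small sketch (the function name is hypothetical, and a program can in principle choose to exit with a status above 128 on its own):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status as reported by a shell or CI step."""
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with error code {status}"
```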

salmon-barista-63163

12/18/2020, 6:20 PM

the places where it kills itself are always different. never consistent.

witty-crayon-22786

12/18/2020, 6:20 PM

yea. ok. https://pantsbuild.slack.com/archives/C046T6T9U/p1608314670274500?thread_ts=1608308897.266800&cid=C046T6T9U is the way forward then i think.

salmon-barista-63163

12/18/2020, 6:21 PM

okay. thanks for chatting regarding it.. i was just trying to get any last ditch efforts to try to solve this.

witty-crayon-22786

12/18/2020, 6:21 PM

the effect of that would be significantly reducing the amount of pants runs running at once

salmon-barista-63163

12/18/2020, 6:21 PM

looks like ill have to wait for this

salmon-barista-63163

12/18/2020, 6:21 PM

do you have an ETA for when that ticket will land in a release? I would like to discuss this with my team.

witty-crayon-22786

12/18/2020, 6:22 PM

the

./pants binary/package && dist/app.pex

change should work in either v1 or v2… are you sure that that isn’t an option?

salmon-barista-63163

12/18/2020, 6:22 PM

yes this works… but we cannot use that everywhere due to the nature of our system and architecture

salmon-barista-63163

12/18/2020, 6:22 PM

I implemented it everywhere i could

witty-crayon-22786

12/18/2020, 6:23 PM

the ~8 clients and ~8 servers are the critical spot i think

salmon-barista-63163

12/18/2020, 6:24 PM

yeah that seems to be what i am seeing as well

witty-crayon-22786

12/18/2020, 6:24 PM

https://pantsbuild.slack.com/archives/C046T6T9U/p1608315718280000?thread_ts=1608308897.266800&cid=C046T6T9U i’ll get you an answer to this in about 2 hours.

witty-crayon-22786

12/18/2020, 6:28 PM

BUT, unfortunately, we definitely won’t be able to backport that change to 2.0.x/2.1.x… it’s built atop a bunch of stuff that only exists in 2.2.x

salmon-barista-63163

12/18/2020, 6:30 PM

okay. thank you

witty-crayon-22786

12/18/2020, 8:14 PM

so, i think that we can commit to having this done in January.

salmon-barista-63163

12/18/2020, 8:22 PM

so we are looking late jan release?

witty-crayon-22786

12/18/2020, 8:22 PM

yea. done and shipped in January.

salmon-barista-63163

12/18/2020, 8:22 PM

ok thank you @witty-crayon-22786

salmon-barista-63163

12/18/2020, 8:23 PM

I will chat with my team, but it looks like we are going to halt our rollout of pants 2 until this happens

witty-crayon-22786

12/18/2020, 8:25 PM

i’m a broken record, but i do think that the binary/package approach would work around the issue in this case. but we’ll be prioritizing the pantsd change because i 100% agree that it shouldn’t be necessary to avoid concurrent runs.

happy-kitchen-89482

12/22/2020, 7:29 AM

So we can get some context, can you elaborate on the reason that in some cases you have to run a server with

./pants run <tgt>

and cannot switch to

./pants package <tgt> && ./dist/<tgt>.pex

happy-kitchen-89482

12/22/2020, 7:30 AM

We'd like to understand the use-case better

happy-kitchen-89482

12/22/2020, 7:30 AM

Thanks!

salmon-barista-63163

01/04/2021, 10:47 PM

@happy-kitchen-89482 So we have a server (let's call it the parent) that runs via a pants run process. That server calls subprocesses that will each run a “task”. Let's call those child services. Those tasks were executed with a pants run command. We have since migrated those tasks to use a ./pants package and then run the pex as a subprocess. The issue here is that we can spin up ~30 tasks, which gives us ~30 pants package processes running in parallel. I hope this answers the question, as it's kind of hard to explain what we use this server for without explaining a lot of our product infrastructure.
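One possible stopgap for the on-demand case, sketched under the assumption that the parent server is a single Python process that can gate its own subprocess launches: cap how many `./pants package` runs are in flight at once. `BoundedPackager` is a hypothetical helper, not something Pants provides, and the right limit would have to be found by experiment.

```python
import subprocess
import threading
from typing import Callable


class BoundedPackager:
    """Allow at most `limit` `./pants package` runs at the same time."""

    def __init__(self, limit: int, run: Callable[..., object] = subprocess.run):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self._slots = threading.BoundedSemaphore(limit)
        self._run = run  # injectable so the gate can be tested without Pants

    def package(self, target: str) -> None:
        # Blocks until a slot is free, so extra tasks queue instead of
        # starting another concurrent Pants process.
        with self._slots:
            self._run(["./pants", "package", target], check=True)
```

Tasks that spin up while all slots are taken simply wait, which trades some latency for a bounded number of concurrent Pants processes.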

salmon-barista-63163

01/04/2021, 10:48 PM

I do have a question. Will the fix for pantsd be backported to pants 2.1, or will it only exist in 2.2? We are on 2.0 currently; I'm just trying to see what I need to update to in preparation for the release of this bug fix

hundreds-father-404

01/05/2021, 9:48 PM

@witty-crayon-22786 to confirm, we would not backport pantsd support for concurrent runs because it is impossible to backport without breaking the deprecation policy, right?

hundreds-father-404

01/05/2021, 9:50 PM

The issue here is that we can spin up ~30 tasks, which gives us ~30 pants package processes running in parallel.

@salmon-barista-63163 to clarify, it sounds like it is not possible to invoke those all in a single run? Like

./pants package tgt1 tgt2 tgt3

salmon-barista-63163

01/05/2021, 9:53 PM

correct @hundreds-father-404 we do this on the fly as tasks spin up with our application. we cannot do all the processes in a single run unfortunately

witty-crayon-22786

01/05/2021, 9:53 PM

no… it would just be a backport of something like 6 or 7 patches. it would be a big investment to ensure that it was stable.

hundreds-father-404

01/05/2021, 9:54 PM

I thought that we must remove all global state to land this change? Meaning

Subsystem.global_instance()

? We can’t backport that because of the deprecation policy
