pantsd and performance re <https github com pantsbuild pants Pants #development

Join Slack

pantsd and performance!: re <https://github.com/pa...

# development

witty-crayon-22786

03/24/2020, 4:20 PM

pantsd and performance!: re https://github.com/pantsbuild/pants/pull/9318

witty-crayon-22786

03/24/2020, 4:21 PM

@early-needle-54791 : It looks like there might be an issue with the first line for master... that should be

list ::

, yea?

witty-crayon-22786

03/24/2020, 4:22 PM

list ::

for master/branch ends up with a similar ratio to the others (around 14%?), I think that this is good to go from a pantsd users (ie, Twitter) perspective... and if it doesn't, then you might go looking for more things that we're watching unexpectedly

witty-crayon-22786

03/24/2020, 4:22 PM

But we should talk to Toolchain about whether they are on a path to enabling pantsd for their users, and thus willing to take this hit. If they're not, I think we'd need to put it behind a flag that disabled it unless pantsd was enabled.

hundreds-father-404

03/24/2020, 4:34 PM

Awesome! Is the goal to completely remove watchman in favor of this?

witty-crayon-22786

03/24/2020, 4:34 PM

yep.

💯 2

witty-crayon-22786

03/24/2020, 4:35 PM

cc @happy-kitchen-89482, @enough-analyst-54434: ^

witty-crayon-22786

03/24/2020, 4:37 PM

i can summarize the question / expected path forward on https://github.com/pantsbuild/pants/issues/4999 if that would be helpful.

👍 1

happy-kitchen-89482

03/24/2020, 5:45 PM

That would be helpful, thanks!

witty-crayon-22786

03/24/2020, 5:48 PM

@early-needle-54791: i expect that you're hackweeking, but i'll be in meetings for the next two hours or so. if you get a chance to summarize the question and next steps on #4999 before i do, please do.

witty-crayon-22786

03/24/2020, 5:48 PM

and if you're hackweeking, don't worry about it

witty-crayon-22786

03/24/2020, 5:52 PM

it's possible that the answer is to do what the #4999 description describes, and add flags that allow for enabling 1) watchman only, 2) watchman+notify, 3) notify only ... in which case the next question will be "in this PR, or in the followup PR?"

early-needle-54791

03/24/2020, 6:25 PM

Taking a stab at summarizing the next steps / open questions.

❤️ 1

witty-crayon-22786

03/24/2020, 9:59 PM

@early-needle-54791: hey, still not quite sure about that benchmark result. i see a ratio of 2.8 seconds for

contrib

to 4.8 seconds for

::

in master, so it doesn't seem right that you were seeing

11.64s

for

contrib

and

11.57s

for

::

witty-crayon-22786

03/24/2020, 9:59 PM

and at an intuitive level, it doesn't really make sense

early-needle-54791

03/24/2020, 9:59 PM

you mean on master?

witty-crayon-22786

03/24/2020, 9:59 PM

yes

witty-crayon-22786

03/24/2020, 9:59 PM

(for single runs, but i expect it to scale up to 5 runs)

early-needle-54791

03/24/2020, 10:00 PM

my hypothesis was just that without notify everything is dominated by startup time

early-needle-54791

03/24/2020, 10:00 PM

and the scandir time is completely negligible

witty-crayon-22786

03/24/2020, 10:00 PM

would you mind re-running the "5 runs of

list ::

on master" line?

early-needle-54791

03/24/2020, 10:00 PM

sure

witty-crayon-22786

03/24/2020, 10:00 PM

it isn't on my machine, heh

early-needle-54791

03/24/2020, 10:01 PM

master isnt on ur machine?

witty-crayon-22786

03/24/2020, 10:01 PM

https://pantsbuild.slack.com/archives/C0D7TNJHL/p1585087219163100?thread_ts=1585066859.157400&cid=C0D7TNJHL

witty-crayon-22786

03/24/2020, 10:02 PM

^ that is not the case on my machine for master

witty-crayon-22786

03/24/2020, 10:02 PM

listing something smaller takes less time, afaict.

early-needle-54791

03/24/2020, 10:02 PM

early-needle-54791

03/24/2020, 10:03 PM

i may have had the --{no-}enable-pantsd removed 🤦‍♂️

early-needle-54791

03/24/2020, 10:03 PM

eh still its saying 18 seconds for 5 laps of ::

early-needle-54791

03/24/2020, 10:03 PM

( repeat 5; do; ./v2 --no-enable-pantsd list ::; done; ) 20.89s user 9.56s system 165% cpu 18.433 total

early-needle-54791

03/24/2020, 10:03 PM

lets try src/:: again

early-needle-54791

03/24/2020, 10:04 PM

errr

( repeat 5; do; ./v2 --no-enable-pantsd list src/::; done; )  13.20s user 5.71s system 131% cpu 14.400 total

early-needle-54791

03/24/2020, 10:04 PM

about 1 sec faster per run

early-needle-54791

03/24/2020, 10:05 PM

to limit it to src/::

early-needle-54791

03/24/2020, 10:06 PM

But you’re right the numbers up there seem inconsistent w/r/t ::

witty-crayon-22786

03/24/2020, 10:07 PM

hm. i'll run the whole thing.

early-needle-54791

03/24/2020, 10:08 PM

better CPU usage tho, so probably better parallelism not waiting for the lock on the watcher

early-needle-54791

03/24/2020, 10:17 PM

Also, nothing is jumping out at me that should obviously be ignored, or has enough files to even matter.

witty-crayon-22786

03/24/2020, 11:31 PM

so... the notify crate appears to be depressingly inefficient.

😢 1

witty-crayon-22786

03/24/2020, 11:47 PM

(...at least on OSX...)

witty-crayon-22786

03/24/2020, 11:47 PM

it starts and stops an event loop every time you watch something new. argh.

enough-analyst-54434

03/24/2020, 11:50 PM

Are we "using it wrong" aka can you just watch the buildroot wholesale and then filter events you don't want from there?

👍 2

witty-crayon-22786

03/24/2020, 11:54 PM

possibly, yea. it certainly seems like the expectation for OSX was not that folks would add files to watch individually. ... on Linux it looks like it should be much more performant.

witty-crayon-22786

03/24/2020, 11:55 PM

compare and contrast linux: https://github.com/notify-rs/notify/blob/eed64ac9088ec1aab5c4710ef7232f0fbee49e0a/src/inotify.rs#L527-L540 and osx: https://github.com/notify-rs/notify/blob/eed64ac9088ec1aab5c4710ef7232f0fbee49e0a/src/fsevent.rs#L229-L235

witty-crayon-22786

03/24/2020, 11:57 PM

so, next question is how the recursive mode is implemented.

witty-crayon-22786

03/25/2020, 12:02 AM

...it's implemented by walking the filesystem and watching everything for linux. on OSX it looks fsevents is per-filesystem or per-disk... so recursive is cheap.

witty-crayon-22786

03/25/2020, 12:05 AM

interesting. so... i think that for this to not suck we'd need different strategies per OS. that's unfortunate, but ... possible.

witty-crayon-22786

03/25/2020, 12:18 AM

@early-needle-54791: so: sorry for the goose chase. based on the above, i think that where we are on this is that we need per-OS behavior, akin to https://github.com/notify-rs/notify/blob/eed64ac9088ec1aab5c4710ef7232f0fbee49e0a/src/lib.rs#L186-L191

witty-crayon-22786

03/25/2020, 12:19 AM

on OSX, it's cheap to watch the whole thing, so we'd want to recursively watch the root, period. and on Linux we'd want to continue to use the implementation we have now.

witty-crayon-22786

03/25/2020, 12:20 AM

i'm confirming the OSX aspect now.

witty-crayon-22786

03/25/2020, 12:26 AM

...yea, confirmed. the recursive watch is cheap on OSX.

witty-crayon-22786

03/25/2020, 12:38 AM

so, as awkward as that is, i think that it is a clear path forward. and just a matter of trying to contain the logic to

<http://watch.rs|watch.rs>

to the extent possible.

👍 1

early-needle-54791

03/25/2020, 1:16 AM

Okay that makes sense. So on OSX we would effectively ignore calls to watch() at runtime, and just watch the build_root once at startup?

early-needle-54791

03/25/2020, 1:16 AM

seems like that wouldn’t be too much trouble.

witty-crayon-22786

03/25/2020, 1:20 AM

Yep

early-needle-54791

03/25/2020, 1:35 AM

I had been going over the inotify.rs module looking for where it was spending so much time, but of course its in the fsevent.rs module 🤦‍♂️

early-needle-54791

03/25/2020, 1:37 AM

The only reason I can think of that they would be implemented so differently is because of limit/strengths of the underlying APIs

witty-crayon-22786

03/25/2020, 1:57 AM

Yep. For sure. Fsevents on osx is "per filesystem or drive", so recursive is just filtering the relevant stuff out ... while Linux is per path

👍 1

early-needle-54791

03/25/2020, 2:13 AM

new timings on the feature branch after recursive watch

( repeat 5; do; ./v2 -lerror --no-enable-pantsd list ::; done; )  18.61s user 7.07s system 148% cpu 17.233 total

🔥 1

early-needle-54791

03/25/2020, 2:14 AM

😅

early-needle-54791

03/25/2020, 2:14 AM

thanks for the help stu

❤️ 1

early-needle-54791

03/25/2020, 2:16 AM

Side note, can you guys think of a reason we would want to watch hidden files and directories in the build root? only think I can think of is config files ?

early-needle-54791

03/25/2020, 2:17 AM

ignoring events from hidden files would help avoid unncessary full build root invalidations. i.e when something gets added .pants.d or w/e

early-needle-54791

03/25/2020, 2:18 AM

I’m going to leave it as is watching hidden files, and we can focus on optimizing invalidation later.

early-needle-54791

03/25/2020, 2:19 AM

This probably needs some more thorough discussion.

👍 1

witty-crayon-22786

03/25/2020, 5:18 PM

re: skipping invalidation for ignored files: ok with it going in an immediate followup. but i think that it will actually need to be immediate, because "an invalidation event for anything under `.pantsd`"... is going to be very significant.

👍 1

witty-crayon-22786

03/25/2020, 5:19 PM

but it should be based on

pants_ignore

rather than heuristics about hidden-ness

👍 1

early-needle-54791

03/25/2020, 5:20 PM

files under .pants.d shouldn’t invalidate the build_root, and should match any node fs_subjects tho.

early-needle-54791

03/25/2020, 5:21 PM

I agree it needs to be fixed btw.

witty-crayon-22786

03/25/2020, 5:21 PM

understood. but there will be hundreds of thousands of events arriving during v1 builds.

early-needle-54791

03/25/2020, 5:21 PM

yeah….

witty-crayon-22786

03/25/2020, 5:21 PM

and the current implementation of

Graph::invalidate_from_roots

is ... not optimized for that.

early-needle-54791

03/25/2020, 5:22 PM

Good point. Idk, maybe I should fix this now then

witty-crayon-22786

03/25/2020, 5:22 PM

and shouldn't need to be, really... skipping events that can't be relevant makes sense.

early-needle-54791

03/25/2020, 5:23 PM

Do you think its safe to do in a follow up then?

early-needle-54791

03/25/2020, 5:23 PM

how many people are running v1 compiles from bleeding master, very few I would assume, except for where v1 is used in CI

witty-crayon-22786

03/25/2020, 5:24 PM

@early-needle-54791: it might step on Greg's toes a bit (via https://github.com/pantsbuild/pants/pull/9310), but the way to do that i think would be to move calculation of our ignore patterns out of here: https://github.com/pantsbuild/pants/blob/45e3f0ea5e69fb6a59eed02c6474cb87c18001ba/src/rust/engine/fs/src/lib.rs#L481-L508 ... and up a level to context/Core, i think

hundreds-father-404

03/25/2020, 5:24 PM

One pattern I like to do is to have the followup immediately queued so that you can put it up right away. That way you still can avoid scope creep in the original PR

early-needle-54791

03/25/2020, 5:25 PM

Yeah that makes sense, and thats what I want to avoid.

witty-crayon-22786

03/25/2020, 5:25 PM

yea. Eric's suggestion is good. let's get that drafted as a followup maybe.

early-needle-54791

03/25/2020, 5:25 PM

I’d also really like to avoid having to rebase the PR again…

witty-crayon-22786

03/25/2020, 5:25 PM

yep.

early-needle-54791

03/25/2020, 5:26 PM

which to me means merging sooner rather than later. Not sure how long the proper ignore PR will take.

witty-crayon-22786

03/25/2020, 5:27 PM

if it will take a while, what you might deal with instead is a revert and re-land

early-needle-54791

03/25/2020, 5:28 PM

thats true.

witty-crayon-22786

03/25/2020, 5:28 PM

so, i think that this is a spectrum.

witty-crayon-22786

03/25/2020, 5:29 PM

we don't have to wait until the ignore patch is perfect.

witty-crayon-22786

03/25/2020, 5:29 PM

can dial that knob.

hundreds-father-404

03/25/2020, 5:29 PM

Do you expect merge conflicts the next few hours? Would probably be good to draft the followup before merging so that you have a solid idea of the scope. Maybe, that followup ends up being way more complex than you thought and would take 3 days so you have to revert, which isn’t ideal Nb that John has been planning a release for a few days (blocked by Pex upgrade which he’s close to landing), and I’m on release duty on Friday. This would need to be fixed before the release. If John releases today, for example, that would mean reverting your PR today. If we wait til Friday, you would have 3 days (We’re 5 days behind on the release already, so we probably shouldn’t wait too much longer)

witty-crayon-22786

03/25/2020, 5:29 PM

but i think that there might actually be a significant risk of this being a bottleneck.

👍 1

early-needle-54791

03/25/2020, 5:32 PM

So waiting for the --pants-gitignore pr seems like the right option in the long run, but in lieu of blocking on that this is going to use the pants-ignore option as a first pass toward getting it right?

early-needle-54791

03/25/2020, 5:33 PM

And if the --pants-gitignore is going to do the right thing eventually can we eliminate ~90% of the problem by naively ignoring .pants.d ?

witty-crayon-22786

03/25/2020, 5:34 PM

@early-needle-54791: oh, sorry: to be clear, i think that this is orthogonal to whether pants itself propagates gitignore patterns into what it ignores (beyond the fact that it is likely to conflict)

early-needle-54791

03/25/2020, 5:34 PM

I’d rather not duplicate work / step on toes if I don’t have to.

witty-crayon-22786

03/25/2020, 5:34 PM

pants_ignore is absolutely/already the thing to use

👍 1

Open in Slack

Previous Next