< witty crayon 22786> I suspect that the CI timeouts from th Pants #development


bitter-ability-32190

12/21/2022, 8:46 PM

@witty-crayon-22786 I suspect that the CI timeouts from the immut PR had to do with contention of some kind. Just running a test alone has it completing in 200s, instead of the 1000s we saw when running all tests

bitter-ability-32190

12/21/2022, 8:50 PM

I was never good at concurrent code, but src/rust/engine/fs/store/src/immutable_inputs.rs seems like a likely candidate.

enough-analyst-54434

12/22/2022, 5:59 PM

So, at a very high level, every request for an immutable digest blocks all others, because the lock wraps the complete map: https://github.com/pantsbuild/pants/blob/main/src/rust/engine/fs/store/src/immutable_inputs.rs#L65 That means if one digest takes 10s to materialize, it will head-of-line block all other digest requests. Then one of the waiters gets through, its digest is different and takes 25s, and so on. So folks at the back of the line are waiting 10 + 25 + ...

enough-analyst-54434

12/22/2022, 5:59 PM

There may be more at a finer level, but that high level should explain / account for a lot.

enough-analyst-54434

12/22/2022, 6:02 PM

Ah, no. The lock is dropped in the middle of that chained call after grabbing an entry. So the only head-of-line blocking is among the waiters on a single digest. If there is a single digest that 100 integration tests wait on, though, they all have to wait for it before they can start.
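
A minimal sketch of the locking shape described above, written in Python for brevity rather than in the Rust of `immutable_inputs.rs`. The names `ImmutableInputsSketch`, `OnceEntry`, and `slow_materialize` are hypothetical and are not taken from the Pants source. The map lock is held only long enough to find or create an entry, so waiters queue per digest instead of behind the whole map.

```python
import threading
import time


class OnceEntry:
    """Per-digest slot: the first caller materializes, later callers wait."""

    def __init__(self):
        self.lock = threading.Lock()
        self.path = None


class ImmutableInputsSketch:
    """Hypothetical stand-in for the digest -> materialized path map."""

    def __init__(self, materialize):
        self._map_lock = threading.Lock()  # guards only the dict itself
        self._entries = {}
        self._materialize = materialize  # assumed slow callback

    def path_for(self, digest):
        # Hold the map lock just long enough to find or create the entry.
        with self._map_lock:
            entry = self._entries.setdefault(digest, OnceEntry())
        # The map lock is released here, so other digests are not blocked.
        # Only callers asking for this same digest queue on entry.lock.
        with entry.lock:
            if entry.path is None:
                entry.path = self._materialize(digest)
            return entry.path


calls = []


def slow_materialize(digest):
    calls.append(digest)
    time.sleep(0.05)  # pretend materialization is expensive
    return f"/tmp/immutable/{digest}"


store = ImmutableInputsSketch(slow_materialize)
threads = [
    threading.Thread(target=store.path_for, args=(d,))
    for d in ["a", "a", "a", "b"]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With this shape the three requests for `"a"` serialize behind one materialization, while `"b"` proceeds independently. That matches the point that a single digest shared by many tests still makes all of them wait.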

bitter-ability-32190

12/22/2022, 6:03 PM

Yeah I had that same chain of thought, although admittedly wasn't confident about it

enough-analyst-54434

12/22/2022, 6:04 PM

The really easy option is to add some prints; you'll see who's waiting.

bitter-ability-32190

12/22/2022, 6:04 PM

I'd like to blame the digest lock, but only certain tests are taking a long time, and I don't feel confident that those tests, and those tests alone, share "big" digests

enough-analyst-54434

12/22/2022, 6:09 PM

You probably won't like this, but I will say my use of debuggers evaporated once I started having to deal with concurrent code and distributed systems. Prints are always there even if you can't be and they, for the most part, do not rock the thread scheduling boat too much.
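
One way to apply the "add some prints" advice to a suspected lock, sketched in Python under the assumption that the lock is an ordinary mutex. `logged_acquire` is a hypothetical helper, not something that exists in Pants. It reports how long each caller waited before it got the lock.

```python
import contextlib
import threading
import time


@contextlib.contextmanager
def logged_acquire(lock, label, log=print):
    """Acquire `lock`, reporting how long the caller waited for it."""
    start = time.monotonic()
    lock.acquire()
    waited = time.monotonic() - start
    name = threading.current_thread().name
    log(f"[{name}] waited {waited:.3f}s for {label}")
    try:
        yield waited
    finally:
        lock.release()


lock = threading.Lock()
messages = []


def worker():
    # Collect messages in a list here; pass log=print to write to stdout.
    with logged_acquire(lock, "digest abc123", log=messages.append):
        time.sleep(0.02)  # simulated work while holding the lock


threads = [threading.Thread(target=worker, name=f"test-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for line in messages:
    print(line)
```

Threads that report long waits are the ones stuck behind another holder. If every wait is near zero, the lock is probably not the bottleneck.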

bitter-ability-32190

12/22/2022, 6:13 PM

Yeah understandable

bitter-ability-32190

12/22/2022, 9:33 PM

Logging implies we're barking up the wrong tree perhaps. And actually, this kind of makes sense, since what I'm seeing is a few tests absolutely tentpole the hell outta CI. If we were fighting over the resources I'd expect several tests to be having this affliction.

bitter-ability-32190

12/22/2022, 9:34 PM

In the worst offender (PyOxidizer-related) I see

======================== 2 failed in 1717.01s (0:28:37) ========================

which tells me that pytest is observing the slowdown from end-to-end. So we can rule out anything that happens in Pants before or after it

enough-analyst-54434

12/22/2022, 9:38 PM

But don't we have ITs that call back into Pants via subprocess? This may not be one of those, but for that class of test the pytest timing does not prima facie rule out anything in Pants internals.

bitter-ability-32190

12/22/2022, 9:39 PM

This is, in fact, one of those kinds of tests.

bitter-ability-32190

12/22/2022, 9:50 PM

(I did print stderr in the test, and had it fail, so we'd see the log lines from the daemon inside the test)

bitter-ability-32190

12/22/2022, 10:40 PM

On a whim I disabled `coverage` and my test time went from >1000s to 500s 🤔

bitter-ability-32190

12/22/2022, 11:04 PM

And down to 170s with no contention+maximum caching

bitter-ability-32190

12/23/2022, 2:59 PM

Ahh a clue! When coverage got turned off, the log from building PyOxy binaries looks normal. When I turn it back on...

000052.97 [INFO] Canceled: Building tmpyyq4pz8q/hellotest:bin with PyOxidizer

000054.02 [INFO] Filesystem changed during run: retrying `Package` in 500ms...

bitter-ability-32190

12/23/2022, 4:11 PM

There it is:

154305.00 [INFO] notify invalidation: cleared 2 and dirtied 36 nodes for: {"src/python/pants/engine/internals", "src/python/pants/engine/internals/native_engine.so"}

bitter-ability-32190

12/23/2022, 4:11 PM

Does adding another hardlink dirty the node?!

bitter-ability-32190

12/23/2022, 4:50 PM

One last change on my PR to log what the notify event is that's being reported

bitter-ability-32190

12/23/2022, 6:37 PM

Modify(Metadata(Any))

I heavily suspect that is being emitted for the hardlink count changing

bitter-ability-32190

12/23/2022, 6:39 PM

```
IN_ATTRIB (*)
    Metadata changed--for example, permissions (e.g., chmod(2)), timestamps (e.g., utimensat(2)), extended attributes (setxattr(2)), link count (since Linux 2.6.25; e.g., for the target of link(2) and for unlink(2)), and user/group ID (e.g., chown(2)).
```
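
The link-count behaviour quoted above can be checked directly. This is a small sketch, assuming a POSIX filesystem that supports hard links. The file names are placeholders and nothing here touches Pants or the notify crate. It only shows that creating a hard link changes the target's metadata while leaving its content and modification time alone.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file standing in for something like native_engine.so.
    original = os.path.join(tmp, "native_engine.so")
    with open(original, "wb") as f:
        f.write(b"not a real shared object")

    before = os.stat(original)
    os.link(original, os.path.join(tmp, "hardlink.so"))
    after = os.stat(original)

    nlink_before, nlink_after = before.st_nlink, after.st_nlink
    # Size and mtime describe the content; link(2) does not touch them.
    content_unchanged = (
        before.st_size == after.st_size
        and before.st_mtime_ns == after.st_mtime_ns
    )

print(f"link count: {nlink_before} -> {nlink_after}")
print(f"content and mtime unchanged: {content_unchanged}")
```

A watcher that treats every attribute change as a modification would therefore see a change on the original file even though nothing a build cares about has changed.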

bitter-ability-32190

12/23/2022, 6:57 PM

So to wrap this up, I'm wondering if Pants' file watcher really cares about metadata changes? We shouldn't care if the access time or write time changes since we really only care about the name/content, right? https://docs.rs/notify/latest/notify/event/enum.MetadataKind.html

🙌 1

👀 1

🔥 1

enough-analyst-54434

12/23/2022, 7:06 PM

We definitely care about non-exe becoming exe.

enough-analyst-54434

12/23/2022, 7:06 PM

That's the only metadata change I think we care about, though.

bitter-ability-32190

12/23/2022, 7:07 PM

That is true. I wonder if we can capture the ones we DO care about in order to ignore the rest (namely `Any`)
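
One possible way to "capture the ones we DO care about", sketched in Python under stated assumptions. Since the event seen above does not say which attribute changed, this hypothetical `ExecBitTracker` re-stats the path and invalidates only when the executable bit differs from what was last recorded. It is not the approach taken in the PR and not part of the notify crate. It assumes POSIX permission bits.

```python
import os
import stat
import tempfile


def is_executable(path):
    """True if the owner-executable bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


class ExecBitTracker:
    """Hypothetical helper: remembers each path's executable bit."""

    def __init__(self):
        self._seen = {}

    def record(self, path):
        self._seen[path] = is_executable(path)

    def should_invalidate(self, path):
        """Decide whether a metadata-only event on `path` matters."""
        current = is_executable(path)
        previous = self._seen.get(path)
        self._seen[path] = current
        # Unknown paths are invalidated conservatively.
        return previous is None or previous != current


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "script.sh")
    with open(target, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(target, 0o644)

    tracker = ExecBitTracker()
    tracker.record(target)

    # A new hard link changes the link count but not the exec bit.
    os.link(target, os.path.join(tmp, "alias.sh"))
    after_link = tracker.should_invalidate(target)

    # chmod +x is the metadata change that does matter.
    os.chmod(target, 0o755)
    after_chmod = tracker.should_invalidate(target)

print(f"invalidate after hard link: {after_link}")
print(f"invalidate after chmod +x: {after_chmod}")
```

The trade-off is an extra `stat` per metadata event and some cached state per path, in exchange for ignoring link-count and timestamp noise.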

bitter-ability-32190

12/23/2022, 7:21 PM

I'll let this PR stew (pun intended) https://github.com/pantsbuild/pants/pull/17875

witty-crayon-22786

12/23/2022, 7:24 PM

Nice work!

witty-crayon-22786

12/23/2022, 7:24 PM

Since this only impacts pants integration tests, I don't think we need a super rigorous solution.

bitter-ability-32190

12/23/2022, 7:27 PM

No, but I also wonder if some of those mysterious invalidations I saw way back in the day were related to this.

witty-crayon-22786

12/23/2022, 7:28 PM

Do you use hard links in your repo?

bitter-ability-32190

12/23/2022, 7:29 PM

Psssssssshhhh who the hell knows what goes on in our mountain of scripts 😂

witty-crayon-22786

12/24/2022, 5:12 AM

🚢 : sorry for the delay.

bitter-ability-32190

12/24/2022, 1:01 PM

You're on 🌴, all good
