# general
f
I'm trying out Pants 2.13 and I'm seeing a disappointing performance regression in my fundamental "let's see what's changed" CI command:
./pants --changed-since=HEAD --changed-dependees=transitive list
It's gone from about 1 min to 2 min from 2.11.0 to 2.13.0a0. I can reproduce these times pretty consistently by flipping between branches with the different versions and then killing pantsd and nuking the lmdb_store and named_caches. More details in 🧵
h
Try 2.13.0a1, and make sure --owners-not-found-behavior is not used: https://github.com/pantsbuild/pants/pull/15931
(PS thanks for reporting - it is super helpful to get feedback like this)
💯 1
f
Okay, I'll give that a try. Running analysis on a related issue where the pantsd cache seems to miss for... no reason. Like, I just run the same command (with a | xargs something-else tacked on) one line later, and that second command has to do the whole dependee check again
h
I wonder if xargs is messing things up. I vaguely recall it causing problems in the past
f
Here's the code that can reproduce that cache miss:
time ./pants list --changed-since=HEAD --changed-dependees=transitive > artifacts/transitive.targets.txt
time ./pants --spec-files=artifacts/transitive.targets.txt filter --target-type=python_test --granularity=file '--tag-regex=^unit$' '--address-regex=-^pants-plugins'
time ./pants --spec-files=artifacts/transitive.targets.txt filter --target-type=python_test --granularity=file '--tag-regex=^local$' '--address-regex=-^pants-plugins'
time ./pants --spec-files=artifacts/transitive.targets.txt filter --target-type=python_test --granularity=file '--tag-regex=^systest$' '--address-regex=-^pants-plugins'

time ./pants --changed-since=main --changed-dependees=direct list \
      | xargs ./pants filter --target-type=test_component \
      | xargs ./pants dependees \
      | xargs ./pants filter --tag-regex='^systest-long$' \
      | xargs ./pants source-files

time ./pants --changed-since=main --changed-dependees=direct list \
      | xargs ./pants filter --target-type=test_component \
      | xargs ./pants dependees \
      | xargs ./pants filter --tag-regex='^systest-multicloud$' \
      | xargs ./pants source-files
That last command is exactly like the one before it, except it filters on a different tag (I know filtering this way is deprecated, haven't gotten around to changing it yet). But the cache misses completely on xargs ./pants dependees in that last command, and it has to recompute the entire dep graph (which takes another 2m20s 😅)
h
btw in 2.13, no need for the filter goal -- you can use --filter-target-type etc. from anywhere 🙂
f
Yeah, hadn't gotten around to that yet
Obviously I could rewrite this to use a spec file in this case; it's just odd to get such a nasty cache miss
Timings of the last two commands:
real    0m11.609s
user    0m3.131s
sys     0m0.389s

real    2m44.740s
user    0m3.125s
sys     0m0.389s
w
which was cherry-picked for a1, so it's definitely worth trying.
f
doing so now, just wanted to finish dumping the info on the behavior I observed
👍 1
w
yea, thanks: a repro case will be very helpful if there is anything left over
f
2.13.0a1 helped a bit but still 😕
real    1m27.762s
user    0m0.620s
sys     0m0.076s
And that cache miss happened again
w
mm, thanks. please open a ticket with the repro case!
f
I don't think I'm using --owners-not-found-behavior, because I don't specify anything related to it
👍 1
h
We're definitely interested in figuring that out. I could not personally find a regression between 2.12 and 2.13 last week, so agreed that whatever repro you'd be able to report would help a ton
f
It's a regression between 2.11 and 2.13 (I skipped 2.12). Do you have any tips on the repro? I can't ship our whole repository, but I will say that we abuse recursive globs for file targets at the moment. Maybe generating random files would do it. I really don't have a lot of time between now and Friday to come up with a good repro case, but I'll try
w
just the commands should be sufficient.
this type of slowdown tends to be of an algorithmic big-O variety, so it won’t reproduce as strongly in a smaller repo… but it will still reproduce.
f
It's those commands I listed above
So I can reproduce the regression just with ./pants dependees ::
I've also noticed that I get basically no caching performance gain at all without pantsd for that command, i.e., ./pants --no-pantsd dependees :: is consistently slow; I would have expected some of the work to be cached to disk, so I find this surprising
❯ for i in (seq 1 10)
      echo "Trial #$i"
      time ./pants --no-pantsd dependees ::
  end
Trial #1
⠴ 57.09s Searching for `bash` on PATH=/usr/bin:/bin:/usr/local/bin
⠴ 70.27s Map all targets to their dependees

________________________________________________________
Executed in  115.58 secs    fish           external
   usr time  138.44 secs  333.00 micros  138.44 secs
   sys time   15.15 secs   82.00 micros   15.15 secs

Trial #2

________________________________________________________
Executed in   95.70 secs    fish           external
   usr time  114.68 secs    0.00 micros  114.68 secs
   sys time   12.70 secs  533.00 micros   12.70 secs

Trial #3

________________________________________________________
Executed in   90.90 secs    fish           external
   usr time  108.60 secs    0.00 micros  108.60 secs
   sys time   11.85 secs  515.00 micros   11.85 secs
so there is a bit of a speedup... something is being cached
okay... I think I may have found the issue: it's probably my (ab)use of recursive globs for target defs. If I kill my big recursive-glob targets and run ./pants tailor, I get a major speedup
❯ time ./pants --no-pantsd dependees ::

________________________________________________________
Executed in   50.06 secs    fish           external
   usr time   60.92 secs  769.00 micros   60.92 secs
   sys time    7.21 secs  901.00 micros    7.21 secs
Anyways... this whole dependency + cache thing is frustrating. I feel like one of the reasons I liked Pants was because of dep inference, but without good caching around it, it almost feels like a solution similar to Bazel + Gazelle makes more sense. I was kinda counting on decent disk-based (or Toolchain-provided remote) caching as a solution to this eventually, but if it requires pantsd to cache this stuff then that's a serious drawback
w
there are actual processes that run to extract dependencies on a file-by-file basis: those will be cached to disk across runs, regardless of pantsd. but consuming the outputs of those processes to actually construct the graph is only memoized by pantsd
bazel/gazelle wouldn't cache the relevant portion of this either: you can think of the unmemoized portion in pantsd as approximately equivalent to "reading the BUILD files from disk"
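[editor's note] A toy model of the two layers described here: per-file extraction results live in a disk cache that survives daemon restarts, while the assembled graph is only memoized in the daemon's memory. This is an illustration of the described behavior under those assumptions, not actual Pants code; all names are made up.

```python
# Toy model of the two cache layers -- NOT Pants internals.
# "disk_cache" stands in for the lmdb_store (survives pantsd restarts);
# "memo" stands in for pantsd's in-memory memoization (lost on restart).
disk_cache = {}

class Daemon:
    def __init__(self):
        self.memo = {}          # cleared whenever the daemon restarts
        self.extractions = 0    # per-file dependency-extraction processes run
        self.graph_builds = 0   # graph constructions performed

    def extract_deps(self, path):
        # Cached to disk: only runs a process the first time ever.
        if path not in disk_cache:
            self.extractions += 1
            disk_cache[path] = f"deps-of-{path}"
        return disk_cache[path]

    def build_graph(self, paths):
        # Only memoized in memory: redone from the (cached) per-file
        # results after every restart.
        key = tuple(paths)
        if key not in self.memo:
            self.graph_builds += 1
            self.memo[key] = {p: self.extract_deps(p) for p in paths}
        return self.memo[key]

files = ["a.py", "b.py", "c.py"]

d1 = Daemon()
d1.build_graph(files)   # extracts all 3 files, builds the graph
d1.build_graph(files)   # memo hit: no extra work

d2 = Daemon()           # simulated pantsd restart
d2.build_graph(files)   # extractions hit the disk cache, but the graph is rebuilt
```

In this model, restarting the daemon costs one graph rebuild but zero re-extractions, which matches the partial speedup seen in the --no-pantsd trials above.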
f
why does that take 50 sec?
w
and “determining whether gazelle is up to date” is the portion that is cached to disk in pants: i.e., “is the BUILD file updated for the current source file”
f
(I'm sorry, I'm just really frustrated today, and I'm gonna get hammered on things like this in the coming weeks as I try to push this project forward)
w
why does that take 50 sec?
good question! gathering a trace of the use case would be useful. Toolchain has infrastructure for collecting the “workunits” via a plugin… in the absence of that, i think that Joshua Cannon had also worked on something to collect and aggregate the workunits
f
no wait... it takes 20 sec for ./pants --no-pantsd dependees :: if I get rid of all recursively defined targets and replace them with 111 build file defs
w
hm… that’s an interesting datapoint. if anything, i would think that fewer targets would take less time to match in that case. worth following up on.
f
so that's 2 min to 20 sec, and probably something that points to a superlinear big-O thing on "number of targets per generator/owner"
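[editor's note] The 2-min-to-20-sec observation is consistent with a cost model where ownership lookups scale with the number of files per owning target. The following is a guessed cost model to illustrate the superlinearity, not Pants code; `ownership_work` is a hypothetical helper:

```python
# Toy cost model (an assumption, NOT Pants internals): suppose resolving the
# owner of each file costs a scan over its owning target's file list. Then one
# giant recursive-glob target does quadratic total work, while many small
# per-directory targets do far less for the same number of files.

def ownership_work(files_per_target):
    """Total lookups if every file scans its owning target's file list once."""
    return sum(n * n for n in files_per_target)

TOTAL_FILES = 1000

# One target owning everything via a recursive glob:
one_big = ownership_work([TOTAL_FILES])      # 1000 * 1000 = 1,000,000

# 100 per-directory targets of 10 files each (same 1000 files):
many_small = ownership_work([10] * 100)      # 100 * (10 * 10) = 10,000
```

Under this model, splitting one recursive-glob owner into 100 per-directory owners cuts the work by 100x, roughly the shape of the speedup reported above.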
w
yea. all of this is stuff that we’re very well positioned to investigate in 2.13, because this code has changed quite a bit
f
I'll still file a bug on this and try to write it up
w
yes please. these will be blocking issue(s) for 2.13.
f
i would think that fewer targets would take less time to match in that case. worth following up on.
I didn't have fewer targets, I had fewer owners. I was using a bunch of sources=["dir/**/*.py"]
👀 1
h
maybe we're overevaluating globs or something
@flat-zoo-31952 we could revisit getting you set up with Toolchain's buildsense UI, because that would give us very fine-grained traces we can use to debug this
f
Sounds nice, but I don't have the time to push a third-party vendor integration through right now (security concerns, politics, etc.); would love to discuss that eventually
I might try 2.12 as well, but I originally went to the 2.13 branch because I was able to build it for ARM. I can probably cherry-pick some things to a forked 2.12 branch to make ARM buildable on 2.12. I'll talk with my team tomorrow and see what we can do
w
yea, possibly. if you encounter CLI arg performance issues on 2.12 though, it will be quite difficult to cherry-pick fixes there. 2.13 will be a much easier target for fixes.
(because these codepaths changed quite a bit between 2.12 and 2.13)
f
I'll consider that. The performance is so much better with 111 BUILD files that I may just try to either 1) ignore the performance issue or 2) ignore ARM and stick with 2.11... until a few weeks from now when I can put BUILD files everywhere in the repo
Using one giant BUILD file is bad, but I need ./pants tailor working locally on all dev machines before I can get rid of it
👍 1
w
yea, that’s something we’d be interested in looking at too. we can tackle these with some additional urgency: we definitely consider them to be broken windows.
filing the one(s) that are blocking you would be helpful.
🙏🏻 1
🙏 1
f
I will try to file these soon. I'm just slammed this week (trying to actually get Pants running on 10% of our pipelines before I leave on vacation, and trying to tie up a bunch of loose ends to do this)
🤞 1
w
good luck! let us know how we can help.
f
So I did some playing around with this, including randomly generating large source trees to simulate our layout, but I'm not able to reproduce this issue outside our actual repository. If I can't make a reproducer outside of a repo that I can't share, I'm not sure what good a bug report will do 😕
-ldebug has some relevant info... this is an excerpt; it looks to me like it's stalling on setting up a sandbox to look for Python binaries?
14:24:36.68 [DEBUG] Launching 1 roots (poll=false).
14:24:37.83 [DEBUG] Completed: Find all targets in the project
14:24:47.25 [DEBUG] Completed: acquire_command_runner_slot
14:24:47.25 [DEBUG] Running Searching for `python3.10` on PATH=/usr/bin under semaphore with concurrency id: 5, and concurrency: 1
14:24:47.25 [DEBUG] Completed: acquire_command_runner_slot
14:24:47.25 [DEBUG] Running Searching for `python3.9` on PATH=/usr/bin under semaphore with concurrency id: 6, and concurrency: 1
14:24:47.26 [DEBUG] Completed: Extracting an archive file
14:24:47.26 [DEBUG] Completed: pants.core.util_rules.external_tool.download_external_tool
14:25:48.97 [DEBUG] Completed: setup_sandbox
14:25:48.97 [DEBUG] Completed: setup_sandbox
14:25:48.97 [DEBUG] Obtaining exclusive spawn lock for process since we materialized its executable Some("./find_binary.sh").
14:25:48.97 [DEBUG] Obtaining exclusive spawn lock for process since we materialized its executable Some("./find_binary.sh").
14:25:57.63 [DEBUG] spawned local process as Some(122486) for Process { argv: ["./find_binary.sh", "python3.10"], env: {"PATH": "/usr/bin"}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, immutable_inputs: {}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Searching for `python3.10` on PATH=/usr/bin", level: Debug, append_only_caches: {}, jdk_home: None, platform_constraint: None, cache_scope: PerRestartSuccessful }
14:25:57.86 [DEBUG] spawned local process as Some(122487) for Process { argv: ["./find_binary.sh", "python3.9"], env: {"PATH": "/usr/bin"}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, immutable_inputs: {}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Searching for `python3.9` on PATH=/usr/bin", level: Debug, append_only_caches: {}, jdk_home: None, platform_constraint: None, cache_scope: PerRestartSuccessful }
14:25:58.23 [DEBUG] Completed: Searching for `python3.9` on PATH=/usr/bin
14:25:58.23 [DEBUG] Completed: Searching for `python3.10` on PATH=/usr/bin
14:25:58.23 [DEBUG] Completed: Scheduling: Searching for `python3.10` on PATH=/usr/bin
14:25:58.23 [DEBUG] Completed: Scheduling: Searching for `python3.9` on PATH=/usr/bin
h
oh that's an interesting angle, if the slowdown isn't from our Specs changes but instead from something else
f
There's like a gap of a minute in there where it's not doing... anything? Possible race condition? This is all before the dep calculation starts, too, and dep calc performance is actually really reasonable according to these logs
👍 1
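[editor's note] One way to pin down where that minute goes is to diff consecutive timestamps in the -ldebug output. A minimal sketch (a hypothetical helper, not a Pants tool) that finds the largest pause:

```python
import re
from datetime import datetime

# Matches the leading HH:MM:SS.ff timestamp on each -ldebug line.
TS = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{2}) ")

def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause
    between consecutive timestamped log lines."""
    stamps = []
    for line in lines:
        m = TS.match(line)
        if m:
            stamps.append((datetime.strptime(m.group(1), "%H:%M:%S.%f"), line))
    best = (0.0, "", "")
    for (t0, before), (t1, after) in zip(stamps, stamps[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, before, after)
    return best

# Two lines from the excerpt above: the stall sits between them (~61.7s).
sample = [
    "14:24:47.26 [DEBUG] Completed: pants.core.util_rules.external_tool.download_external_tool",
    "14:25:48.97 [DEBUG] Completed: setup_sandbox",
]
gap, before, after = largest_gap(sample)
```

Running this over a full -ldebug capture would surface whichever workunit the run actually stalls on, without needing the workunit-plugin infrastructure mentioned earlier.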
h
Are you able to share the logs as a gist? Fine if over DM. (Although probably not to me, I'm out all week finishing a move)
f
yeah I can scrub them a bit and share them as a gist
I'm curious, is there a way to configure logging to send different levels to different streams/files?
now I'm running into what looks like a hang when I attempt to redirect stderr to collect the logs, either to a stream or to a file
w
@flat-zoo-31952: it’s possible that piping into a file that pantsd is watching is causing it to constantly restart
(or is this --no-pantsd …?)
f
it is pantsd, but I've reproduced it with a file outside the tree (I think)
w
i’ll try to repro. but maybe pipe into an ignored location like .pants.d, dist, or outside of the tree
f
can repro with 2> /var/tmp/some-file.log
w
yea, doesn’t look like an invalidation issue.
f
I can't repro outside our repository but i'm adding debug and trace level logs
Let me know if there's something else I can do
w
given that the deadlock is just in the context of trying to debug the actual performance issue, i’ll probably start by looking at the performance issue. sorry for all of the trouble!
f
I can repro the deadlock on stderr redirection in the pants repo itself
I haven't been able to repro the performance issue there, so I wouldn't assume they're related
Thanks for looking into it, sorry it took me a bit to get proper bug reports