Hi again I m planning to upstream today a little backend too Pants #development

Hi again! I'm planning to upstream today a little ...

hundreds-father-404

02/24/2021, 9:18 PM

Hi again! I'm planning to upstream today a little backend toolchain made to aggregate all counters and histograms, then print them to the console at the end of the run. I think this should always be activated, and toggled with a bool option. What do you think for names?

Copy code

--stats-collector-enable
--stats-aggregator-enable
--debug-stats-enable

enough-analyst-54434

02/24/2021, 9:32 PM

This used to roughly be

-X

for historical context.

👍 1

hundreds-father-404

02/24/2021, 9:33 PM

Copy code

register('-x', '--time', type=bool,
             help='Output a timing report at the end of the run.')

happy-kitchen-89482

02/24/2021, 10:15 PM

--stats ? Something pithy would be good

hundreds-father-404

02/24/2021, 10:17 PM

@happy-kitchen-89482 as a global option? I'd prefer we do a subsystem I think, in part so we can add things in the future like a

--no-histograms

option (they're super chatty)

happy-kitchen-89482

02/24/2021, 10:17 PM

--stats-on

happy-kitchen-89482

02/24/2021, 10:17 PM

--stats-show

hundreds-father-404

02/24/2021, 10:25 PM

--stats-show

--exec-stats-show

? I get nervous claiming

[stats]

hundreds-father-404

02/24/2021, 10:27 PM

--exec-stats-log

happy-kitchen-89482

02/24/2021, 10:27 PM

I'd rather err on the side of clarity

happy-kitchen-89482

02/24/2021, 10:27 PM

what other stats will there be?

happy-kitchen-89482

02/24/2021, 10:28 PM

But why is this a plugin and not in core?

hundreds-father-404

02/24/2021, 10:28 PM

So far it's this:

Copy code

14:59:37.98 [INFO] Counters:
  local_cache_read_errors: 0
  local_cache_requests: 31
  local_cache_requests_cached: 8
  local_cache_requests_uncached: 23
  local_cache_write_errors: 0
  local_execution_requests: 23
  remote_cache_read_errors: 0
  remote_cache_requests: 0
  remote_cache_requests_cached: 0
  remote_cache_requests_uncached: 0
  remote_cache_speculation_local_completed_first: 0
  remote_cache_speculation_remote_completed_first: 0
  remote_cache_write_errors: 0
  remote_cache_write_finished: 0
  remote_cache_write_started: 0
  remote_execution_errors: 0
  remote_execution_requests: 0
  remote_execution_rpc_errors: 0
  remote_execution_rpc_execute: 0
  remote_execution_rpc_retries: 0
  remote_execution_rpc_wait_execution: 0
  remote_execution_success: 0
  remote_execution_timeouts: 0

hundreds-father-404

02/24/2021, 10:28 PM

It is in core, always activated! But it's a subsystem rather than global options:

I'd prefer we do a subsystem I think, in part so we can add things in the future like a --no-histograms option (they're super chatty)

hundreds-father-404

02/24/2021, 11:23 PM

https://github.com/pantsbuild/pants/pull/11599

happy-kitchen-89482

02/24/2021, 11:42 PM

Gotcha, my misunderstanding

happy-kitchen-89482

02/24/2021, 11:56 PM

Nonetheless, I'd prefer an easier scope than

exec-stats

, I think just

stats

is fine?

hundreds-father-404

02/24/2021, 11:57 PM

That precludes the global option

--stats

, which I'm fine with. But does that mess up something like

[black].stats

? I don't recall the rules. (I'll test if you don't remember)

hundreds-father-404

02/25/2021, 2:19 AM

@fast-nail-55400 is this what you're thinking for the histogram summary?

Copy code

19:18:51.74 [INFO] Histogram for `local_store_read_blob_size`:
  min: 10
  max: 3069951
  mean: 115912.684
  std dev: 515510.627
  p25: 79
  p50: 91
  p75: 166
  p90: 1644
  p95: 468479
  p99: 3069951

Anything else pertinent like "num buckets"? cc @witty-crayon-22786

fast-nail-55400

02/25/2021, 2:21 AM

is median available?

fast-nail-55400

02/25/2021, 2:21 AM

nm that’s just P50 technically

👍 1

fast-nail-55400

02/25/2021, 2:22 AM

total number of samples would be useful

👍 1

fast-nail-55400

02/25/2021, 2:22 AM

but otherwise lgtm

fast-nail-55400

02/25/2021, 2:23 AM

also “Histogram for” should probably be “Summary of” since this no longer outputs a histogram per se

👍 1

hundreds-father-404

02/25/2021, 2:24 AM

Copy code

19:24:27.52 [INFO] Summary of `local_store_read_blob_size` observation histogram:
  min: 10
  max: 3069951
  mean: 115914.174
  std dev: 515510.321
  total observations: 190
  p25: 79
  p50: 89
  p75: 166
  p90: 1644
  p95: 468479
  p99: 3069951

witty-crayon-22786

02/25/2021, 2:37 AM

Would drop std dev, but other than that looks good!

hundreds-father-404

02/25/2021, 2:38 AM

how come? hdrhistogram's summary included it. (I have no opinion)

witty-crayon-22786

02/25/2021, 2:38 AM

And would continue to vote for just inlining into the counter values with a compound key:

Copy code

19:24:27.52 [INFO] local_store_read_blob_size:p50: 89

👀 1

witty-crayon-22786

02/25/2021, 2:39 AM

Um... Just that I've never found it useful?

hundreds-father-404

02/25/2021, 2:45 AM

And would continue to vote for just inlining into the counter values with a compound key:

I think that gets too noisy. We have 28 counters, each histogram has 11 values associated with it. So adding 1 histogram means now 39 values in a flat list, ~30% corresponding to only a single metric

Open in Slack

Previous Next