# development
h
I'm thinking about how to make pants' notion of Workunits more sophisticated/have more information on them, and I want to make sure that I understand how to correctly measure the time it takes pants engine operations to run
especially since some of this code has changed recently with @witty-crayon-22786's work on upgrading `tokio` and using `async` syntax
my high-level goal here is that toolchain wants to be sure that the information contained in the JSON representation of a workunit, particularly the timestamps that purport to show how long it takes a workunit to run, is useful/actionable
I'm not actually sure if we're computing the duration of a workunit correctly right now
the way we currently do this is, in `NodeKey::run`, we grab `std::time::SystemTime::now()` and make a `StartedWorkunit` from that
then we move that data structure into the `async move` block
and then, after this block: https://github.com/pantsbuild/pants/blob/master/src/rust/engine/src/nodes.rs#L1084 , which has a bunch of `.await` calls, we inspect the `StartedWorkunit` again and transform it into a `Workunit` we put into the store
which involves calling this `finish` method: https://github.com/pantsbuild/pants/blob/master/src/rust/engine/workunit_store/src/lib.rs#L54, which computes `TimeSpan::since(start_time)`
so, based on my understanding of rust futures, I'm not sure why this works. but it does seem to work
that is, it seems like the code we invoke after the `.await` expressions in the `async move` block is happening some amount of real time after the code earlier in `run` where we originally computed `start_time`, so the `time_span` is measuring something that takes a human-noticeable amount of time to run
it's not like these are microseconds apart
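A minimal, self-contained sketch of the shape being described here, with `StartedWorkunit` and `TimeSpan` as simplified stand-ins for the real engine/`workunit_store` types (and `tokio` assumed only to make it runnable):
```rust
use std::time::{Duration, SystemTime};

// Simplified stand-in for the record created before the async block.
struct StartedWorkunit {
    start_time: SystemTime,
}

// Simplified stand-in for `TimeSpan::since(start_time)`.
struct TimeSpan {
    duration: Duration,
}

impl TimeSpan {
    fn since(start_time: SystemTime) -> TimeSpan {
        TimeSpan {
            // `elapsed` can fail if the clock went backwards; default to zero here.
            duration: start_time.elapsed().unwrap_or_default(),
        }
    }
}

async fn run() -> TimeSpan {
    // Outside the async block: grab the wall-clock time.
    let started = StartedWorkunit {
        start_time: SystemTime::now(),
    };

    // The started record is moved into the async block containing the awaits.
    let started = async move {
        // ... a bunch of `.await` calls would happen here ...
        started
    }
    .await;

    // After the awaits: turn the record into a finished span, as `finish` does.
    TimeSpan::since(started.start_time)
}

#[tokio::main]
async fn main() {
    println!("workunit took {:?}", run().await.duration);
}
```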
another question I have - let's say that I wanted to accurately measure start and end times for time spent executing an external process
I think that one way I could do that is, analogously to how the `NodeKey::run` code computes a start time outside of an `async` block, then computes an end time within that block but after an `.await` call, and then subtracts them: I could rewrite some of the futures-using code in `process_execution` (maybe `process_execution/local.rs::run_and_capture_workdir`?) to use async/await syntax, insert time-recording and time-difference-recording amidst that syntax, and get a useful time difference?
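A hypothetical sketch of that idea (not the actual `process_execution` code), using `tokio`'s async `Command` as a stand-in for whatever async process library `run_and_capture_workdir` really uses:
```rust
use std::time::SystemTime;
use tokio::process::Command;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Record a timestamp before awaiting the process.
    let start_time = SystemTime::now();

    // The task suspends here until the child process has actually exited.
    let output = Command::new("sleep").arg("1").output().await?;

    // Record a timestamp after the await; the difference is roughly the
    // wall-clock time the process took, as observed from this task.
    let duration = start_time.elapsed().unwrap_or_default();

    println!("exit status: {:?}, took {:?}", output.status, duration);
    Ok(())
}
```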
@average-vr-56795 @aloof-angle-91616 if you have any thoughts about how best to approach this problem I'd be interested in hearing them as well
w
async syntax is equivalent to combinators in terms of how it is executed.
so the introduction of async/await should not have changed the behavior of that code.
i don't really have a great idea of how to explain it. but along with the introduction of async/await, the rust folks published a book: https://rust-lang.github.io/async-book/01_getting_started/01_chapter.html
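A tiny illustration of that equivalence, using the `futures` crate's `map` combinator next to the async/await form the compiler desugars into a similar state machine:
```rust
use futures::FutureExt;
use std::future::Future;

// Combinator style: chain a transformation onto the future.
fn doubled_combinator(fut: impl Future<Output = i32>) -> impl Future<Output = i32> {
    fut.map(|x| x * 2)
}

// async/await style: the compiler generates an equivalent state machine.
async fn doubled_async(fut: impl Future<Output = i32>) -> i32 {
    fut.await * 2
}

fn main() {
    let a = futures::executor::block_on(doubled_combinator(async { 21 }));
    let b = futures::executor::block_on(doubled_async(async { 21 }));
    assert_eq!(a, b); // both evaluate to 42
}
```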
a
i was just reading this right now from hacker news: https://os.phil-opp.com/async-await/
h
@aloof-angle-91616 yeah I saw that article myself, I was meaning to read it
@witty-crayon-22786 maybe a better question is, are we sure that the code that currently computes the duration of a workunit is "doing the right thing"?
in the sense of, measuring the amount of time that pants is spending doing useful work on the workunit the code puts the timestamp on?
I believe that before it was rewritten to use the `.await` syntax, the code that computed the end time was in an `.inspect` future that was placed after that main future that delegated to each variant of `NodeKey`'s `run` method
w
the async/await code is equivalent to that.
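A sketch of that equivalence with hypothetical names (not the real nodes.rs code): recording the end time in an `.inspect` combinator chained after the main future, versus the same logic written with async/await:
```rust
use futures::FutureExt;
use std::future::Future;
use std::time::SystemTime;

// Combinator version: the end time is observed in `.inspect`, which runs once
// the main future has completed.
fn timed_with_inspect<O>(
    start_time: SystemTime,
    main_future: impl Future<Output = O>,
) -> impl Future<Output = O> {
    main_future.inspect(move |_output| {
        let duration = start_time.elapsed().unwrap_or_default();
        println!("finished after {:?}", duration);
    })
}

// async/await version: the compiler produces an equivalent state machine, so
// the timing behavior is the same.
async fn timed_with_await<O>(start_time: SystemTime, main_future: impl Future<Output = O>) -> O {
    let output = main_future.await;
    let duration = start_time.elapsed().unwrap_or_default();
    println!("finished after {:?}", duration);
    output
}

fn main() {
    let start_time = SystemTime::now();
    futures::executor::block_on(timed_with_inspect(start_time, async {}));
    futures::executor::block_on(timed_with_await(start_time, async {}));
}
```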
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1585621222057200?thread_ts=1585618751.053000&cid=C0D7TNJHL it has always been the case that this is challenging for async code, because every time a future suspends, it is "doing nothing" until it is awoken again. this is similar to "blocking" in blocking code.
do you blame a task for the time that it spends waiting for other things? yes, probably. but it's more challenging when you have thousands of async things happening at once.
we've had that particular discussion before. and it led to talking about rendering only processes in the CLI ui
in zipkin-like workunit spans, it's basically the right thing.
👍 1
h
@witty-crayon-22786 actually part of my confusion here is that, even if we assume that we only want to render the time it takes for a process to run in the CLI UI, I'm not sure what I would do differently from the timing code associated with the workunit, to get that number
we use an async library for executing an external process, rather than the std lib `Command` data structure
w
do it inside the lock. we talked about this in DMs
h
by "inside the lock", do you mean modifying
AsyncSempahore::with_acquired
?
w
no. i mean, record the timing for the process once the lock has been acquired. ie, with the lock held
h
what's "the lock" that you mean?
w
the async semaphore, yes.
h
would that look something like:
```rust
  ///
  /// Runs the given Future-creating function (and the Future it returns) under the semaphore.
  ///
  pub async fn with_acquired<F, B, O>(self, f: F) -> O
  where
    F: FnOnce() -> B + Send + 'static,
    B: Future<Output = O> + Send + 'static,
  {
    let permit = self.acquire().await;
    let start_time = std::time::SystemTime::now();
    let res = f().await;
    drop(permit);
    let end_time = std::time::SystemTime::now();
    // `SystemTime` has no `since`; `duration_since` is the method that compares two
    // timestamps (it returns a Result, since the system clock can go backwards).
    // The duration isn't recorded anywhere yet in this sketch.
    let _duration = end_time.duration_since(start_time);
    res
  }
```
?
w
That would maybe be a useful place to add a workunit to capture "time inside of a semaphore", in general.
As to where to capture for processes in particular, I'd refer again to our conversation on the 17th
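A sketch of the variant being suggested, i.e. taking both timestamps while the permit is still held so the span covers only the work done under the semaphore; `tokio::sync::Semaphore` stands in for the engine's `AsyncSemaphore` here:
```rust
use std::future::Future;
use std::time::{Duration, SystemTime};
use tokio::sync::Semaphore;

async fn with_acquired_timed<F, B, O>(semaphore: &Semaphore, f: F) -> (O, Duration)
where
    F: FnOnce() -> B,
    B: Future<Output = O>,
{
    // Time spent waiting for the permit is *not* included in the measurement.
    let permit = semaphore.acquire().await.expect("semaphore closed");

    let start_time = SystemTime::now();
    let res = f().await;
    // The end time is taken with the lock still held.
    let duration = start_time.elapsed().unwrap_or_default();

    drop(permit);
    (res, duration)
}

#[tokio::main]
async fn main() {
    let semaphore = Semaphore::new(1);
    let (out, took) = with_acquired_timed(&semaphore, || async {
        tokio::time::sleep(Duration::from_millis(50)).await;
        42
    })
    .await;
    println!("got {} after {:?}", out, took);
}
```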
h
that makes sense. but I mean, is recording start and end times in between the `.await` calls like that meaningful?
w
Yes.
The task/stack suspends. When it suspends, it does so until some real thing has finished happening, and then it resumes.
h
and we can think of that suspension as happening at the beginning of an `.await` expression, right? so for instance in that block I proposed above, the execution thread would record the current start time, then wait on `let res = f().await;` until that had taken some amount of real time to complete, and then we'd drop the lock immediately and `let end_time = std::time::SystemTime::now();` would be a new timestamp sometime meaningfully in the future from `start_time`?
w
Yes.
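A minimal demonstration of that point, assuming `tokio` for the executor and the sleep: the timestamps taken on either side of the `.await` differ by the real wall-clock time the awaited work took.
```rust
use std::time::{Duration, SystemTime};

#[tokio::main]
async fn main() {
    let start_time = SystemTime::now();

    // The task suspends here for ~100ms of real time before resuming.
    tokio::time::sleep(Duration::from_millis(100)).await;

    let end_time = SystemTime::now();
    let duration = end_time.duration_since(start_time).unwrap_or_default();

    // Prints roughly 100ms, not microseconds.
    println!("measured {:?}", duration);
}
```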