# development
h
I'm thinking about how to make pants' notion of Workunits more sophisticated/have more information on them, and I want to make sure that I understand how to correctly measure the time it takes pants engine operations to run
especially since some of this code has changed recently with @witty-crayon-22786's work on upgrading `tokio` and using `async` syntax
my high-level goal here is that toolchain wants to be sure that the information contained in the JSON representation of a workunit, particularly the timestamps that purport to show how long it takes a workunit to run, is useful/actionable
I'm not actually sure if we're computing the duration of a workunit correctly right now
the way we currently do this is, in `NodeKey::run`, we grab `std::time::SystemTime::now()` and make a `StartedWorkunit` from that
then we move that data structure into the `async move` block
and then, after this block: https://github.com/pantsbuild/pants/blob/master/src/rust/engine/src/nodes.rs#L1084 , which has a bunch of `.await` calls, we inspect the `StartedWorkunit` again and transform it into a `Workunit` we put into the store
which involves calling this `finish` method: https://github.com/pantsbuild/pants/blob/master/src/rust/engine/workunit_store/src/lib.rs#L54, which computes `TimeSpan::since(start_time)`
so, based on my understanding of rust futures, I'm not sure why this works. but it does seem to work
that is, it seems like the code we invoke after the `.await` expressions in the `async move` block is happening some amount of real time after the code earlier in `run` where we originally computed `start_time`, so the `time_span` is measuring something that takes a human-noticeable amount of time to run
it's not like these are microseconds apart
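A minimal, self-contained sketch of the shape being described here, with `StartedWorkunit` and `TimeSpan` as simplified stand-ins for the real engine/`workunit_store` types (and `tokio` assumed only to make it runnable):
```rust
use std::time::{Duration, SystemTime};

// Simplified stand-in for the record created before the async block.
struct StartedWorkunit {
    start_time: SystemTime,
}

// Simplified stand-in for `TimeSpan::since(start_time)`.
struct TimeSpan {
    duration: Duration,
}

impl TimeSpan {
    fn since(start_time: SystemTime) -> TimeSpan {
        TimeSpan {
            // `elapsed` can fail if the clock went backwards; default to zero here.
            duration: start_time.elapsed().unwrap_or_default(),
        }
    }
}

async fn run() -> TimeSpan {
    // Outside the async block: grab the wall-clock time.
    let started = StartedWorkunit {
        start_time: SystemTime::now(),
    };

    // The started record is moved into the async block containing the awaits.
    let started = async move {
        // ... a bunch of `.await` calls would happen here ...
        started
    }
    .await;

    // After the awaits: turn the record into a finished span, as `finish` does.
    TimeSpan::since(started.start_time)
}

#[tokio::main]
async fn main() {
    println!("workunit took {:?}", run().await.duration);
}
```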
another question I have - let's say that I wanted to accurately measure start and end times for time spent executing an external process
I think that one way I could do that is, analogously to how the `NodeKey::run` code computes a start time outside of an `async` block, then computes an end time within that block but after an `.await` call, and then subtracts them: I could rewrite some of the futures-using code in `process_execution` (maybe `process_execution/local.rs::run_and_capture_workdir`?) to use async/await syntax, insert time-recording and time-difference-recording amidst that syntax, and get a useful time difference?
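A hypothetical sketch of that idea (not the actual `process_execution` code), using `tokio`'s async `Command` as a stand-in for whatever async process library `run_and_capture_workdir` really uses:
```rust
use std::time::SystemTime;
use tokio::process::Command;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Record a timestamp before awaiting the process.
    let start_time = SystemTime::now();

    // The task suspends here until the child process has actually exited.
    let output = Command::new("sleep").arg("1").output().await?;

    // Record a timestamp after the await; the difference is roughly the
    // wall-clock time the process took, as observed from this task.
    let duration = start_time.elapsed().unwrap_or_default();

    println!("exit status: {:?}, took {:?}", output.status, duration);
    Ok(())
}
```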
@average-vr-56795 @aloof-angle-91616 if you have any thoughts about how best to approach this problem I'd be interested in hearing them as well
w
async syntax is equivalent to combinators in terms of how it is executed.
so the introduction of async/await should not have changed the behavior of that code.
i don't really have a great idea of how to explain it. but along with the introduction of async/await, the rust folks published a book: https://rust-lang.github.io/async-book/01_getting_started/01_chapter.html
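A tiny illustration of that equivalence, using the `futures` crate's `map` combinator next to the async/await form the compiler desugars into a similar state machine:
```rust
use futures::FutureExt;
use std::future::Future;

// Combinator style: chain a transformation onto the future.
fn doubled_combinator(fut: impl Future<Output = i32>) -> impl Future<Output = i32> {
    fut.map(|x| x * 2)
}

// async/await style: the compiler generates an equivalent state machine.
async fn doubled_async(fut: impl Future<Output = i32>) -> i32 {
    fut.await * 2
}

fn main() {
    let a = futures::executor::block_on(doubled_combinator(async { 21 }));
    let b = futures::executor::block_on(doubled_async(async { 21 }));
    assert_eq!(a, b); // both evaluate to 42
}
```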
a
i was just reading this right now from hacker news: https://os.phil-opp.com/async-await/
h
@aloof-angle-91616 yeah I saw that article myself, I was meaning to read it
@witty-crayon-22786 maybe a better question is, are we sure that the code that currently computes the duration of a workunit is "doing the right thing"?
in the sense of, measuring the amount of time that pants is spending doing useful work on the workunit the code puts the timestamp on?
I believe that before it was rewritten to use the `.await` syntax, the code that computed the end time was in an `.inspect` future that was placed after that main future that delegated to each variant of `NodeKey`'s `run` method
w
the async/await code is equivalent to that.
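A sketch of that equivalence with hypothetical names (not the real nodes.rs code): recording the end time in an `.inspect` combinator chained after the main future, versus the same logic written with async/await:
```rust
use futures::FutureExt;
use std::future::Future;
use std::time::SystemTime;

// Combinator version: the end time is observed in `.inspect`, which runs once
// the main future has completed.
fn timed_with_inspect<O>(
    start_time: SystemTime,
    main_future: impl Future<Output = O>,
) -> impl Future<Output = O> {
    main_future.inspect(move |_output| {
        let duration = start_time.elapsed().unwrap_or_default();
        println!("finished after {:?}", duration);
    })
}

// async/await version: the compiler produces an equivalent state machine, so
// the timing behavior is the same.
async fn timed_with_await<O>(start_time: SystemTime, main_future: impl Future<Output = O>) -> O {
    let output = main_future.await;
    let duration = start_time.elapsed().unwrap_or_default();
    println!("finished after {:?}", duration);
    output
}

fn main() {
    let start_time = SystemTime::now();
    futures::executor::block_on(timed_with_inspect(start_time, async {}));
    futures::executor::block_on(timed_with_await(start_time, async {}));
}
```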
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1585621222057200?thread_ts=1585618751.053000&cid=C0D7TNJHL it has always been the case that this is challenging for async code, because every time a future suspends, it is "doing nothing" until it is awoken again. this is similar to "blocking" in blocking code.
do you blame a task for the time that it spends waiting for other things? yes, probably. but it's more challenging when you have thousands of async things happening at once.
we've had that particular discussion before. and it led to talking about rendering only processes in the CLI ui
in zipkin-like workunit spans, it's basically the right thing.
👍 1
h
@witty-crayon-22786 actually part of my confusion here is that, even if we assume that we only want to render the time it takes for a process to run in the CLI UI, I'm not sure what I would do differently from the timing code associated with the workunit, to get that number
we use an async library for executing an external process, rather than the std lib `Command` data structure
w
do it inside the lock. we talked about this in DMs
h
by "inside the lock", do you mean modifying
AsyncSempahore::with_acquired
?
w
no. i mean, record the timing for the process once the lock has been acquired. ie, with the lock held
h
what's "the lock" that you mean?
w
the async semaphore, yes.
h
would that look something like:
```rust
  ///
  /// Runs the given Future-creating function (and the Future it returns) under the semaphore.
  ///
  pub async fn with_acquired<F, B, O>(self, f: F) -> O
  where
    F: FnOnce() -> B + Send + 'static,
    B: Future<Output = O> + Send + 'static,
  {
    let permit = self.acquire().await;
    let start_time = std::time::SystemTime::now();
    let res = f().await;
    drop(permit);
    let end_time = std::time::SystemTime::now();
    // `SystemTime` has no `since`; `duration_since` is the method that compares two
    // timestamps (it returns a Result, since the system clock can go backwards).
    // The duration isn't recorded anywhere yet in this sketch.
    let _duration = end_time.duration_since(start_time);
    res
  }
```
?
w
That would maybe be a useful place to add a workunit to capture "time inside of a semaphore", in general.
As to where to capture for processes in particular, I'd refer again to our conversation on the 17th
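A sketch of the variant being suggested, i.e. taking both timestamps while the permit is still held so the span covers only the work done under the semaphore; `tokio::sync::Semaphore` stands in for the engine's `AsyncSemaphore` here:
```rust
use std::future::Future;
use std::time::{Duration, SystemTime};
use tokio::sync::Semaphore;

async fn with_acquired_timed<F, B, O>(semaphore: &Semaphore, f: F) -> (O, Duration)
where
    F: FnOnce() -> B,
    B: Future<Output = O>,
{
    // Time spent waiting for the permit is *not* included in the measurement.
    let permit = semaphore.acquire().await.expect("semaphore closed");

    let start_time = SystemTime::now();
    let res = f().await;
    // The end time is taken with the lock still held.
    let duration = start_time.elapsed().unwrap_or_default();

    drop(permit);
    (res, duration)
}

#[tokio::main]
async fn main() {
    let semaphore = Semaphore::new(1);
    let (out, took) = with_acquired_timed(&semaphore, || async {
        tokio::time::sleep(Duration::from_millis(50)).await;
        42
    })
    .await;
    println!("got {} after {:?}", out, took);
}
```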
h
that makes sense. but I mean, is recording start and end times in between the `.await` calls like that meaningful?
w
Yes.
The task/stack suspends. When it suspends, it does so until some real thing has finished happening, and then it resumes.
h
and we can think of that suspension as happening at the beginning of an `.await` expression, right? so for instance in that block I proposed above, the execution thread would record the current start time, then wait on `let res = f().await;` until that had taken some amount of real time to complete, and then we'd drop the lock immediately and `let end_time = std::time::SystemTime::now();` would be a new timestamp sometime meaningfully in the future from `start_time`?
w
Yes.
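A minimal demonstration of that point, assuming `tokio` for the executor and the sleep: the timestamps taken on either side of the `.await` differ by the real wall-clock time the awaited work took.
```rust
use std::time::{Duration, SystemTime};

#[tokio::main]
async fn main() {
    let start_time = SystemTime::now();

    // The task suspends here for ~100ms of real time before resuming.
    tokio::time::sleep(Duration::from_millis(100)).await;

    let end_time = SystemTime::now();
    let duration = end_time.duration_since(start_time).unwrap_or_default();

    // Prints roughly 100ms, not microseconds.
    println!("measured {:?}", duration);
}
```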