# development
w
@hundreds-breakfast-49010: A general thought on workunits: when in doubt, add more.
In particular, adding more workunits inside of process execution would get you more detail about which phase of the process you are in.
Creating the sandbox, running, capturing outputs, etc.
h
@witty-crayon-22786 adding workunits in more parts of the codebase is not a bad idea. but since workunits ultimately generate a user-facing report, we need some sort of way to make arbitrary workunits configurable by users
this is what named `@rule`s do, but we don't have that configurability for the other places in the pants codebase where we create a workunit
w
sure. but process execution is an intrinsic. and a particularly important one
so, let's say: you currently have a workunit for the process execution Node. that is outside the process execution lock.
if you had another inside the lock, you could differentiate "time to run a process, accounting for time waited due to contention" from "time running the process", and you could subtract those to know how long you waited to start
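as a sketch (not the real engine API; `execution_slot` here stands in for whatever bounds concurrent execution), the subtraction is just two timestamps:
```rust
use std::time::Instant;
use tokio::sync::Mutex;

// sketch only: one timestamp taken outside the lock, one inside.
// subtracting the two durations gives the time spent waiting.
async fn run_process(execution_slot: &Mutex<()>) {
    let requested = Instant::now();            // outer workunit would start here
    let _guard = execution_slot.lock().await;  // may block under contention
    let started = Instant::now();              // inner workunit would start here
    // ... create the sandbox, run the process, capture outputs ...
    let finished = Instant::now();
    println!(
        "waited {:?} for a slot, ran for {:?}",
        started - requested,
        finished - started,
    );
}
```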
h
and we do have this, since the presence of a `user_facing_name` is what controls whether a workunit is created, and all `MultiPlatformExecuteProcessRequest`s have one
w
if you then add more workunits below the lock, you would be tracking the time for each phase
@hundreds-breakfast-49010: understood. my point is that there is information beyond just "which Node is running" here. especially with processes, there is more fine-grained, useful information available.
(or... could be with a few more workunits added)
i don't know exactly how that maps to the CLI ui, but i think that it does.
h
workunits don't actually map directly to the CLI UI currently
workunits control what zipkin JSON gets generated. toolchain cares about this because we have a plugin that does some post-processing on that JSON
w
so: i think that if you get the workunits right, and there is enough information there, one path forward for the CLI UI might be to walk a graph of workunits, rather than the capital-G Graph of Nodes
h
the timings displayed in the v2 UI are separate
but yes I do think it makes sense for those to be the same system
we should be calculating the timing of nodes in one canonical place, and using those timings everywhere it's relevant to display them
that's part of what my PR from a few weeks ago was meant to do, and I'll probably return to that in the near future
w
the point above though is less about "where the timing is captured", and more about "where are timings stored"
if the CLI UI was rendering from the WorkunitStore (for example) rather than the Graph, then it would render the more fine-grained process information automatically, without any additional Nodes being created
h
sure. but those are actually being stored in two places too
ah, yeah, I think that makes sense
ultimately I don't think the `EntryState::Running { start_time, ... }` field should exist. everything we do with that currently should be done with a `WorkUnit`
w
i think that i agree.
because we had already discussed not rendering it except for Processes anyway.
h
right. I think it might also make sense to render `Snapshot`s, since disk IO is something that I've noticed taking a noticeably long time
w
yea.
h
I also wonder if it makes sense to always create `WorkUnit` data structures and put them into the store, and to do the filtering of what workunits to pass to python at a different level
w
unclear. possibly.
h
right now there's code in `nodes.rs` that will either create a workunit or not depending on the value of `user_facing_name` for a node
w
yep.
h
maybe instead that should set a `render: bool` field on `WorkUnit`
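something like this (a hypothetical shape, not the actual engine structs; field names are made up for illustration):
```rust
// hypothetical sketch: every node records a WorkUnit, and each consumer
// (the v2 UI, the zipkin exporter) filters on `render` itself, instead
// of nodes.rs deciding up front whether a workunit exists at all.
struct WorkUnit {
    name: String,     // the node's canonical name
    start_secs: f64,
    end_secs: f64,
    render: bool,     // should the v2 UI display this?
}

fn renderable(units: &[WorkUnit]) -> Vec<&WorkUnit> {
    units.iter().filter(|wu| wu.render).collect()
}
```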
w
well, that's good i think....?
i think it's good that not all Nodes end up rendered. some just won't be useful.
a variant on this though
h
the only reason to care is if we want to have different rules about what we display in the v2 UI and what we pass to zipkin/the toolchain plugin
but I think we probably do want that
w
maybe. unclear. i could imagine unification there... see what i was saying about "adding more workunits" and wanting them rendered.
so, another tack for a second
h
sure
w
if we continue to walk the Graph of Nodes to render the CLI ui:
it's possible that `user_facing_name` could get a little bit richer and become "current rendered line", and could 1) change names over time 2) include runtime
the implementation for a process would be to (for the same Node), change its name based on which phase it was in.
... that is maybe challenging.
... since it doesn't answer the storage question.
h
I don't think the `WorkUnit` or the `Node` is the right place for that information to canonically live. that's a v2 UI concern
`user_facing_name` is part of what you need to build up the v2 UI line, but not all of it
w
well, that's why i was suggesting that it could include runtime. but yea.
h
toolchain wants `user_facing_name`, at least for rules, to be a static, canonical name for a python `@rule`
a string that includes the runtime should be something other than `user_facing_name`
(maybe we should rename the property of a node that is `user_facing_name` to something else, like `canonical_name`?)
w
i also suggested a new name for the function, heh 😃
but anyway: that tangent is probably less interesting than having the CLI UI consume workunits.
h
yeah
w
because "nodes are always the most useful granularity to render" doesn't really play out.
h
although for me right now, the v2 UI is a secondary priority to making improvements to the workunit json
w
more fine-grained information about a process makes total sense, i think.
h
I guess I should say the "zipkin json"
I've been doing some testing on my own machine in the last day or so, and it actually seems like the workunit associated with the `MultiPlatformExecuteProcessRequest` is capturing more or less the amount of real time that EPRs are taking
certainly as you say adding more granularity couldn't hurt
that workunit is outside the async lock, and we could add them inside the lock
er, add additional workunits inside the lock
w
yep
there is some overhead to the sandbox itself, and i don't think that we know what it is.
but yea, good to hear.
h
I want to keep my eye on the big picture. the two high-level problems I need to solve right now are 1) make sure that the JSON workunit/zipkin output that the toolchain uses is useful and actionable and 2) make sure that the timestamps in the v2 console UI are displaying useful times and not capturing time when some process is waiting to run
it's definitely possible that there's sandbox overhead that didn't show up when I was testing yesterday with my specific workflows, but that is relevant in other workflows
w
afaict, this thread touches both of those. a workunit inside the lock would improve (1). and then additionally switching the CLI to consuming the workunit graph would fix (2)
h
yeah I think it would
I'm also doing other work pertaining to workunits right now, to fix specific problems toolchain has, and I want to make sure that any redesigns to where workunits get created fits in there
btw, yesterday @fast-nail-55400 informed me about this crate: https://docs.rs/tracing/0.1.13/tracing/index.html#using-the-macros
which has some functionality for explicitly tracing and profiling `Future`s
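e.g. something like this (a sketch assuming a recent 0.1.x release of `tracing`; the attribute macro attaches a span to the async fn's future across await points):
```rust
use tracing::{info, instrument};

// sketch: entering `fetch` opens a span that follows the returned
// future across await points, much like a workunit does today.
#[instrument]
async fn fetch(digest: String) {
    info!("reading from the store"); // event recorded inside the span
    // ... await disk or network IO here ...
}
```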
w
yes.
probably worth using at some point.
h
with the idea that we might be able to use this to replace the workunit store data structure
I spent a couple hours playing with it yesterday and found it hard to get a proof of concept working
I'm not sure what I think about that yet, I haven't spent enough time trying to get the library to work so far, but I was curious if you've thought about whether `tracing` annotations on various `Future`-returning functions could do the same thing as the `workunit_store` code?
w
yea. it doesn't have any particular secret sauce for recording futures, afaik. it's mostly just good that it is standardized.
tracing annotations would ... cause more workunits to be created, yes.
h
mostly what I'm concerned about is not creating more workunits or more workunit-like data structures without having some way to filter them that current clients of workunits (i.e. zipkin tracing and the toolchain plugin) can live with
w
yea, that would be one concern with the `tracing` crate, i think.
h
it's also a concern with adding more workunits under the lock
w
since it would be standardized, you'd have spans/workunits being created in a bunch more places
so you'd need to filter them.
@hundreds-breakfast-49010: well, it's under your control at least. if it's too noisy, can remove it again, heh
h
but yeah if we need to build an additional workunit filtering layer anyway, then creating tracing structures in a standard way is less of a problem
yeah we can write the toolchain plugin however we want 🙂
but I want to make sure that any changes I introduce to mainline pants are clean and sensible and don't cause other problems down the line
w
sorry, i meant that if recording the workunit in pants doesn't look useful, can stop doing it. i wasn't suggesting consumer-side filtering.
(maybe you need that anyway, but)
h
I think that any client of workunits, including toolchain's, will and can be expected to do some of its own filtering
w
k