# development
h
With the engine, we've now accumulated a couple of pair types like `AddressWithOrigin` and `TargetWithOrigin`. We're using dataclasses to represent these. I'd like to instead use `typing.NamedTuple`, which has the exact same `class` definition as a dataclass, including allowing us to define methods, but is more performant and allows us to unpack the pair:
```python
target, origin = target_with_origin
```
But the one downside is that `TargetWithOrigin(tgt, origin) == (tgt, origin) == DistinctNamedTuple(tgt, origin)`. (At least `TargetWithOrigin != tuple != DistinctNamedTuple`.) I don't think this is actually an issue? We don't care if `TargetWithOrigin(tgt, origin) == (tgt, origin)` occurs, and because the engine calls `type()` on instances to use exact type IDs, we would fail if a rule author forgot to wrap the tuple in its corresponding NamedTuple?
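A minimal sketch of what's being proposed, with illustrative string fields standing in for the real types; it shows both the unpacking win and the equality caveat, plus the exact-type check the engine relies on:

```python
from typing import NamedTuple

class TargetWithOrigin(NamedTuple):
    target: str  # stand-in for the real Target type
    origin: str  # stand-in for the real OriginSpec type

    # NamedTuple classes can still define methods, like a dataclass.
    def describe(self) -> str:
        return f"{self.target} (from {self.origin})"

class DistinctNamedTuple(NamedTuple):
    target: str
    origin: str

two = TargetWithOrigin("//:lib", "cli-spec")

# The win: tuple unpacking.
target, origin = two

# The caveat: equality is structural, just like a plain tuple.
assert two == ("//:lib", "cli-spec")
assert two == DistinctNamedTuple("//:lib", "cli-spec")

# But exact types still differ, which is what the engine keys on.
assert type(two) is not tuple
assert type(two) is not DistinctNamedTuple
```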
a
i would really prefer if the class name was required to destructure, like rust tuple structs, and we may want to make that a convention
otherwise i don't see a problem, and i've had a TODO in my head to allow destructuring via tuples for a while
👖 1
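A hypothetical sketch of that convention: since Python's tuple unpacking never names the class, one option is a small helper that forces the class name to appear at the destructuring site, a rough analogue of Rust tuple structs. None of these names exist in Pants; they are purely illustrative.

```python
from typing import NamedTuple, Type, TypeVar

class AddressWithOrigin(NamedTuple):
    address: str  # illustrative; the real field types differ
    origin: str

_T = TypeVar("_T", bound=tuple)

def destructure(expected_type: Type[_T], value: _T) -> _T:
    """Return `value` only if it is exactly `expected_type`, so the class
    name must be spelled out at the unpacking site."""
    if type(value) is not expected_type:
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(value).__name__}"
        )
    return value

awo = AddressWithOrigin("//:lib", "cli-spec")
address, origin = destructure(AddressWithOrigin, awo)
```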
h
> i would really prefer if the class name was required to destructure,
I don't think we generally want that for complex types: see https://www.python.org/dev/peps/pep-0557/#why-not-just-use-namedtuple for why this makes adding new fields via inheritance tricky. But for pair types like `XWithY`, we should never be adding a 3rd field, so it would be super nice to have.
a
so this would be an awesome change
🔥 1
> Instances are always iterable, which can make it difficult to add fields.
i don't like this part
oh!!!
that's for `namedtuple()`
👍 1
h
Yes, the PEP is explaining how allowing you to unpack something like a `NamedTuple` means it's much more difficult to add a new field to the class than before, because you break all call sites that now need to unwrap that new field
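A small illustration of that point with made-up types: once call sites unpack positionally, adding a field changes the arity and breaks them.

```python
from typing import NamedTuple

class Pair(NamedTuple):
    address: str
    origin: str

# Positional unpacking works while there are exactly two fields...
address, origin = Pair("//:lib", "cli-spec")

class PairWithMetadata(NamedTuple):
    address: str
    origin: str
    metadata: str  # the newly added field

# ...but the same call-site pattern now fails.
try:
    address, origin = PairWithMetadata("//:lib", "cli-spec", "extra")
except ValueError as exc:
    print(exc)  # too many values to unpack (expected 2)
```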
I think dataclasses are working well for most engine types, e.g. `Target` and `StripTargetRequest`. One nice feature is having custom constructors that can be frozen afterwards (thanks again for that decorator!). Where we can optimize is the pair types we have, like `AddressWithOrigin`
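The decorator mentioned here is Pants' own; below is only a minimal sketch of the same idea (let a custom `__init__` assign attributes, then freeze the instance), not the real implementation, and the decorator name and `StripTargetRequest`'s field are chosen purely for illustration.

```python
from dataclasses import dataclass

def frozen_after_init(cls):
    """Sketch: allow assignment inside __init__, then freeze the instance."""
    prev_init = cls.__init__

    def __init__(self, *args, **kwargs):
        prev_init(self, *args, **kwargs)
        object.__setattr__(self, "_is_frozen", True)

    def __setattr__(self, key, value):
        if getattr(self, "_is_frozen", False):
            raise AttributeError(f"{cls.__name__} is frozen; cannot set {key!r}")
        object.__setattr__(self, key, value)

    cls.__init__ = __init__
    cls.__setattr__ = __setattr__
    return cls

@frozen_after_init
@dataclass
class StripTargetRequest:
    target_name: str  # invented field, for illustration only

    def __init__(self, target_name: str) -> None:
        # The custom constructor can normalize input before the instance freezes.
        self.target_name = target_name.strip()

req = StripTargetRequest("  //:lib  ")
assert req.target_name == "//:lib"
# req.target_name = "other"  # would raise AttributeError
```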
a
i don't know how i feel about that. changing every call site feels like something that's very hard to automate and that i'd prefer not to do. if it was possible to destructure while not knowing exactly how the type is laid out (like with rust's `..`), then this seems like it'd be a lot more useful
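For what it's worth, Python's extended unpacking is the closest built-in analogue of Rust's `..`: a call site can bind only the fields it cares about without spelling out the full layout. The type below is invented for illustration.

```python
from typing import NamedTuple

class Wide(NamedTuple):
    first: int
    second: int
    third: int
    fourth: int

value = Wide(1, 2, 3, 4)

first, *_rest = value       # bind the first field, ignore the rest
*_ignored, fourth = value   # or only the last one
```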
h
> i don't know how i feel about that.
Feel about what? About using `NamedTuple` with pair types like `AddressWithOrigin`? They can keep using named properties like `address_with_origin.address` if they want! No need to rewrite those call sites. Only, now, there's an additional technique they can use to destructure via tuple unpacking
a
ok!
w
haven't read the rest of the thread, but: those pair classes could almost certainly be avoided with the "multiple Params to a Get" change
if so... would be good to do that rather than continue to create or optimize them
a
oh ugh i have a branch for that
not even close to working yet
w
if this is a pressing need, i can take a look at it today/tomorrow rather than doing async/await stuff
h
Not a pressing need at all, imo. Multiple params is solely boilerplate reduction, which is valuable, but I think Rust async/await would be more impactful. I'm still pretty unclear, too, how multiple params would work with collection types. It's a common idiom for a goal rule to request `TargetsWithOrigins` and `AddressesWithOrigins`. What would that look like with multiple params?
w
oh... unclear.
h
That's part of the reason I don't see a strong benefit to being able to have the equivalent of `AddressWithOrigin` via multiple params. `NamedTuple`s are exceptionally lightweight (Py 3.9 made an optimization where their attribute access is apparently the fastest in Python). It's also a very nice construct for programmers: they can easily wrap their head around having "both an Address and an OriginSpec". It's a plain and simple Python idiom with no new engine magic.
Another consideration: for `AddressesWithOrigins` and `TargetsWithOrigins`, we must preserve the method `.expect_single()`.
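A rough sketch (not the real Pants class) of the kind of tuple-backed collection that carries `.expect_single()`; the element values here are placeholders.

```python
class AddressesWithOrigins(tuple):
    """Sketch of a tuple subclass exposing the .expect_single() contract."""

    def expect_single(self):
        if len(self) != 1:
            raise ValueError(
                f"Expected exactly one element, but found {len(self)}: {list(self)}"
            )
        return self[0]

# Callers that assume a single result get a loud failure otherwise.
single = AddressesWithOrigins([("//:lib", "cli-spec")]).expect_single()
```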
w
both of those are computed from some other structure though, yea?
a dict?
not sure here... but collection classes like that generally evolve into something purpose-specific. for example: "Addresses" currently implicitly means "all Addresses for the root Specs"
likewise each of those other collections.
so i don't know how they'll evolve over time, since it's mostly just goal_rules that should be requesting "the world" like that.
and there is always `Addresses`, `Targets`, and `Origins`
h
No, see https://github.com/pantsbuild/pants/blob/eeb9d7488a39bb428bf19e273584675d1041745b/src/python/pants/engine/legacy/graph.py#L836-L890 and https://github.com/pantsbuild/pants/blob/eeb9d7488a39bb428bf19e273584675d1041745b/src/python/pants/engine/build_files.py#L213-L267. We go out of our way to preserve the `OriginSpec` at the time of first parsing the `Addresses`. We must preserve that information when we first encounter it, or we will have no way of backtracking "What command line spec did `:lib` come from?"
`AddressesWithOrigins` is the root of it all. From there, we can get `Addresses` by stripping off the origins, for example, or convert it to `TargetsWithOrigins`
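A simplified sketch of that flow, using stand-in types and a plain function where Pants would use a rule: `AddressesWithOrigins` is resolved first, and the other forms are derived from it by dropping or joining information.

```python
from typing import NamedTuple

class Address(NamedTuple):
    spec: str

class OriginSpec(NamedTuple):
    raw_spec: str  # e.g. the command-line spec that ":lib" came from

class AddressWithOrigin(NamedTuple):
    address: Address
    origin: OriginSpec

class Addresses(tuple):
    pass

class AddressesWithOrigins(tuple):
    pass

def strip_origins(addresses_with_origins: AddressesWithOrigins) -> Addresses:
    # Dropping origins is easy; recovering them later would not be, which is
    # why they are captured when the addresses are first parsed.
    return Addresses(awo.address for awo in addresses_with_origins)
```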
w
where origins was a dict, and you could look up the origin for an Address or Target via address
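A hypothetical sketch of that alternative (none of these names are real Pants types): a single `Origins` mapping keyed by address, looked up on demand instead of carrying pair types around.

```python
from typing import Dict, NamedTuple

class Address(NamedTuple):
    spec: str

class OriginSpec(NamedTuple):
    raw_spec: str

class Origins:
    """Hypothetical mapping from each Address to the OriginSpec it came from."""

    def __init__(self, mapping: Dict[Address, OriginSpec]) -> None:
        self._mapping = dict(mapping)

    def lookup(self, address: Address) -> OriginSpec:
        return self._mapping[address]

origins = Origins({Address("//:lib"): OriginSpec("src/python/::")})
origin = origins.lookup(Address("//:lib"))
```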
h
> where origins was a dict, and you could look up the origin for an Address or Target via address
We probably could do that... but I don't see the benefit? There is a tight coupling between an `Address` and its `OriginSpec`: every `Address` has precisely one `OriginSpec`. Right now, we preserve that information when we first resolve the addresses. Throwing it away and replacing it with this `Origins` singleton could work, but seems roundabout
w
it involves fewer classes that are just compositions of other classes
it eliminates `TargetsWithOrigins`, in particular.
h
> it involves fewer classes that are just compositions of other classes
At the expense of more magical singletons and rules needing to request both `Targets` and `Origins`, rather than simply `TargetsWithOrigins`
w
fewer, right?
`Addresses` already exists.
h
Huh? Yes, it does and will always exist. But you're proposing adding a new `Origins` type
w
the net number of classes is fewer then, i think. because it would add `Origins`, but remove both `AddressesWithOrigins` and `TargetsWithOrigins`
anyway, not sure.
it's the difference between a function taking two parameters and one merged parameter
except that here, in the one merged parameter case you need a wrapper class
h
Indeed, 1 fewer class. But now every call site must not only take 2 parameters, but also, when iterating over every target, have boilerplate to reassociate that target with its `OriginSpec`, vs. that association already being done for you
> in the one merged parameter case you need a wrapper class
Keep in mind that if you don't care about `WithOrigins`, you can still request the simpler `Addresses`. You only request `AddressesWithOrigins` when you have something meaningful to do with the origin specs.
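To make the trade-off concrete, here is a sketch of the two call-site shapes being argued about, reusing the illustrative types from the earlier sketches rather than real engine signatures.

```python
# Two-parameter style: every call site re-joins targets with their origins.
def run_two_params(targets, origins):
    for target in targets:
        origin = origins.lookup(target.address)
        ...  # do something with (target, origin)

# Merged-parameter style: the association is already materialized.
def run_merged(targets_with_origins):
    for target, origin in targets_with_origins:
        ...  # do something with (target, origin)
```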
w
i don't know what the use sites look like... is it the case that you need the origin in all cases? or only in error cases?
h
You need it for `lint`, `fmt`, and `test`, so that we can calculate the precise files to run on, e.g. `./v2 test foo.py`. No other goals use it; they simply request `Addresses`/`Targets`
w
k
h
> or only in error cases?
We never use the `OriginSpec` for errors, at the moment. This feature is solely used for precise file arguments, which is something rule authors have to go out of their way to opt into
w
to head back to the rest of the thread: regarding `dataclass`: `datatype` used to extend `namedtuple`
👍 1
my understanding when we moved to dataclass was that it was similarly performant to namedtuple
a
yes, that was mine too
w
is that not the case? i don't think we noticed a performance difference, but it might have gotten lost in the python3 speedup.
h
> my understanding when we moved to dataclass was that it was similarly performant to namedtuple
Yes, dataclass still has very good, optimized performance. But NamedTuple is even more lightweight, because it's doing fewer things than a fully-fledged class
w
is there a reference you've seen on that?
h
For example, you might have seen how you can use `__slots__` to reduce a class's memory footprint. Dataclasses don't do that by default (they considered it) because it makes things much more complex. In contrast, because NamedTuple is just a wrapper around a tuple, it (I believe) uses that performance hack
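A quick way to see the difference, with invented pair types: a plain dataclass instance stores its fields in a per-instance `__dict__`, while a NamedTuple instance stores them in the tuple itself. Exact numbers vary by Python version, so treat this as a sketch rather than a benchmark.

```python
import sys
from dataclasses import dataclass
from typing import NamedTuple

@dataclass(frozen=True)
class PairDataclass:
    address: str
    origin: str

class PairNamedTuple(NamedTuple):
    address: str
    origin: str

d = PairDataclass("//:lib", "cli-spec")
n = PairNamedTuple("//:lib", "cli-spec")

# The dataclass instance carries a separate __dict__ for its fields;
# the NamedTuple's fields live directly in the tuple.
print(sys.getsizeof(d), sys.getsizeof(d.__dict__))
print(sys.getsizeof(n))
```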
w
and a bit smaller
...which is surprising.
see the Comparison table
h
That it is... Although, it looks like they don't test `frozen=True`, which is known to slightly reduce performance: https://docs.python.org/3/library/dataclasses.html#frozen-instances
I mean, fwiw, performance isn't the real reason I want `NamedTuple` for `TargetWithOrigin` and `AddressWithOrigin`. I suspected it would be better, so I threw it out there as an additional carrot. What I really want is tuple unpacking:
```python
tgt, origin = target_with_origin
```
w
so, related to this, it looks like implementing `Addresses` via `AddressesWithOrigins` causes about a 10-15% slowdown for `list` in `1.26.x`
which ends up noticeable in CI times
i'll take a swing at computing them lazily tomorrow
👍 1
h
Wow. This is using address specs, correct?
Why 1.26.x? We made that change in 1.25.x iirc
w
[attachments: 1.25.x.svg, 1.26.x.svg]
doesn't look like it
h
Ohh you’re right. 1.25.x added file args, 1.26.x added precise file args