Oof. I found the issue for why tests are taking so...
# development
h
Oof. I found the issue for why tests are taking so long now. It is NOT safe to use
FrozenSet
. Even though it's immutable, it's not deterministic so messes up caching. https://github.com/pantsbuild/pants/pull/8577 changed
PexRequirements
to use
FrozenSet
instead of
Tuple
, which means that resolve requirements are almost never cached now
a
Good find!
๐Ÿ‘ 2
e
oh wow
where is the indeterminism? In the way it stores set members?
h
Yeah I'm curious about this myself
and how exactly did you figure this out?
is it relatively easy to construct an example where the nondeterminism occurs?
h
Sets are always non-deterministic in Python, whereas Lists, Tuples, and Dicts are deterministic. I found this out by realizing how slow CI is taking for V2 and then running
./pants --no-v1 --v2 test tests/python/pants_test/util:osutil
locally. It takes 60 seconds to resolve the requirements for that. On master, it will take the same amount of time. (I used simple
print()
and
time.time()
to measure this). With the PR I'm about to put up, the second time is less than a second
One thing I'm curious about is the CFFI boundary. How do we store sets in Rust?
e
probably just Vecs
That might get converted to sets later
Is that how sets get converted from Rust->Python or from Python->Rust?
e
Pretty sure that returns a pointer to a python object with the values of the rust set.
So rust would call that function to create a python set, and then do something with the handle to the python object
h
That is... bonkers
What do you mean by "deterministic" here?
a
How the FFI happens depends on whether itโ€™s going to be consumed from python or rust
If rust code is going to directly inspect the contents, the code @hundreds-father-404 linked to is used
h
even if the underlying representation changes, surely the values of hash and eq are consistent?
a
If the thing is only going to be consumed from other python `@rule`s, what Henry described will happen
h
What do you mean by "deterministic" here?
How the data is stored. There's no guarantee that the ordering will be the same, even if
__eq__
and
__hash__
work This is how dictionaries worked with Python until 3.6
h
I mean, I want to say "of course there isn't". It's a set, not a list. But why does order matter?
If we're relying on iteration order on a set of any kind then that's a fundamental error.
It should be safe to cache on
frozenset
as a value, just not on an iteration over its members.
h
It should be safe to cache on
frozenset
as a value, just not on an iteration over its members.
I agree. I'm trying to find how to fix this and am not sure where we go from
Python->Rust
. Changing
native.py
didn't do anything
h
Basically, we should figure out why we're relying on order in a
frozenset
, because that is a deep category error...
h
Basically, we should figure out why we're relying on order in a
frozenset
, because that is a deep category error...
We probably just haven't special cased sets yet. Order does matter for other data structures:
(1, "hello") != ("hello", 1)
for tuples.
h
Yes, that is by definition
My point is that all these data structures are well-defined and consistent if used as prescribed
๐Ÿ‘ 1
a tuple is order-preserving as part of its definition
a set is not