Having an interesting issue when importing `datacl...
# general
w
Having an interesting issue when importing
dataclasses
. Using
2.8.0.dev0
(trying to use the GCP stuff!) and current python version is 3.6.10. I have a requirements file with this specified:
Copy code
dataclasses==0.8; python_version < '3.7'
When I import as
import dataclasses
the dependency is not included. But when I import as
from dataclasses import asdict
, I'm able to pull in the dependency. Is this a bug, or some quirk that I just missed in the docs?
h
I think it's probably due to our std lib exclusion list. We don't try to infer deps on std lib dependencies. For some reason, dataclasses.asdict isn't excluded but dataclasses is In the past, someone proposed fixing by determining the current python version and using the std list based on that. But that's tricky to do, as code may work with multiple interpreters I think the solution is much simpler...stop using this exclude list. Iirc, it was only for performance to not try categorizing deps that come from std lib. But that performance should be neglible. The cost of dep inference is computing the module mappings and AST parsing. After that, it's simply a dict lookup. Very fast. This was a bad optimization imo As you've found, it's also incorrect. Breaks backports like this Would you be interested in fixing? It would be deleting a couple lines, updating some unit tests, and running a benchmark
w
Yes, I'd be interested. I think it's sensible not to have special cases here. I'll start taking a look this afternoon.
h
Exactly. I'm surprised with those benchmark numbers, that's a really long time and dependency inference isn't that slow anymore - that was pants 1.30. Seeing that benchmark prompted my curiosity enough to run a benchmark now, will report back in a minute
That's not expected. Before:
Copy code
❯ hyperfine --warmup=1 --runs=100 './pants --no-remote-cache-{read,write} --no-pantsd dependencies ::'
Benchmark #1: ./pants --no-remote-cache-{read,write} --no-pantsd dependencies ::
  Time (mean ± σ):      3.901 s ±  0.583 s    [User: 3.332 s, System: 0.975 s]
  Range (min … max):    3.620 s …  6.622 s    100 runs
After:
Copy code
❯ hyperfine --warmup=1 --runs=100 './pants --no-remote-cache-{read,write} --no-pantsd dependencies ::'
Benchmark #1: ./pants --no-remote-cache-{read,write} --no-pantsd dependencies ::
  Time (mean ± σ):      3.801 s ±  0.590 s    [User: 3.335 s, System: 0.978 s]
  Range (min … max):    3.613 s …  8.574 s    100 runs
But Hyperfine warns about statistical flukes, such as from noisy machine (I closed all apps at least). I'd be surprised if this actually makes things faster, but I suppose it's possible because we no longer have to do
set(detected_imports) - set(stdlib)
. More likely is probably statistical noise So I think the takeaway is this is safe to change performance wise 🙂 PR very much appreciated to fix this! And if you're able to get to it today, we still have time to cherry-pick into 2.7 before the stable release
w
Ok, the changes seem simple enough that I'll try to get an initial PR over lunch. That way if I'm off base, you can steer me back before this afternoon.
❤️ 1
Added you to this PR: https://github.com/pantsbuild/pants/pull/12818 CI checks are still in progress as I write this.