# development
b
Thought dumping before the long weekend, sorry if it isn't fully formed. I thought of a possible performance improvement that sits between linting the world and linting process-per-file: process per dependency set. E.g. if 10 files have depset A and 12 files have depset B, you'd lint in 2 processes. I'm wondering what the performance characteristics would be in a real-world scenario compared to the current 2 options. (I think this is different from the batching proposal as well, but in the same vein.)
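Roughly what I have in mind, as a minimal sketch (everything here is hypothetical: `transitive_deps` stands in for whatever dependency lookup we'd use, and the pylint invocation is just an example, not the Pants implementation):

```python
from collections import defaultdict
import subprocess
import sys

def partition_by_depset(files, transitive_deps):
    """Group files so every file in a group shares one transitive depset."""
    groups = defaultdict(list)
    for f in files:
        # frozenset makes the depset hashable and order-independent
        groups[frozenset(transitive_deps(f))].append(f)
    return groups

def lint_per_depset(files, transitive_deps):
    # One linter process per distinct depset: 10 files with depset A and
    # 12 files with depset B -> exactly 2 processes.
    for batch in partition_by_depset(files, transitive_deps).values():
        subprocess.run([sys.executable, "-m", "pylint", *batch], check=False)
```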
h
Where depset == transitive dependencies? Kind of related: we plan to update MyPy and Pylint to partition lint runs by "resolve", where a resolve == lockfile, more or less
b
Yeah, the partition would be the largest set of first-party files that share a set of transitive deps. I'm assuming if all deps were already locked, the resolve partitioning is moot?
I still think synthesizing lint results per-file out of a monolithic run is the ideal state, but also really challenging to get right
h
> I'm assuming if all deps were already locked, the resolve partitioning is moot?
The key is multiple conflicting resolves: you have a lockfile that uses Django==2 and another that uses Django==3, for example. That's not very well supported in Pants right now, but it's an important use case for monorepos. My main focus for the next few weeks is implementing this
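To make the conflict concrete, a toy illustration (the resolve names and exact pins are made up; only the Django 2 vs 3 split comes from above):

```python
# Two hypothetical named resolves, each backed by its own lockfile.
RESOLVES = {
    "legacy": {"Django": "2.2.28"},
    "modern": {"Django": "3.2.12"},
}

def merged_pins(resolves):
    """Try to fold every pin into one environment; raise on any conflict."""
    merged = {}
    for name, pins in resolves.items():
        for project, version in pins.items():
            if project in merged and merged[project] != version:
                raise ValueError(
                    f"{project} conflicts across resolves: "
                    f"{merged[project]} vs {version} (from {name!r})"
                )
            merged[project] = version
    return merged

# merged_pins(RESOLVES) raises on Django, so MyPy/Pylint need at least
# one run per resolve rather than one run over the whole repo.
```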
👀 1
b
It sounds like there will be some built-in partitioning along with some opt-in as well. I think it'd be neat to see the results of the matrix of choices
h
I think what we actually will want is "linting the unlinted world" - lint only the files that we don't already know pass lint, but do so in a single process. This will require some way of caching the results as if they had been run per-file.
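Something like this rough sketch, maybe (the content-hash cache layout and the pylint JSON handling are my illustrative assumptions, not the planned design):

```python
import hashlib
import json
import pathlib
import subprocess
import sys

CACHE = pathlib.Path(".lint-cache")  # one marker file per known-clean file

def content_key(path):
    # Key on file content so any edit invalidates the cached verdict.
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def lint_unlinted(files):
    CACHE.mkdir(exist_ok=True)
    stale = [f for f in files if not (CACHE / content_key(f)).exists()]
    if not stale:
        return  # the whole world is already known to pass
    # One process over only the stale files...
    result = subprocess.run(
        [sys.executable, "-m", "pylint", "--output-format=json", *stale],
        capture_output=True, text=True,
    )
    findings = json.loads(result.stdout or "[]")
    # (assumes the paths pylint reports match the arguments we passed)
    dirty = {finding["path"] for finding in findings}
    # ...then record results per-file, as if each file had run alone.
    for f in stale:
        if f not in dirty:
            (CACHE / content_key(f)).touch()
```

A real cache key would also need to cover the depset and lint config, since a file's lint verdict depends on more than its own contents.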
b
I agree, but why "but do so in a single process"? Why not farm out as makes sense?
h
True, "in some number of processes 1<=x<=N"
👆 1