# development
Thought dumping before the long weekend, sorry if it isn't fully formed. I thought of a possible performance improvement that sits between linting the world and linting process-per-file: process per dependency set. E.g. if 10 files have depset A and 12 files have depset B, then you'd lint in 2 processes. I'm wondering what the performance characteristics would be in a real-world scenario compared to the current 2 options. (I think this is different from the batching proposal as well, but in the same vein.)
Where depset == transitive dependencies? Kind of related: we plan to update MyPy and Pylint to partition lint runs by "resolve", where a resolve == lockfile, more or less.
Yeah, the partition would be the largest set of first-party files that share a set of transitive deps. I'm assuming if all deps were already locked, the resolve partitioning is moot?
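To make the partitioning idea concrete, here is a rough sketch of grouping files by their transitive dependency set. Everything in it is hypothetical: the `direct_deps` mapping, the function names, and the sample file names are made up for illustration, and this is not how Pants computes dependencies or partitions.

```python
from collections import defaultdict


def transitive_deps(target, direct_deps):
    """Return everything reachable from `target` through `direct_deps`."""
    seen = set()
    stack = list(direct_deps.get(target, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(direct_deps.get(dep, ()))
    return frozenset(seen)


def partition_by_depset(files, direct_deps):
    """Group files whose transitive dependency sets are identical."""
    partitions = defaultdict(list)
    for path in files:
        partitions[transitive_deps(path, direct_deps)].append(path)
    return dict(partitions)


# Hypothetical dependency graph: two files share a depset, one does not.
direct_deps = {
    "a.py": ["requests"],
    "b.py": ["requests"],
    "c.py": ["django"],
    "requests": ["urllib3"],
}
parts = partition_by_depset(["a.py", "b.py", "c.py"], direct_deps)
```

Each key in `parts` would map to one lint process, so the three files above would be linted in 2 processes instead of 1 or 3.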
I still think synthesizing lint results per-file out of a monolithic run is the ideal state, but it's also really challenging to get right.
I'm assuming if all deps were already locked, the resolve partitioning is moot?
The key is multiple conflicting resolves. You have a lockfile that uses Django==2 and another that uses Django==3, for example. That's not very well supported in Pants right now, but it's an important use case for monorepos. My main focus for the next few weeks is implementing this.
It sounds like there will be some built-in partitioning along with some opt-in as well. I think it'd be neat to see the result of the matrix of choices.
I think what we actually will want is "linting the unlinted world" - lint only the files that we don't already know pass lint, but do so in a single process. This will require some way of caching the results as if they had been run per-file.
I agree, but why "but do so in a single process"? Why not farm out as makes sense?
True, "in some number of processes 1<=x<=N"
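A minimal sketch of what "linting the unlinted world" could look like, assuming a cache keyed only by file content hash. The names `lint_unlinted`, `passed_cache`, and `run_linter` are invented for illustration and are not Pants APIs. The sketch also ignores the hard part: a file's result may depend on other files, so a real cache key would likely need to cover dependencies too.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Cache key for a file; assumes the lint result depends only on its content."""
    return hashlib.sha256(content).hexdigest()


def lint_unlinted(files, passed_cache, run_linter, num_batches=1):
    """Lint only files not already known to pass, split into 1 <= x <= N batches.

    files: dict mapping path -> content bytes
    passed_cache: set of fingerprints of files known to pass (mutated in place)
    run_linter: hypothetical callable taking a list of paths and returning
        the paths that failed
    """
    pending = sorted(
        path for path, content in files.items()
        if fingerprint(content) not in passed_cache
    )
    # Round-robin split; each batch stands in for one linter process.
    batches = [pending[i::num_batches] for i in range(num_batches)]
    failures = set()
    for batch in batches:
        if not batch:
            continue
        failed = set(run_linter(batch))
        failures |= failed
        # Record passes per file, as if each file had been linted on its own.
        for path in batch:
            if path not in failed:
                passed_cache.add(fingerprint(files[path]))
    return pending, failures
```

The batches run sequentially here to keep the sketch short; farming them out to separate processes would not change the caching logic.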