I’ve done a bit of testing with both tools, and I’d like to explain the pros and cons of both approaches and why I think Black is a better option.
The main difference I was aware of from reading both projects’ descriptions was that yapf can be configured to minimise the changes from our de-facto style. Most importantly, we use 2 spaces for indentation, and I would be quite happy not to have to convince developers to change their style over such a potentially contentious detail.
So my mindset was: everything else being equal, let’s make yapf work and pick it up.
So I went ahead and tried both tools on the pants codebase. Here is what I found:
• yapf fails to format our code when run on the entire pants codebase. It throws an exception and doesn’t indicate which file it failed to parse.
• When Black fails to parse a file, it flags it and keeps going. If you run it again, you can see the list of files that it failed to parse.
• When running both on the same directory (contrib, after hunting for a directory where yapf wouldn’t throw an exception in the middle of its execution), I get these timings:
    • Black: 3.5 seconds
    • yapf: 12 seconds
• If I rerun on the same directory, Black does some caching, so I get:
    • Black: 0.4 seconds
    • yapf: 12 seconds
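For anyone who wants to reproduce this kind of comparison, here is a rough sketch of a timing helper. The exact commands behind the numbers above are not recorded here, so the helper takes any argument list; `time_command` is a name made up for this sketch.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


# Harmless smoke run against the current interpreter (always available).
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"python -c pass: {elapsed:.2f}s (exit code {code})")
```

With both tools installed, something like `time_command(["black", "contrib"])` and `time_command(["yapf", "--in-place", "--recursive", "contrib"])` would time each formatter. Both of those rewrite files in place, so run them on a clean checkout, and call each one twice to see the effect of any caching.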
One of the reasons Black performs better is that it uses all of my machine’s cores, whereas yapf doesn’t.
Of all the files in pants, exactly one gives Black trouble: unicode/compilation_failure/main.py, because Black trips up on its invalid syntax. This file can easily be added to an ignore list in Black’s config file.
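If you want to know ahead of time which files are likely to trip up a formatter, a minimal sketch along these lines lists the Python files that fail to parse. It uses the standard library’s `ast.parse`, which follows the grammar of whichever interpreter runs the script; Black has its own parser, so the two lists may not match exactly. `find_unparsable` is a hypothetical helper name.

```python
import ast
from pathlib import Path


def find_unparsable(root):
    """Return (path, error) pairs for .py files under root that fail ast.parse."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # Passing bytes lets the parser honour any encoding declaration.
            ast.parse(path.read_bytes(), filename=str(path))
        except (SyntaxError, ValueError) as exc:
            failures.append((path, exc))
    return failures
```

Calling `find_unparsable(".")` from the repository root and printing each pair gives a candidate ignore list, which is the piece of information yapf’s exception was missing.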
Based on these observations, I am pretty convinced that Black is the more robust tool and will give us less grief, while still giving us the benefits of consistent, automatic formatting.
Now for the elephant in the room: Black’s opinions. Personally, I am quite happy to delegate stylistic decisions to a body of developers (the Black developers) who thought about it very hard and came up with something they would consider “sensible”. I truly believe automation and consistency both trump any sensitivity I could have about the specifics of how to lay out code. It is pretty easy to get used to a certain style if it is used consistently throughout a codebase. Specifically, this means I would be fine with switching to 4 spaces, since Black doesn’t allow 2. Having browsed through pants post-formatting, I think the code produced by Black is quite readable. Black does allow the user to configure the line length, and I would advocate for either 88 (which is their default) or 100 (which seems to be our de-facto standard). I would probably lean a tiny bit towards 100 because I think that may be what a majority of pants devs prefer, although I’m completely fine either way.
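To make the line-length option concrete, here is a small sketch that builds a Black invocation with a non-default length. `--line-length` and `--check` are Black command-line options; the `black_command` helper and its defaults are made up for illustration.

```python
import shlex


def black_command(paths, line_length=100, check=True):
    """Build a Black command line as an argument list."""
    cmd = ["black", "--line-length", str(line_length)]
    if check:
        # --check reports files that would change without rewriting them.
        cmd.append("--check")
    cmd.extend(paths)
    return cmd


print(shlex.join(black_command(["contrib"])))
```

The resulting list can be passed straight to `subprocess.run`. Keeping `check=True` is a safe way to preview how much of the codebase a given length would touch before committing to it.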
I have two questions I would like everyone who cares to answer:
• Would you be OK with switching to Black? If not, why, and what would be your preferred course of action? (yapf, nothing, …?)
• If yes, would you be OK with 100 for the line length? If not, why, and what length would you prefer?
Looking forward to hearing your thoughts 😄