# development
b
Q on how to handle scripts that run extra file validation: We have some scripts that we use to generate/validate certain files in-repo (TSVs and JSON files; I know, I know, we shouldn't check these in, but whatever). Ideally the scripts don't need to run on every "pre-submit" invocation, as they change infrequently. I see three options, fishing for thoughts...
1. Define a custom fmt/lint plugin
◦ Plugins for fmt/lint seem to be weighted towards language-based sources (e.g. using the file extension to find the sources)
2. Edit the scripts to allow for checking the files as well, and define pytest tests to validate that the contents on disk are as expected
◦ A bit of a hack, admittedly. But it still gets us into the world of cached results
3. Do something custom where we ask git for changed files, and only run the scripts if the relevant files have changed
I have a preference for 2: although it's a hack, there isn't any custom build tooling involved. But I want to get others' thoughts.
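To make option 2 concrete, here is a rough sketch of the "check the file on disk against fresh output" idea. Everything named here (`render_tsv`, `stale_reason`, `ROWS`) is hypothetical and only stands in for whatever the real scripts do.

```python
from pathlib import Path
from typing import Optional

# Hypothetical input rows; the real scripts would load these from the
# in-repo inputs they already read.
ROWS = [("id", "name"), (1, "alpha"), (2, "beta")]


def render_tsv(rows) -> str:
    """Hypothetical generator: render rows as tab-separated text."""
    return "".join("\t".join(str(cell) for cell in row) + "\n" for row in rows)


def stale_reason(path: Path, rows) -> Optional[str]:
    """Return None if the file matches fresh output, else a short reason."""
    if not path.is_file():
        return f"{path} is missing; run the generator"
    if path.read_text() != render_tsv(rows):
        return f"{path} is out of date; re-run the generator"
    return None
```

A pytest test is then a one-line assertion that `stale_reason(...)` returns `None` for the checked-in file. For the result to be cached by the build tool, the file would presumably need to be declared as a dependency of the test so that a change to it invalidates the cached result.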
e
In option #2 are you burning trees needlessly after the 1st time the test passes and before the next time the file changes (is regenerated)?
If so, the generation code should self-contain the test. Then run once and check in with confidence. Independent of build tool.
b
For option #2 I would assume Pants' caching and --changed-since come into play. Not sure I completely understand what you're proposing 🤔
e
If the files are machine generated, do the machine checking once when you generate them.
b
Sure, but they evolve over time as they are generated from some inputs also in the repo. As the inputs change, we want to ensure the output file is "correct".
e
Right, but if they are generated (that's the key) then the code that runs the generation can run the test too. Doing it all in one pass works independently of the build tool.
Your #2 separated the steps.
b
I'm not sure I see where the validation would come into play then (for CI). Perhaps what you're suggesting is option 3 (something living outside of the build tool)? Ideally we don't have to instrument anything for CI that sits outside of the build tool. As an example of one such script, it takes one of these JSON files and sorts the keys so that diffing it is feasible.
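For illustration, a key-sorting script with a check mode could look roughly like the sketch below. The function names, the `--check` flag, and the two-space indent are assumptions made for the example, not a description of the actual script.

```python
import argparse
import json
import sys
from pathlib import Path


def canonical_json(text: str) -> str:
    """Re-serialize JSON with sorted keys and a fixed indent."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2) + "\n"


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Sort JSON keys, or verify with --check.")
    parser.add_argument("paths", nargs="+", type=Path)
    parser.add_argument("--check", action="store_true",
                        help="report non-canonical files instead of rewriting them")
    args = parser.parse_args(argv)

    stale = []
    for path in args.paths:
        text = path.read_text()
        fixed = canonical_json(text)
        if text != fixed:
            stale.append(path)
            if not args.check:
                path.write_text(fixed)  # default mode: rewrite in place

    if args.check and stale:
        for path in stale:
            print(f"not canonical: {path}", file=sys.stderr)
        return 1  # non-zero exit so CI fails
    return 0
```

The default mode rewrites files locally, while `--check` only reports and exits non-zero, so the same script can serve both the developer and CI. Calling `sys.exit(main())` from an entry point would wire it up as a command.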
e
Ok. My suggestion was only applicable if you always wanted to generate a file and then test that the file was generated correctly. Only then could you combine the generation code and the checking code in the same script and run that script just once.
So, yeah, that's #3. But #3 is just ./pants --changed-since=foo my_custom_goal. No real need to integrate with fmt or lint necessarily.
Maybe similar? https://github.com/pantsbuild/pants/blob/b5ef7b244cd6347aaf52101b516d5c0a26593a4f/build-support/githooks/pre-commit#L30-L33 We have a script to both generate and check GitHub actions workflows. That pre-commit check only runs when needed and effectively prompts you to re-run generation & check in only when needed.
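The "only run when needed" part can be sketched as below. This is not a copy of the linked hook; the watched patterns and function names are made up for the example, and the only real command used is `git diff --name-only`.

```python
import fnmatch
import subprocess

# Hypothetical patterns: the generator script plus the files it produces.
WATCHED = ("build-support/generate_data.py", "data/*.json", "data/*.tsv")


def changed_files(since: str) -> list:
    """Ask git which files differ between the working tree and the given ref."""
    result = subprocess.run(
        ["git", "diff", "--name-only", since],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def needs_check(paths, patterns=WATCHED) -> bool:
    """True if any changed path matches a watched pattern.

    Note that fnmatch's `*` also matches `/`, so `data/*.json`
    covers nested directories too.
    """
    return any(fnmatch.fnmatch(p, pat) for p in paths for pat in patterns)
```

A hook or CI step would call `needs_check(changed_files("origin/main"))` (the ref is a placeholder) and run the generate-and-check script only when it returns `True`.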
b
Yeah that's what I had in mind for #3