Q on how to handle scripts that run extra file validation (Pants #development)

bitter-ability-32190

12/09/2021, 4:29 PM

Q on how to handle scripts that run extra file validation: We have some scripts that we use to generate/validate certain files in-repo (TSVs and JSON files. I know, I know, we shouldn't check these in, but whatever). Ideally the scripts don't need to run on every "pre-submit" invocation, as they change infrequently. I see three options, fishing for thoughts...

1. Define a custom fmt/lint plugin
   ◦ Plugins for fmt/lint seem to be weighted towards language-based checks (e.g. using the file extension to find the sources)
2. Edit the scripts to allow for checking the files as well, and define pytest tests to validate that the contents on disk are as expected
   ◦ A bit of a hack, admittedly, but it still gets us into the world of cached results
3. Do something custom where we ask git for changed files, and only run the scripts if the relevant files have changed

I have a preference for 2: although it's a hack, there isn't any custom build tooling involved. But I want to get others' thoughts.
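For illustration, option 2 might look roughly like the sketch below. It is not taken from the thread: `render`, `is_up_to_date`, and the inline sample data are hypothetical stand-ins for the real generator and the real files on disk, and the generator is assumed to produce key-sorted JSON.

```python
import json


def render(inputs: dict) -> str:
    """Hypothetical generator: serialize inputs as key-sorted, indented JSON."""
    return json.dumps(inputs, indent=2, sort_keys=True) + "\n"


def is_up_to_date(on_disk: str, inputs: dict) -> bool:
    """True when the checked-in text matches what the generator would emit."""
    return on_disk == render(inputs)


def test_generated_file_is_up_to_date():
    # Assumption: a real test would read both values from files in the repo,
    # so that the build tool can cache the result against those files.
    inputs = {"b": 1, "a": 2}
    on_disk = '{\n  "a": 2,\n  "b": 1\n}\n'
    assert is_up_to_date(on_disk, inputs)
```

Because the test only depends on the inputs and the checked-in output, a cached result should stay valid until one of those files changes.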

enough-analyst-54434

12/09/2021, 4:35 PM

In option #2 are you burning trees needlessly after the 1st time the test passes and before the next time the file changes (is regenerated)?

🔥 1

🌲 1

enough-analyst-54434

12/09/2021, 4:36 PM

If so, the generation code should self-contain the test. Then run once and check in with confidence. Independent of build tool.

bitter-ability-32190

12/09/2021, 4:40 PM

For option #2 I would assume Pants' caching and --changed-since comes into play. Not sure I completely understand what you're proposing 🤔

enough-analyst-54434

12/09/2021, 4:43 PM

If the files are machine generated, do the machine checking once when you generate them.

bitter-ability-32190

12/09/2021, 4:44 PM

Sure, but they evolve over time as they are generated from some inputs also in the repo. As the inputs change, we want to ensure the output file is "correct".

enough-analyst-54434

12/09/2021, 4:45 PM

Right, but if they are generated (that's the key) then the code that runs the generation can run the test too. Doing it all in one pass works independent of the build tool.

enough-analyst-54434

12/09/2021, 4:46 PM

Your #2 separated the steps.

bitter-ability-32190

12/09/2021, 4:49 PM

I'm not sure I see where the validation would come into play then (for CI). Perhaps what you're suggesting is option 3 (something living outside of the build tool)? Ideally we don't have to instrument anything for CI that sits outside of the build tool. As an example of one such script, it takes one of these JSON files and sorts the keys so that diffing it is feasible.
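A script of the kind described here could offer both a "fix" mode and a "check" mode, so the same code serves local regeneration and CI. The following is only a sketch under assumptions: the function names, the `--check` flag, and the two-space indentation are hypothetical, not details of the actual script.

```python
import json
from pathlib import Path
from typing import List


def sort_json_text(text: str) -> str:
    """Return the JSON document re-serialized with keys sorted at every level."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True) + "\n"


def process(path: Path, check: bool) -> bool:
    """Return True if the file was already sorted; rewrite it unless check is set."""
    original = path.read_text()
    expected = sort_json_text(original)
    if original == expected:
        return True
    if not check:
        path.write_text(expected)
    return False


def main(argv: List[str]) -> int:
    """Hypothetical CLI: `--check` reports unsorted files instead of rewriting them."""
    check = "--check" in argv
    paths = [Path(arg) for arg in argv if arg != "--check"]
    results = [process(path, check) for path in paths]
    # Only check mode fails the run; fix mode has already rewritten the files.
    return 1 if check and not all(results) else 0
```

Wiring this up would be a matter of calling `sys.exit(main(sys.argv[1:]))` from the script's entry point; CI would pass `--check`, while developers would run it without the flag.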

enough-analyst-54434

12/09/2021, 4:52 PM

Ok. My suggestion was only applicable if you always wanted to generate a file then test the file was generated correctly. Only then could you combine generation code and checking code in the same script and run the script - however - once.

enough-analyst-54434

12/09/2021, 5:03 PM

So, yeah, that's #3. But #3 is just ./pants --changed-since=foo my_custom_goal. No real need to integrate with fmt or lint necessarily.

enough-analyst-54434

12/09/2021, 5:09 PM

Maybe similar? https://github.com/pantsbuild/pants/blob/b5ef7b244cd6347aaf52101b516d5c0a26593a4f/build-support/githooks/pre-commit#L30-L33 We have a script to both generate and check GitHub actions workflows. That pre-commit check only runs when needed and effectively prompts you to re-run generation & check in only when needed.

👀 1
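The "only runs when needed" idea behind option 3 can be sketched as below. This is not the linked Pants hook; it is a minimal illustration that asks git for changed paths and decides whether the validation scripts need to run. The function names and the example patterns are hypothetical.

```python
import fnmatch
import subprocess
from typing import List, Sequence


def changed_files(since: str) -> List[str]:
    """Paths that differ between the working tree and the given git revision."""
    result = subprocess.run(
        ["git", "diff", "--name-only", since],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def needs_validation(changed: Sequence[str], patterns: Sequence[str]) -> bool:
    """True if any changed path matches one of the glob patterns.

    Note: fnmatch's `*` also matches `/`, so `data/*.json` covers nested paths.
    """
    return any(
        fnmatch.fnmatch(path, pattern) for path in changed for pattern in patterns
    )
```

A hook or CI step could then call something like `needs_validation(changed_files("HEAD"), ["data/*.json", "*.tsv"])` and skip the scripts when it returns False.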

bitter-ability-32190

12/09/2021, 5:19 PM

Yeah that's what I had in mind for #3
