# general
d
I'm working on a goal to run security auditing tools. Since the result will always depend on the state of the outside world (the currently reported vulnerabilities), I need to make sure we never cache the result. Is there a preferred way to mark a goal/result as uncacheable?
w
Could that be handled at the Process scope? Don't cache the results? https://www.pantsbuild.org/docs/rules-api-process#process
f
How and where is the database of security vulnerabilities accessed?
And is that database under control of the auditing tool exclusively or are there control knobs for the Pants (or another invoker) to use?
d
I'm planning to use it first for running the pyaudit tool
f
For example, if the tool can report the version of its security database, get that version number using an uncached Process. Then store the version number as a "dummy" env var on the Process that actually runs the audit.
Then you would have caching per version of the audit rules, since a different version number changes the env var, which makes it a different Process with a different cache entry.
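A toy model of that trick, assuming only that a process cache is keyed on a digest of the process's inputs. This is not Pants' implementation, and `AUDIT_DB_VERSION` is a hypothetical env var name: the point is that an env var the tool never reads still changes the key, so a new database version forces a fresh run while an unchanged version hits the cache.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# Toy process cache (NOT Pants' implementation): key = digest of argv + env.
_cache: Dict[str, str] = {}


def cache_key(argv: List[str], env: Dict[str, str]) -> str:
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps({"argv": argv, "env": env}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_cached(argv: List[str], env: Dict[str, str], run: Callable[[], str]) -> Tuple[str, bool]:
    """Return (output, was_cache_hit)."""
    key = cache_key(argv, env)
    if key in _cache:
        return _cache[key], True
    output = run()
    _cache[key] = output
    return output, False


argv = ["pip-audit", "-r", "requirements.txt"]
# Same hypothetical DB version twice: the second call is a cache hit.
out1, hit1 = run_cached(argv, {"AUDIT_DB_VERSION": "v1"}, lambda: "report@v1")
out2, hit2 = run_cached(argv, {"AUDIT_DB_VERSION": "v1"}, lambda: "report@v1-again")
# New DB version: different key, so the audit really runs again.
out3, hit3 = run_cached(argv, {"AUDIT_DB_VERSION": "v2"}, lambda: "report@v2")
```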
d
I'm looking now to see where it's fetching data from and also whether I can get any further info from it.
f
and if you are willing to cache the "version lookup" per session, then you will get another small performance win
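A minimal sketch of per-session caching for the version lookup, modeled in plain Python rather than with Pants' own cache scopes. `Session` and `fetch_db_version` are hypothetical stand-ins: within one session the lookup runs once and is reused, and a new session looks it up again.

```python
from typing import Callable, Dict, List


class Session:
    """Hypothetical stand-in for one Pants run: memoizes lookups for its lifetime only."""

    def __init__(self) -> None:
        self._memo: Dict[str, str] = {}

    def lookup_once(self, name: str, fetch: Callable[[], str]) -> str:
        # First call in this session runs `fetch`; later calls reuse the value.
        if name not in self._memo:
            self._memo[name] = fetch()
        return self._memo[name]


calls: List[int] = []


def fetch_db_version() -> str:
    # Hypothetical: a real lookup would query the advisory database.
    calls.append(1)
    return f"v{len(calls)}"


s1 = Session()
a = s1.lookup_once("db_version", fetch_db_version)
b = s1.lookup_once("db_version", fetch_db_version)  # reused, no second fetch
s2 = Session()
c = s2.lookup_once("db_version", fetch_db_version)  # new session, fetched again
```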
d
Also: I mean pip-audit, not py-audit
c
usually there’s a process invocation involved, and it’s easy enough to control the caching for those (as @wide-midnight-78598 was poking at as well). See https://github.com/pantsbuild/pants/blob/d87f9b6810209b87238635101000cb7db512d835/src/python/pants/engine/process.py#L32-L45 for those options, if that turns out to be a fit.
d
Unfortunately the tool doesn't give me a handle on the current version of the DB. Since the data is being fetched from https://github.com/pypa/advisory-database/, I could theoretically make an additional call and check the ETag of https://github.com/pypa/advisory-database/tree/main/vulns to determine whether I need to run again or not.
I might file an issue with pip-audit to see if they can give me a better solution though.
Thanks for the help!
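A rough sketch of that ETag check with only the standard library, assuming (not verified here) that the URL above returns an `ETag` header on a HEAD request. `fetch_etag` and `needs_rerun` are hypothetical helpers; the returned ETag could double as the "version" value f suggested putting in the audit Process's env.

```python
import urllib.error
import urllib.request
from typing import Optional

ADVISORY_URL = "https://github.com/pypa/advisory-database/tree/main/vulns"


def build_etag_request(url: str, last_etag: Optional[str]) -> urllib.request.Request:
    # HEAD avoids downloading the body; If-None-Match lets the server reply 304 if unchanged.
    req = urllib.request.Request(url, method="HEAD")
    if last_etag is not None:
        req.add_header("If-None-Match", last_etag)
    return req


def fetch_etag(url: str, last_etag: Optional[str] = None) -> Optional[str]:
    # Network call; assumes the server sends an ETag header.
    req = build_etag_request(url, last_etag)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the stored ETag is still current.
            return last_etag
        raise


def needs_rerun(last_etag: Optional[str], current_etag: Optional[str]) -> bool:
    # Without an ETag on either side we can't prove nothing changed, so rerun.
    return last_etag is None or current_etag is None or last_etag != current_etag
```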
b
In the meantime, having it work uncached is better than not having anything, I think 😛
h
There is precedent for this: a `goal_rule` is always uncacheable, you can mark any other rule as `@_uncacheable_rule`, and you can mark processes as uncacheable (I forget the details).
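A toy memoizing rule runner illustrating what "uncacheable" means in practice. This is a plain-Python model, not the Pants engine or its decorators, and `ToyEngine`, `lint` and `audit` are hypothetical: a normal rule runs once per distinct input, while an uncacheable one re-executes every time.

```python
from typing import Callable, Dict, Tuple


class ToyEngine:
    """Toy memoizing rule runner; NOT the Pants engine."""

    def __init__(self) -> None:
        self._memo: Dict[Tuple[str, Tuple[object, ...]], object] = {}

    def run(self, rule: Callable[..., object], *args: object, uncacheable: bool = False) -> object:
        if uncacheable:
            # Uncacheable rules re-execute every time and never populate the memo.
            return rule(*args)
        key = (rule.__name__, args)
        if key not in self._memo:
            self._memo[key] = rule(*args)
        return self._memo[key]


runs = {"lint": 0, "audit": 0}


def lint(path: str) -> str:
    runs["lint"] += 1
    return f"linted {path}"


def audit(path: str) -> str:
    runs["audit"] += 1
    return f"audited {path}"


engine = ToyEngine()
for _ in range(3):
    engine.run(lint, "src/")  # cached after the first call
    engine.run(audit, "src/", uncacheable=True)  # runs all three times
```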
c
there’s also the possibility to have a rule’s return type inherit from `EngineAwareReturnType`, and from there you can flag it as uncacheable: https://github.com/pantsbuild/pants/blob/5580f808ceea83b15bbc85498f0a55b78362772a/src/python/pants/engine/engine_aware.py#L66-L72