# plugins
g
Does Pants have any built-in APIs for handling "sensitive" data? I'm looking at a plugin for a KMS tool, but those values should never end up in logs, in caches, etc.
c
Oh, great topic. Alas, no, there are no such features in Pants (yet?) 🙂
g
I see! I'll see what I can do with the `EngineAware` types and overriding `__repr__` and `__str__` to start with.
🙏 1
💯 4
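A minimal sketch of the `__repr__`/`__str__` idea, written in plain Python rather than against Pants' `EngineAware` API. The `Secret` class and its `reveal()` method are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Secret:
    """Hypothetical wrapper that keeps a value out of accidental string formatting."""

    _value: str

    def reveal(self) -> str:
        # Explicit, greppable opt-in to access the raw value.
        return self._value

    # Because these are defined in the class body, @dataclass does not
    # generate its own __repr__ (which would include _value).
    def __repr__(self) -> str:
        return "Secret(****)"

    # f-strings and %s formatting go through __str__ by default.
    def __str__(self) -> str:
        return "****"
```

This only covers formatting. `pickle`, `vars()`, or `dataclasses.asdict()` will still expose the value, which is the same kind of gap as the Rust-side `Debug`/`Serialize` concern raised later in the thread.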
h
Yeah, that's the only way they'd show up in logs via Pants, other than you manually adding `logger.info("sensitive data")` lines.
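Manual `logger.info(...)` calls are the remaining path. One defensive option is a standard-library `logging.Filter` that scrubs known secret values before a handler writes them. This is a sketch, not a Pants feature: `RedactSecretsFilter` is a hypothetical name, and it assumes the secret values are known up front.

```python
import logging


class RedactSecretsFilter(logging.Filter):
    """Hypothetical filter: masks known secret values in formatted log messages."""

    def __init__(self, secrets: list[str]) -> None:
        super().__init__()
        # Ignore empty strings, which would otherwise match everywhere.
        self._secrets = [s for s in secrets if s]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies %-style args first
        for secret in self._secrets:
            message = message.replace(secret, "****")
        # Store the scrubbed text and drop args so handlers don't re-format it.
        record.msg = message
        record.args = None
        return True  # never drop the record, only rewrite it
```

Attaching the filter to a handler, rather than to a logger, means records propagated from child loggers are scrubbed too. It does not touch exception tracebacks, and plain string matching misses transformed values (e.g. base64-encoded), so it is a safety net rather than a guarantee.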
g
Yeah. I'm a bit worried about what happens with the `Process` on the Rust side as well, though: the Rust implementation of it is both `Debug` and `Serialize`. https://github.com/pantsbuild/pants/blob/main/src/rust/engine/process_execution/src/lib.rs#L479-L481
h
Ohhh, you're right, we will dump the `argv`. But not env vars.
*Well, env var names, not env var values. Exceppttt when you use `--keep-sandboxes`.
g
Argv I'll have to solve. I think the sandboxes are OK from my POV, in the same way as explicitly dumping a decrypted key, which I'm planning to support. I just want to do due diligence and prevent "accidental" leaks in logs or caches (e.g., let's say we run KMS on CI with a cache action 😱).
👍 1
h
argv is only dumped when you use `-ldebug`, fwiw. But that's not very safe, because `-ldebug` is sometimes used in CI when debugging. We ask people to use it a lot to help us help them debug.
g
Hmm. I'll have to have a think. We could do some custom handling, interpolation-ish? A `secrets` arg for `Process` and an interpolation-like syntax, maybe `${{ secrets.API_TOKEN }}`?
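A sketch of what that could look like. It is entirely hypothetical: `SecretArgv`, `for_display`, and `resolve` are made-up names, and Pants' `Process` has no such field. argv is kept as a template with `${{ secrets.NAME }}` placeholders, and the template is the only form that gets logged.

```python
import re
from dataclasses import dataclass

# Matches placeholders like ${{ secrets.API_TOKEN }} (the syntax floated above).
_PLACEHOLDER = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


@dataclass(frozen=True)
class SecretArgv:
    """Hypothetical argv template whose logged form never contains secret values."""

    template: tuple[str, ...]

    def for_display(self) -> list[str]:
        # What logs (e.g. -ldebug output) would show: placeholders kept verbatim.
        return list(self.template)

    def resolve(self, secrets: dict[str, str]) -> list[str]:
        # Only called right before exec; raises KeyError for undeclared names.
        return [
            _PLACEHOLDER.sub(lambda m: secrets[m.group(1)], arg)
            for arg in self.template
        ]
```

Logging only the unresolved template would let `-ldebug` keep dumping argv unchanged. The resolved argv still reaches the OS, though, so other processes on the machine can read it, which is the objection raised next.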
h
Is it not possible to read the secret via an env var or file? A lot of systems don't allow passing raw secrets via the CLI because it's inherently insecure: other processes can easily read the argv.
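A minimal sketch of the file-based alternative using only the standard library. The secret is written to a file readable only by the current user, and the child process gets the *path* (here through a hypothetical `API_TOKEN_FILE` variable), so neither argv nor the environment contains the value. `run_with_secret_file` is an illustrative name.

```python
import os
import subprocess
import tempfile


def run_with_secret_file(cmd: list[str], secret: str) -> subprocess.CompletedProcess:
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="secret-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(secret)
        # Only the path is exposed; API_TOKEN_FILE is a hypothetical convention.
        env = {**os.environ, "API_TOKEN_FILE": path}
        return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    finally:
        os.remove(path)  # don't leave the secret on disk after the run
```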
g
Sure. That's the proper way of doing it. So if I add support for a `password` input from a `Secret` for the `python_distribution` publish action, we should definitely use the `TWINE_PASSWORD` env variable to run it. But... if someone else implements a backend for BadlyDesignedTool, they shouldn't leak their credentials if we can prevent it.
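For the publish case, here is a sketch of the env-based approach. `twine upload` reads `TWINE_USERNAME`/`TWINE_PASSWORD` from the environment, so the password never appears in argv. The helper name and the `__token__` username (PyPI's convention for API tokens) are illustrative. This is not how the Pants `python_distribution` publish backend is actually wired.

```python
import os


def twine_upload_invocation(
    dists: list[str], password: str
) -> tuple[list[str], dict[str, str]]:
    """Hypothetical helper: build argv and env for `twine upload` without leaking the password."""
    # argv holds only non-sensitive values, so it is safe to log (e.g. under -ldebug).
    argv = ["twine", "upload", "--non-interactive", *dists]
    # twine reads credentials from these environment variables.
    env = {
        **os.environ,
        "TWINE_USERNAME": "__token__",
        "TWINE_PASSWORD": password,
    }
    return argv, env
```

The pair can be passed straight to `subprocess.run(argv, env=env, check=True)`. As the following messages point out, this stays safe only as long as nothing logs env values.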
E.g., if we consider all correct usages of a KMS right now, one shouldn't ever put keys on an argv. But allowing that to happen, and letting the keys end up in a logfile when it's preventable, is bad. Another approach would be to decrypt the keys inside the sandbox in such a way that they're never on an explicit command line.
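One way to read "decrypt inside the sandbox" is a small wrapper that runs as the process. It receives only the path to the ciphertext, decrypts it, and hands the plaintext to the real tool through the environment. In this sketch `decrypt` is a stand-in for a real KMS client call (no actual KMS API is shown), and `run_tool_with_decrypted_key` and `KMS_KEY` are hypothetical names.

```python
import os
import subprocess
from typing import Callable


def run_tool_with_decrypted_key(
    tool_argv: list[str],
    ciphertext_path: str,
    decrypt: Callable[[bytes], str],  # stand-in for a real KMS client call
    env_var: str = "KMS_KEY",  # hypothetical variable name the tool reads
) -> subprocess.CompletedProcess:
    # Only the ciphertext path is ever on a command line; the plaintext
    # exists only in this process's memory and the child's environment.
    with open(ciphertext_path, "rb") as f:
        plaintext = decrypt(f.read())
    env = {**os.environ, env_var: plaintext}
    return subprocess.run(tool_argv, env=env, capture_output=True, text=True, check=True)
```

Whether the key can still leak then depends on the tool itself and on anything that dumps the child's env or captures its output.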
h
> if someone else implements a backend for BadlyDesignedTool they shouldn't leak their credentials if we can prevent it.
Yeah, but that's theoretical at this point, right? So far it hasn't been an issue because all tools have been designed well. We try to avoid "premature generalization": when this situation happens in the future, if ever, we can tweak Pants to handle this concern. For example, it has been super helpful to dump argv in `-ldebug`. It would be complicated if we now only sometimes did that. That code complexity has a real cost, e.g. harder-to-understand code and a higher chance of bugs.
g
Gotcha. Sure. But let's flip the consideration: let's say we put them in env because that's safe and not logged today. Two weeks from now, someone else has a whitespace issue with an environment variable and makes a PR to also dump the env on `-ldebug`. Who's to blame for the leaks?
👍 1
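One defensive pattern against that scenario is to make the debug formatter itself secret-aware. A later "also dump the env" change would then go through a function that already masks values. This is a sketch that assumes secrets are declared by env var name; `format_env_for_debug` and the name set are hypothetical, not an existing Pants API.

```python
from typing import Mapping


def format_env_for_debug(env: Mapping[str, str], secret_names: frozenset[str]) -> str:
    """Render env vars for debug logs, masking values of declared secrets."""
    parts = []
    for name in sorted(env):
        # Names stay visible; repr() exposes stray whitespace in non-secret values.
        value = "****" if name in secret_names else repr(env[name])
        parts.append(f"{name}={value}")
    return " ".join(parts)
```

Using `repr()` for the visible values would still serve the whitespace-debugging motivation from the hypothetical PR, without printing anything declared as secret.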
Ok, a few days of hacking and I've got something that works here: https://github.com/tgolsson/pants-backends/pull/18. I'm a bit unsure how useful this would be to upstream; I've created a `pants_ext` directory with code I think could be reused as Pants core code. See my `python_distribution_with_secret` target, which IMO would be an improvement over today. That's not what drove this from the start, but it was a place where I could fix what is a workflow problem for me. However, merging something like this also opens up unsafety that I'm not sure I like, since it'd require more time from regular Pants contributors to fix holes. Argv and env we've mentioned above, but there are other situations: what if I put a credential in a file? Can I ensure that file doesn't get cached? Is a blinking warning sign on all docs related to secrets the highest reward-per-effort we can get there?
What I think I'm trying to convince myself of: maybe doing nothing special to handle secrets, apart from plumbing and some stricter defaults, is better than pretending to do it right and failing, because then the message is clear: "do the audits yourself".