Does Pants have any built in APIs for handling sensitive dat Pants #plugins

gorgeous-winter-99296

11/20/2022, 9:42 PM

Does Pants have any built-in APIs for handling "sensitive" data? I'm looking at a plugin for a KMS tool, but those values should never end up in logs, in caches, etc.

curved-television-6568

11/20/2022, 11:25 PM

Oh, great topic. Alas, no there’s no such features in Pants (yet?) 🙂

gorgeous-winter-99296

11/21/2022, 10:25 AM

I see! I'll see what I can do with the EngineAware types and overriding `__repr__` and `__str__` to start with.

🙏 1

💯 4
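A minimal sketch of the `__repr__`/`__str__` idea mentioned above. The `Secret` class and its `reveal` method are hypothetical names for illustration, not a Pants API, and the sketch does not touch the EngineAware types; it only shows how a frozen dataclass can keep its value out of formatted output.

```python
from dataclasses import dataclass


# Hypothetical wrapper type (an assumption, not part of Pants).
# repr=False stops the dataclass from generating a __repr__ that would print the value.
@dataclass(frozen=True, repr=False)
class Secret:
    name: str
    value: str

    def __repr__(self) -> str:
        # Only the name is shown; the value is never formatted.
        return f"Secret(name={self.name!r}, value=<redacted>)"

    def __str__(self) -> str:
        return repr(self)

    def reveal(self) -> str:
        # Explicit opt-in for the code path that really needs the value.
        return self.value


token = Secret("API_TOKEN", "hunter2")
print(token)
```

This only guards against accidental formatting (logging, f-strings, tracebacks that print reprs). Anything that reads `token.value` directly, or serializes the object field by field, would still see the secret.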

hundreds-father-404

11/21/2022, 4:23 PM

Yeah, that's the only way they'd show up in logs via Pants, other than you manually adding `logger.info("sensitive data")` lines.

gorgeous-winter-99296

11/21/2022, 4:25 PM

Yeah. I'm a bit worried about what happens with the `Process` on the Rust side as well though - the Rust implementation of it is both `Debug` and `Serialize`. https://github.com/pantsbuild/pants/blob/main/src/rust/engine/process_execution/src/lib.rs#L479-L481

hundreds-father-404

11/21/2022, 4:28 PM

ohhh you're right, we will dump the `argv`. but not env vars

hundreds-father-404

11/21/2022, 4:29 PM

*well, env var names. not env var values. exceppttt when you use `--keep-sandboxes`

gorgeous-winter-99296

11/21/2022, 4:31 PM

Argv I'll have to solve. I think the sandboxes are OK from my POV; just like I'm planning to support explicitly dumping a decrypted key. I just want to do due diligence and prevent "accidental" leaks in logs or caches (e.g. let's say we run KMS on CI, with a cache action 😱)

👍 1

hundreds-father-404

11/21/2022, 4:41 PM

argv is only when you use `-ldebug`, fwiw. but that's not very safe because that's sometimes used in CI when debugging. We ask people to use it a lot to help us help them debug

gorgeous-winter-99296

11/21/2022, 4:48 PM

Hmm. I'll have to have a think. We could do some custom handling/interpolation-ish? A `secrets` arg for Process and an interpolation-like syntax? `${{ secrets.API_TOKEN }}` maybe.
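A rough sketch of what that interpolation idea could look like in plain Python. Everything here is an assumption for illustration: `resolve_argv` is a hypothetical helper, and no `secrets` argument exists on the Pants `Process` type. The point is that the template argv (safe to log) stays separate from the resolved argv (only built right before execution).

```python
from __future__ import annotations

import re
from typing import Mapping, Sequence

# Matches placeholders of the form ${{ secrets.NAME }} (syntax taken from the message above).
_PLACEHOLDER = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_argv(template: Sequence[str], secrets: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: substitute secret values into a copy of the argv."""

    def substitute(match: re.Match[str]) -> str:
        # Raises KeyError if the template names a secret that was not provided.
        return secrets[match.group(1)]

    return [_PLACEHOLDER.sub(substitute, arg) for arg in template]


template = ["tool", "--token=${{ secrets.API_TOKEN }}", "--verbose"]
print("would log:", template)  # the template never contains the value
resolved = resolve_argv(template, {"API_TOKEN": "s3cr3t"})
```

Note that this would only keep the value out of anything that logs the template. The resolved argv is still visible to other processes on the machine once the tool is started.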

hundreds-father-404

11/21/2022, 4:53 PM

is it not possible to read the secret via an env var or file? A lot of systems don't allow passing raw secrets via the CLI because it's inherently insecure: other processes can easily read the argv

gorgeous-winter-99296

11/21/2022, 5:00 PM

Sure. That's the proper way of doing it. So if I add support for a `password` input from a `Secret` for the `python_distribution` `publish` action, we should definitely use the `TWINE_PASSWORD` env variable to run it. But... if someone else implements a backend for BadlyDesignedTool they shouldn't leak their credentials if we can prevent it.
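To make the env-var route concrete, here is a small sketch using only the Python standard library as a stand-in. It is not the Pants `Process` API, and `run_with_secret_env` is a hypothetical helper. The child process is a tiny Python one-liner standing in for a tool like twine that reads `TWINE_PASSWORD` from its environment.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_with_secret_env(
    argv: Sequence[str], secret_name: str, secret_value: str
) -> subprocess.CompletedProcess:
    """Hypothetical helper: pass the secret via the environment, never via argv."""
    env = {**os.environ, secret_name: secret_value}
    return subprocess.run(list(argv), env=env, capture_output=True, text=True, check=True)


# Stand-in for the real tool: it reads the variable and reports only its length.
child_argv = [sys.executable, "-c", "import os; print(len(os.environ['TWINE_PASSWORD']))"]
result = run_with_secret_env(child_argv, "TWINE_PASSWORD", "hunter2")
print(result.stdout.strip())
```

Because the secret never appears in `child_argv`, logging the command line stays harmless. Anything that dumps the environment would still expose it.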

gorgeous-winter-99296

11/21/2022, 5:18 PM

E.g. if we consider all correct usages of a KMS right now one shouldn't ever put keys on an argv. But allowing that to happen, and letting the keys end up in a logfile - when preventable - is bad. Another approach would be to decrypt the keys inside the sandbox in such a way that they never are on an explicit command line.

hundreds-father-404

11/21/2022, 5:19 PM

if someone else implements a backend for BadlyDesignedTool they shouldn't leak their credentials if we can prevent it.

Yeah, but that's theoretical at this point, right? So far it hasn't been an issue because all tools have been designed well. We try to avoid "premature generalization" -- when this situation happens in the future, if ever, we can tweak Pants to handle this concern. For example, it has been super helpful to dump argv in `-ldebug`. It would be complicated if we now only sometimes did that. That code complexity has a real cost, e.g. harder to understand code and more likely we have bugs

gorgeous-winter-99296

11/21/2022, 5:23 PM

Gotcha. Sure. But let's flip the consideration: let's say we put them in env because that's safe and not logged today. Two weeks from now someone else has a whitespace issue with an environment variable and makes a PR to dump the env too on `-ldebug`. Who's to blame for the leaks?

👍 1

gorgeous-winter-99296

11/24/2022, 10:15 PM

Ok; a few days of hacking and I've got something that works here: https://github.com/tgolsson/pants-backends/pull/18. A bit unsure of how useful this would be to upstream - I've created a `pants_ext` directory with code I think could be reused as Pants core code. See my `python_distribution_with_secret` target, which IMO would be an improvement over today. That's not what drove this from the start, but it was something where I could fix what is a workflow problem for me. However, merging something like this also opens up unsafety that I'm not sure I like, where it'd require more time from regular Pants contributors to fix holes. We've mentioned argv and env above, but there are other situations: what if I put a credential in a file? Can I ensure that file doesn't get cached? Is a blinking warning sign on all docs related to secrets the highest reward-per-effort we can get there?

gorgeous-winter-99296

11/24/2022, 10:20 PM

What I think I'm trying to convince myself of: maybe doing nothing special to handle secrets apart from plumbing and some stricter defaults is better than pretending to do it right and failing. Because one is clear about "do the audits yourself".
