#plugins
c

cold-mechanic-10814

03/23/2023, 6:05 PM
Hi - I have a question relating to a `@goal_rule` that I'm implementing. The goal needs to (one way or another) invoke the `git` command to create a git tag. Is that possible with Pants' hermetic workspace? I've read somewhere in the docs that you can get a read-only subset of the current git info, but what about calling git commands that actually modify the state?
e

enough-analyst-54434

03/23/2023, 6:15 PM
w

witty-crayon-22786

03/23/2023, 6:16 PM
that would be one way… but particularly for actually mutating things, i’d recommend doing it in an `InteractiveProcess` directly in your goal, at the top of the stack
c

cold-mechanic-10814

03/23/2023, 6:21 PM
Thanks for your replies. I was looking at `InteractiveProcess`, thinking that might be the recommended path, but I was wondering if it's possible to use a 3rd party lib such as `GitPython`, as it provides some useful abstractions - or is that against the rules of @rules since it modifies things in a way that Pants can't monitor?
w

witty-crayon-22786

03/23/2023, 6:22 PM
It does violate the sandbox, yea. To do it safely, you'd have to mark the @rule uncacheable, or ensure that you were only doing it directly from your @goal_rule function.
c

cold-mechanic-10814

03/23/2023, 6:23 PM
Ok, so the @goal_rule is implicitly marked as uncacheable?
w

witty-crayon-22786

03/23/2023, 6:24 PM
Correct
c

cold-mechanic-10814

03/23/2023, 6:25 PM
Are there any limitations as to what you can do in the @goal_rule? Or can you pretty much do anything (i.e. no idempotency restrictions, side effects, network calls, etc.)?
w

witty-crayon-22786

03/23/2023, 6:28 PM
well. it’s probably more accurate to call it “unspecified behavior”. for example, https://github.com/pantsbuild/pants/issues/10542 hopes to eventually run goals concurrently, until they reach a critical section marked by builtins like `InteractiveProcess`, `Workspace.write_digest`, `Console.print`, etc.
if you’re creating side effects outside of those APIs, you will likely need to fix your code eventually.
c

cold-mechanic-10814

03/23/2023, 6:33 PM
Ah, so even if I can use a lib such as `GitPython`, it's probably going to break in a future release?
w

witty-crayon-22786

03/23/2023, 6:34 PM
probably for #10542 there will be an escape hatch to say “i’m about to do something with side effects”. but it will require an update to your code, whereas the existing APIs won’t need that
c

cold-mechanic-10814

03/23/2023, 6:35 PM
Ok, makes sense
e

enough-analyst-54434

03/23/2023, 6:35 PM
I think it's generally best to pretend rule Python is not Python. Use it just for manipulation of data structures, loops, and conditionals.
c

cold-mechanic-10814

03/23/2023, 6:40 PM
Sorry, sent too soon.. rewriting
Going down the InteractiveProcess route, just so I understand, would I be able to do something like this? Or have I misunderstood?
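import os

from pants.engine.process import InteractiveProcess, InteractiveProcessResult
from pants.engine.rules import Effect, goal_rule
from pants.engine.target import Targets


class MyGitHelper:
    def __init__(self, repo_path):
        self.repo_path = repo_path

    async def some_complex_multistep_op(self):
        ...

        # argv is a list of separate arguments, not a single shell string
        res1 = await Effect(InteractiveProcessResult, InteractiveProcess(argv=["git", "something"]))

        ...

        res2 = await Effect(InteractiveProcessResult, InteractiveProcess(argv=["git", "something", "else"]))

        ...

        return some_retval


@goal_rule
async def my_goal(targets: Targets) -> MyGoal:
    helper = MyGitHelper(os.getcwd())

    await helper.some_complex_multistep_op()
    return MyGoal(exit_code=0)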
w

witty-crayon-22786

03/23/2023, 6:45 PM
yea, that should work.
after the first `InteractiveProcess` has started, your code will be in a critical section, and not interruptible/restartable (…which is what you want)
c

cold-mechanic-10814

03/23/2023, 6:48 PM
Ok, great. I'll play around with that approach - I was hoping to avoid having to call `git` directly and to be able to use existing libs etc., but this is definitely manageable as we only need a couple of specific ops
Thank you both!
Sorry, one last question - what if I want to get the `stdout` of an `InteractiveProcess`? Say I want to list the git tags via `git tag` - it seems that if I do it via `Process` it will fail since the `.git` directory is not available, but if I do it via `InteractiveProcess` I can't retrieve the `stdout`
w

witty-crayon-22786

03/23/2023, 6:59 PM
…oh, darn. yea, that’s an issue.
you’d have to pass in the git dir to `Process`, and then additionally mark the process uncacheable with `ProcessCacheScope`.
…ok, sorry. might be a good idea to look at what John recommended, or using a library directly from your `@goal_rule`
c

cold-mechanic-10814

03/23/2023, 7:03 PM
If I were to ensure that an `InteractiveProcess` or a `Console.print` statement were executed before I invoke any side effects, that would "future-proof" the goal against the upcoming changes? A bit of a hack, but reasonable
w

witty-crayon-22786

03/23/2023, 7:04 PM
yea
c

cold-mechanic-10814

03/23/2023, 7:04 PM
So I could
@goal_rule
def my_goal(console: Console) -> MyGoal:
    console.print('something')
    some_op_with_side_effects_from_a_3rd_party_lib()
w

witty-crayon-22786

03/23/2023, 7:05 PM
yea.
referring to that ticket.
e

enough-analyst-54434

03/23/2023, 7:06 PM
FWIW you can use a Process just fine, modulo caching concerns: just use GIT_* env vars to point to where .git is.
c

cold-mechanic-10814

03/23/2023, 7:07 PM
So long as I first copy the git dir into the workspace?
e

enough-analyst-54434

03/23/2023, 7:07 PM
No
It's not widely known but you can run git in the wrong dir.
You just need to tell it where the db lives.
c

cold-mechanic-10814

03/23/2023, 7:07 PM
Ah, sorry, I thought Process couldn't read anything outside of the workspace
e

enough-analyst-54434

03/23/2023, 7:07 PM
The code I initially pointed to does that.
Pants sandboxing is fake. We don't jail the fs.
We just place you in a hard to reason about tmpdir
You can still see the whole filesystem.
c

cold-mechanic-10814

03/23/2023, 7:08 PM
Ok, right, so with an absolute path to the git dir I can work around it
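A minimal sketch of the "tell git where the db lives" idea, using plain `subprocess` rather than the Pants `Process` API. `GIT_DIR` and `GIT_WORK_TREE` are standard git environment variables; the helper names `git_env` and `run_git` are hypothetical, and the sketch assumes a regular checkout where `.git` is a directory under the repo root.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def git_env(repo_root: str) -> dict[str, str]:
    """Build env vars that point git at a repository regardless of the cwd."""
    root = Path(repo_root).resolve()
    return {
        "GIT_DIR": str(root / ".git"),
        "GIT_WORK_TREE": str(root),
    }


def run_git(repo_root: str, *args: str, cwd: str | None = None) -> str:
    """Run a git command from any directory and return its stdout."""
    env = {**os.environ, **git_env(repo_root)}
    result = subprocess.run(
        ["git", *args],
        cwd=cwd,
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Something like `run_git("/path/to/repo", "tag", "--list", cwd="/tmp")` would then list tags even though the command runs outside the repository. In a rule, the same two variables would presumably be passed through the process's environment instead.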
e

enough-analyst-54434

03/23/2023, 7:09 PM
As Stu has discussed with you though, and to re-iterate, caching is the key thing to get right in all this. You're in delicate waters.
c

cold-mechanic-10814

03/23/2023, 7:10 PM
Any other things to bear in mind other than setting the ProcessCacheScope?
w

witty-crayon-22786

03/23/2023, 7:11 PM
Re: the sandboxing being "fake" though: that's true until you try and use a `docker_environment`, at which point you would be trapped by `Process`.
e

enough-analyst-54434

03/23/2023, 7:12 PM
Well ok. But that's not this.
w

witty-crayon-22786

03/23/2023, 7:12 PM
Imo, do the GitPython thing, marked clearly.
We need to clean up the APIs that John linked to, so they wouldn't be my first choice.
e

enough-analyst-54434

03/23/2023, 7:14 PM
What favors adding a GitPython dep over using Process, out of curiosity?
c

cold-mechanic-10814

03/23/2023, 7:17 PM
It's not a done-deal yet - on one side I could just use GitPython to avoid re-implementing some of the useful abstractions they provide, but then be a bit "out of bounds" from a Pants rule perspective, or use the more low-level approach of using bare git commands via Process. I'm going to run through the exact list of ops we need to perform on the git repo to quantify the effort of the Process approach, or whether we just use the lib
e

enough-analyst-54434

03/23/2023, 7:18 PM
What do you mean by useful abstractions in this context? `git tag x` seems to need little abstraction!
c

cold-mechanic-10814

03/23/2023, 7:19 PM
Of course 🙂 That was just an example
e

enough-analyst-54434

03/23/2023, 7:19 PM
So you have other uses?
c

cold-mechanic-10814

03/23/2023, 7:21 PM
Yes - we're currently porting our polyrepos to a monorepo, using pants as the main tooling for the build and release process. That includes version bumping based on git commits, with different artifacts within the repo being independently versioned. It's a bit semantic-release meets lerna, but for python. It ultimately involves quite a bit of git wrangling when you try to figure out exactly which asset needs to be version bumped due to transitive dependencies.
e

enough-analyst-54434

03/23/2023, 7:22 PM
Pants determines all that, git aside, for you
It hashes all inputs transitively for artifacts, etc.
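As a toy illustration of what hashing inputs transitively means (this is not how Pants implements it; the graph shape, the `fingerprint` name, and the in-memory `sources` mapping are all assumptions made for the example): a target's fingerprint covers its own content plus the fingerprints of everything it depends on, so a change deep in the graph changes every dependent's hash.

```python
from __future__ import annotations

import hashlib


def fingerprint(
    target: str,
    sources: dict[str, bytes],
    deps: dict[str, list[str]],
    _cache: dict[str, str] | None = None,
) -> str:
    """Hash a target's own content together with the fingerprints of its deps.

    Assumes the dependency graph is acyclic.
    """
    cache = {} if _cache is None else _cache
    if target in cache:
        return cache[target]
    hasher = hashlib.sha256()
    hasher.update(sources[target])
    # Sort so the result does not depend on the order deps were declared in.
    for dep in sorted(deps.get(target, [])):
        hasher.update(fingerprint(dep, sources, deps, cache).encode())
    cache[target] = hasher.hexdigest()
    return cache[target]
```

Comparing fingerprints before and after a change tells you which artifacts are affected, without consulting git at all.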
c

cold-mechanic-10814

03/23/2023, 7:22 PM
Right, that's one of the attractive things about Pants
But we still need to inspect the git history to determine what type of version bump
But only the subset of commits that are relevant to assets in the transitive dependencies
e

enough-analyst-54434

03/23/2023, 7:23 PM
So you use keywords or something in commits that hint at semver?
c

cold-mechanic-10814

03/23/2023, 7:24 PM
Exactly, like conventional commits
As I said, we may be able to reduce the number of git commands thanks to pants' dep-tracking, in which case the Process approach might be better
e

enough-analyst-54434

03/23/2023, 7:25 PM
I'm not sure what that means, but I think I get it a bit more now. You need `git tag x` and, roughly, `git log -- $(pants filedeps)`.
I'll still be shocked if 180K of re-implementation of git saves you much over a few git command lines.
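A rough sketch of the second command, assuming the list of files has already been obtained (for example from `pants filedeps`). The function name `git_log_argv` and the `since_tag` parameter are hypothetical; `--format=%s` prints only commit subjects, and `<tag>..HEAD` limits the log to commits after that tag.

```python
from __future__ import annotations


def git_log_argv(files: list[str], since_tag: str | None = None) -> list[str]:
    """Build a `git log` command that only lists commits touching `files`."""
    if not files:
        # Without paths, `git log --` would list every commit in the repo.
        raise ValueError("expected at least one file path")
    argv = ["git", "log", "--format=%s"]
    if since_tag is not None:
        # Restrict to commits made after the given tag.
        argv.append(f"{since_tag}..HEAD")
    # `--` separates revisions from paths so file names are not misparsed.
    argv.append("--")
    argv.extend(sorted(files))
    return argv
```

The resulting list can be handed to whatever runs the process, so nothing here depends on a shell expanding `$(...)`.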
c

cold-mechanic-10814

03/23/2023, 7:26 PM
Sorry, conventional commits is a commit message convention like `fix: bug 123` or `feat: richer reports`
You're probably right
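For readers who have not met the convention, here is a small sketch of how commit messages could map to a version bump. It follows the usual Conventional Commits reading (`fix` is a patch, `feat` is a minor, a `!` or a `BREAKING CHANGE:` footer is a major); the function names and the regex are illustrative assumptions, not anything from the thread.

```python
from __future__ import annotations

import re

# type, optional "(scope)", optional "!" for breaking, then ": description"
_HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(\([^)]*\))?(?P<breaking>!)?: .+")
_ORDER = ["none", "patch", "minor", "major"]


def bump_for(message: str) -> str:
    """Classify one commit message as 'major', 'minor', 'patch', or 'none'."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = _HEADER.match(header)
    if match is None:
        return "none"
    has_breaking_footer = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in lines[1:]
    )
    if match.group("breaking") or has_breaking_footer:
        return "major"
    kind = match.group("type").lower()
    if kind == "feat":
        return "minor"
    if kind == "fix":
        return "patch"
    return "none"


def overall_bump(messages: list[str]) -> str:
    """Return the largest bump requested by any commit."""
    return max((bump_for(m) for m in messages), key=_ORDER.index, default="none")
```

Feeding it only the commits that touch an artifact's transitive files gives a per-artifact bump.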
e

enough-analyst-54434

03/23/2023, 7:27 PM
Ok. Gotcha. I had not heard of conventional commits TM.
c

cold-mechanic-10814

03/23/2023, 7:28 PM
It's more of a javascript thing - it came out of the Angular project. Not sure if it has much traction in the python community.
e

enough-analyst-54434

03/23/2023, 7:28 PM
I'm a luddite either way and generally not up to date on technology.
Well, what you're doing is a bit of a dream of mine. It would be super cool if it was done without commit messages but with source analysis for API breaks, etc.
Guava in Java land almost has this automated.
c

cold-mechanic-10814

03/23/2023, 7:29 PM
Haha, that would be pretty impressive - maybe for a V2
e

enough-analyst-54434

03/23/2023, 7:30 PM
It's a big hole in the software universe. Humans get semver wrong a lot.
c

cold-mechanic-10814

03/23/2023, 7:30 PM
Although if you have good enough static typing, you might be able to do something close..
e

enough-analyst-54434

03/23/2023, 7:30 PM
Absolutely.
c

cold-mechanic-10814

03/23/2023, 7:31 PM
You'd need some kind of lightweight contract testing, with snapshots of the type sigs of the public APIs
Not completely far-fetched
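A very small sketch of the snapshot idea, assuming the public API can be described as a name-to-callable mapping (for a module, `vars(module)` would do). `api_snapshot` and `breaking_changes` are hypothetical names, and the comparison is deliberately crude: it flags any signature change, including compatible ones such as a new optional parameter.

```python
from __future__ import annotations

import inspect


def api_snapshot(members: dict[str, object]) -> dict[str, str]:
    """Map each public callable's name to its signature string."""
    snapshot = {}
    for name, obj in members.items():
        if name.startswith("_") or not callable(obj):
            continue
        snapshot[name] = str(inspect.signature(obj))
    return snapshot


def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List public names that were removed or whose signature changed."""
    return sorted(name for name, sig in old.items() if new.get(name) != sig)
```

Storing the snapshot alongside each release and diffing it against the current code would give a first signal that a major bump may be needed.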
e

enough-analyst-54434

03/23/2023, 7:33 PM
Yeah, the early days of Toolchain involved indexing the symbols of all the libraries in PyPI and maven central to form a database of this sort of information for use on the consume side. The idea being you don't actually care about the version you consume, you care about the subset of the API you consume.
c

cold-mechanic-10814

03/23/2023, 7:35 PM
That makes sense, but seems ambitious!
Anyway, thanks again for your help! I definitely understand pants a little bit more.