# plugins
c
Hi - I have a question relating to a `@goal_rule` that I'm implementing. The goal needs to (through one way or another) invoke the `git` command to create a git tag. Is that possible, with Pants' hermetic workspace? I've read somewhere in the docs that you can get a read-only subset of the current git info, but what about calling git commands that will actually modify the state?
e
w
that would be one way… but particularly for actually mutating things, i’d recommend doing it in an `InteractiveProcess` directly in your goal, at the top of the stack
c
Thanks for your replies. I was looking at `InteractiveProcess`, thinking that might be the recommended path, but I was wondering if it's possible to use a 3rd-party lib such as `GitPython`, as it provides some useful abstractions - or is that against the rules of @rules, since it modifies things in a way that Pants can't monitor?
w
It does violate the sandbox, yea. To do it safely, you'd have to mark the @rule uncacheable, or ensure that you were only doing it directly from your @goal_rule function.
c
Ok, so the @goal_rule is implicitly marked as uncacheable?
w
Correct
c
Are there any limitations as to what you can do in the @goal_rule? Or can you pretty much do anything (i.e. no idempotency restrictions, side effects, network calls, etc.)?
w
well. it’s probably closer to call it “unspecified behavior”. for example, https://github.com/pantsbuild/pants/issues/10542 hopes to eventually run goals concurrently, until they reach a critical section marked by builtins like `InteractiveProcess`, `Workspace.write_digest`, `Console.print`, etc.
if you’re creating side effects outside of those APIs, you will likely need to fix your code eventually.
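(For illustration, a rough, untested sketch of what confining side effects to those builtins looks like in a goal rule - the goal name, target type, and tag value here are all invented, and this follows the Pants 2.x plugin API as I understand it:)

```python
# Hypothetical, untested sketch (Pants 2.x plugin API; names invented).
# All side effects go through the sanctioned builtins, directly in the
# @goal_rule rather than in a cached helper @rule.
@goal_rule
async def tag_release(console: Console) -> TagRelease:
    result = await Effect(
        InteractiveProcessResult,
        InteractiveProcess(argv=["git", "tag", "v1.0.0"], run_in_workspace=True),
    )
    console.print_stdout("created tag v1.0.0")  # sanctioned side effect
    return TagRelease(exit_code=result.exit_code)
```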
c
Ah, so even if I can use a lib such as `GitPython`, it's probably going to break in a future release?
w
probably for #10542 there will be an escape hatch to say “i’m about to do something with side effects”. but it will be an update to your code, whereas the existing APIs won’t need that
c
Ok, makes sense
e
I think it's generally best to pretend rule Python is not Python - just for manipulation of data structures, loops, and conditionals.
c
Sorry, sent too soon.. rewriting
Going down the InteractiveProcess route, just so I understand, would I be able to do something like this? Or have I misunderstood?
```python
class MyGitHelper:
    def __init__(self, repo_path):
        self.repo_path = repo_path

    async def some_complex_multistep_op(self):
        ...

        res1 = await Effect(
            InteractiveProcessResult,
            InteractiveProcess(argv=["git", "something"]),
        )

        ...

        res2 = await Effect(
            InteractiveProcessResult,
            InteractiveProcess(argv=["git", "something-else"]),
        )

        ...

        return some_retval


@goal_rule
async def my_goal(targets: Targets) -> MyGoal:
    helper = MyGitHelper(os.getcwd())

    await helper.some_complex_multistep_op()
```
w
yea, that should work.
after the first `InteractiveProcess` has started, your code will be in a critical section, and not interruptible/restartable
(…which is what you want)
c
Ok, great. I'll play around with that approach - I was hoping to avoid having to call `git` directly, being able to use existing libs etc., but this is definitely manageable as we only need a couple of specific ops.
Thank you both!
Sorry, one last question - what if I want to get the `stdout` of `InteractiveProcess`? Say I want to list the git tags via `git tag` - it seems that if I do it via `Process` it will fail since the `.git` directory is not available, but if I do it via `InteractiveProcess` I can't retrieve the `stdout`
w
…oh, darn. yea, that’s an issue.
you’d have to pass in the git dir to `Process`, and then additionally mark the process uncacheable with `ProcessCacheScope`.
…ok, sorry. might be a good idea to look at what John recommended, or using a library directly from your `@goal_rule`
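(A rough, untested sketch of that `Process`-plus-`ProcessCacheScope` idea, with the path and description invented - `PER_SESSION` here means the process re-runs each session rather than being served from cache:)

```python
# Hypothetical, untested sketch: an uncacheable Process that reads the real
# .git directory via an absolute path outside the sandbox.
result = await Get(
    ProcessResult,
    Process(
        argv=["git", "tag", "--list"],
        env={"GIT_DIR": "/abs/path/to/repo/.git"},
        cache_scope=ProcessCacheScope.PER_SESSION,
        description="list git tags",
    ),
)
tags = result.stdout.decode().splitlines()
```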
c
If I were to ensure that an `InteractiveProcess` or a `Console.print` statement were executed before I invoke any side effects, that would "future-proof" the goal against the upcoming changes? A bit of a hack, but reasonable
w
yea
c
So I could
```python
@goal_rule
def my_goal(console: Console) -> MyGoal:
    console.print('something')
    some_op_with_side_effects_from_a_3rd_party_lib()
```
w
yea.
referring to that ticket.
e
FWIW you can use a Process just fine mod caching concerns: just use GIT_* env vars to point to where .git is.
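(To make that concrete, a small self-contained demo - plain `subprocess` rather than Pants' `Process`, and it assumes `git` is on the PATH: with `GIT_DIR`/`GIT_WORK_TREE` set, git operates on a repo from a completely unrelated working directory, which is exactly the sandbox situation.)

```python
import os
import subprocess
import tempfile

# Create a throwaway repo with one tag.
repo = tempfile.mkdtemp()
env = {
    **os.environ,
    "GIT_DIR": os.path.join(repo, ".git"),
    "GIT_WORK_TREE": repo,
}
subprocess.run(["git", "init", "-q", repo], check=True)
subprocess.run(
    ["git", "-c", "user.email=you@example.com", "-c", "user.name=you",
     "commit", "--allow-empty", "-q", "-m", "initial"],
    env=env, check=True,
)
subprocess.run(["git", "tag", "v1.0.0"], env=env, check=True)

# From a different cwd (think: a Pants sandbox), git still finds the repo,
# because GIT_DIR points at its database.
elsewhere = tempfile.mkdtemp()
tags = subprocess.run(
    ["git", "tag", "--list"],
    env=env, cwd=elsewhere, capture_output=True, text=True, check=True,
).stdout.split()
print(tags)
```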
c
So long as I first copy the git dir into the workspace?
e
No
It's not widely known, but you can run git in the wrong dir.
You just need to tell it where the db lives.
c
Ah, sorry, I thought Process couldn't read anything outside of the workspace
e
The code I initially pointed to does that.
Pants sandboxing is fake. We don't jail the fs.
We just place you in a hard to reason about tmpdir
You can still see the whole filesystem.
c
Ok, right, so with an absolute path to the git dir I can work around it
e
As Stu has discussed with you though, and to re-iterate, caching is the key thing to get right in all this. You're in delicate waters.
c
Any other things to bear in mind other than setting the ProcessCacheScope?
w
Re: the sandboxing being "fake" though: that's true until you try and use a `docker_environment`, at which point you would be trapped by `Process`.
e
Well ok. But that's not this.
w
Imo, do the GitPython thing, marked clearly.
We need to clean up the APIs that John linked to, so they wouldn't be my first choice.
e
What favors adding a GitPython dep over using Process, out of curiosity?
c
It's not a done-deal yet - on one side I could just use GitPython to avoid re-implementing some of the useful abstractions they provide, but then be a bit "out of bounds" from a Pants rule perspective, or use the more low-level approach of using bare git commands via Process. I'm going to run through the exact list of ops we need to perform on the git repo to quantify the effort of the Process approach, or whether we just use the lib
e
What do you mean by useful abstractions in this context? `git tag x` seems to need little abstraction!
c
Of course 🙂 That was just an example
e
So you have other uses?
c
Yes - we're currently porting our polyrepos to a monorepo, using pants as the main tooling for the build and release process. That includes version bumping based on git commits, with different artifacts within the repo being independently versioned. It's a bit semantic-release meets lerna, but for python. It ultimately involves quite a bit of git wrangling when you try to figure out exactly which asset needs to be version bumped due to transitive dependencies.
e
Pants determines all that for you, git aside
It hashes all inputs transitively for artifacts, etc.
c
Right, that's one of the attractive things about Pants
But we still need to inspect the git history to determine what type of version bump
But only the subset of commits that are relevant to assets in the transitive dependencies
e
So you use keywords or something in commits that hint at semver?
c
Exactly, like conventional commits
As I said, we may be able to reduce the number of git commands thanks to pants' dep-tracking, in which case the Process approach might be better
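(As an aside, the subject-line-to-bump mapping itself can be quite small - an illustrative sketch, with all names invented, handling only the `fix`/`feat`/breaking-change rules of the convention:)

```python
import re

# Illustrative only: map conventional-commit subject lines to a semver bump.
LEVELS = {None: 0, "patch": 1, "minor": 2, "major": 3}
TYPE_BUMP = {"fix": "patch", "feat": "minor"}

def bump_for(subjects):
    """Return the highest bump implied by a list of commit subjects."""
    level = None
    for subject in subjects:
        m = re.match(r"(\w+)(\([^)]*\))?(!)?:", subject)
        if not m:
            continue  # not a conventional commit; ignore it
        if m.group(3):  # a "!" before the colon marks a breaking change
            bump = "major"
        else:
            bump = TYPE_BUMP.get(m.group(1))
        if LEVELS[bump] > LEVELS[level]:
            level = bump
    return level

print(bump_for(["fix: bug 123", "feat: richer reports"]))  # minor
```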
e
I'm not sure what that means, but I think I get it a bit more now. You need `git tag x` and, roughly, `git log -- $(pants filedeps)`.
I'll still be shocked if 180K of re-implementation of git saves you much over a few git command lines.
c
Sorry, conventional commits is a commit message convention, like `fix: bug 123` or `feat: richer reports`
You're probably right
e
Ok. Gotcha. I had not heard of conventional commits TM.
c
It's more of a javascript thing - it came out of the Angular project. Not sure if it has much traction in the python community.
e
I'm a luddite either way and generally not up to date on technology.
Well, what you're doing is a bit of a dream of mine. It would be super cool if it was done without commit messages but with source analysis for API breaks, etc.
Guava in Java land almost has this automated.
c
Haha, that would be pretty impressive - maybe for a V2
e
It's a big hole in the software universe. Humans get semver wrong a lot.
c
Although if you have good enough static typing, you might be able to do something close..
e
Absolutely.
c
You'd need some kind of lightweight contract testing, with snapshots of the type sigs of the public APIs
Not completely far-fetched
e
Yeah, the early days of Toolchain involved indexing the symbols of all the libraries in PyPI and maven central to form a database of this sort of information for use on the consume side. The idea being you don't actually care about the version you consume, you care about the subset of the API you consume.
c
That makes sense, but seems ambitious!
Anyway, thanks again for your help! I definitely understand pants a little bit more.