Is there a way to tell pants that a `python_test` ...
# general
p
Is there a way to tell pants that a `python_test` target needs the git metadata? I don't see any fields that obviously enable that... Background: The repo that has `pants.toml` includes a git submodule specifically for this test, as the code under test basically runs `git worktree` on that git submodule to make it accessible from a tmp directory. The test runs several permutations of that using various tags in that submodule's git repo. So, now I need a way to make git metadata available in the pants-built pytest sandbox.
b
I had a memory of https://www.pantsbuild.org/dev/reference/targets/python_test#run_goal_use_sandbox ... but the docs say "No other goals (such as `test`, if applicable) consult this field.", so never mind. Wild idea: have the test work with the submodule directly, something like:
1. pull the whole submodule git directory into the sandbox with `files()` or similar (and appropriately de-ignore its `.git` etc.)
2. invoke `git` commands within the submodule directly

Dunno if that's at all possible or sensible.
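A rough sketch of step 2 follows. It assumes the test drives git through `subprocess`. `git -C <dir>` runs git as if it had been started in `<dir>`, so the worktree is created from the submodule's own repository rather than from the superproject. The function names are hypothetical.

```python
import subprocess
from typing import List


def worktree_add_cmd(submodule_dir: str, dest: str, tag: str) -> List[str]:
    """Build a `git worktree add` command that runs inside the submodule.

    `git -C <dir>` makes git act as if it had been started in <dir>, so the
    worktree comes from the submodule's repository, not the superproject's.
    `--detach` checks out the tag with a detached HEAD instead of a branch.
    """
    return ["git", "-C", submodule_dir, "worktree", "add", "--detach", dest, tag]


def add_worktree(submodule_dir: str, dest: str, tag: str) -> None:
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(worktree_add_cmd(submodule_dir, dest, tag), check=True)
```

A test could call something like `add_worktree("tests/fixtures/repo", tmp_dir, "v1.0")` once per tag. The path and tag in that call are made up. This only works if the submodule's git metadata is actually present in the sandbox, which is the open question in this thread.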
e
I had a slightly similar situation: On top of having a bunch of integration tests in `src/test/integration` that I could run locally (after bringing up a docker compose stack), I also wanted to be able to create a pex containing everything in `src/test/integration` recursively. That way I could copy it into a docker image and run the tests not just locally, but also inside our application docker image to verify environment setup, inter-container networking, and that kind of stuff. Overall, it was quite a bit of a pain, but I made it work by having a "glob-everything-target" like `python_sources(sources=["**/*.py"])` that could be used as a dependency for the pex. I had to do a bunch of explicit dependency ignoring in the "glob-everything-target" and in the regular, per-directory target generators so that they wouldn't get in each other's way with multiple targets owning the same files. (Not exactly a solution, but maybe my anecdote helps you along) 🤞
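A minimal BUILD sketch of that "glob-everything" setup might look like the following. The target names are hypothetical. The explicit `!`-prefixed dependency ignores mentioned above are left out because their exact addresses depend on the directory layout.

```python
# src/test/integration/BUILD (sketch; target names are hypothetical)

# "Glob-everything" target: owns every .py file at any depth below here,
# in addition to the per-directory python_sources targets.
python_sources(
    name="all",
    sources=["**/*.py"],
)

# PEX bundling all of those sources, to be copied into a docker image.
pex_binary(
    name="integration_tests_pex",
    dependencies=[":all"],
)
```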
c
From poking at https://github.com/pantsbuild/pants/issues/19981 I don't think there is a clean way to get git info pushed down today.
p
Well, I can't use `resources(sources=[".git/modules/**"])` since `.git` is ignored. Adding `!.git/modules/` to `pants_ignore.add` didn't work either.
I just tried something really wonky: https://gist.github.com/cognifloyd/ff9a43fc54971233384acff50c262088 I used `experimental_workspace_environment` + `system_binary(binary_name="true")` + `adhoc_tool(output_directories=[".git/modules"])` to try and get the `.git/modules` directory as a digest that can be materialized in a sandbox. But that didn't work. I can capture `.gitmodules`, but I can't capture anything in `.git/`. So, it seems that output collection or digest creation filters the files based on `pants_ignore` 😞
The `.git/` directory is not present in any of the `.gitignore` files or in `pants_ignore` (as verified by `pants help PANTS_IGNORE`). Even `git check-ignore .git/` says that `.git` itself is not ignored. So, there must be something other than `pants_ignore` that is excluding `.git` from output digests.
I found a way (2 ways actually) to get `.git/modules` copied into a sandbox! The key insight is that `experimental_workspace_environment` STILL HAS a temporary sandbox directory, even though it is running in the workspace. The output files/dirs are captured from that sandbox directory, not from the workspace. So, both methods use `experimental_workspace_environment(name="in_repo")` and then:
1. `system_binary(name="cp", binary_name="cp")` + `adhoc_tool(environment="in_repo", runnable=":cp", args=["-r", ".git/modules", "{chroot}/.git"], output_directories=[".git/modules"])`
2. `shell_command(environment="in_repo", command="cp -r .git/modules {chroot}/.git", output_directories=[".git/modules"])`

I updated my gist, so you can see both methods in the revisions, with `shell_command` in the latest revision: https://gist.github.com/cognifloyd/ff9a43fc54971233384acff50c262088/revisions
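Method 2 could be combined with a test target in a BUILD sketch like the one below. It assumes the BUILD file sits at the repository root, so that `.git/modules` resolves relative to it. The `git_modules` and `worktree_test` names and `test_worktree.py` are hypothetical. The `tools` field is included on the assumption that `shell_command` only puts listed tools on the `PATH`. Whether the captured outputs reach the `python_test` sandbox through the dependency is worth verifying against your Pants version.

```python
# BUILD at the repository root (sketch; "git_modules", "worktree_test",
# and test_worktree.py are hypothetical names).

# Processes run in the real checkout, but outputs are still captured
# from a temporary sandbox directory, exposed to commands as {chroot}.
experimental_workspace_environment(name="in_repo")

shell_command(
    name="git_modules",
    environment="in_repo",
    # Copy the submodule's git metadata into the sandbox so Pants captures it.
    command="cp -r .git/modules {chroot}/.git",
    # Assumption: shell_command only exposes tools listed here on the PATH.
    tools=["cp"],
    output_directories=[".git/modules"],
)

python_test(
    name="worktree_test",
    source="test_worktree.py",
    # Depending on the shell_command is expected to materialize its
    # captured .git/modules output in the test sandbox.
    dependencies=[":git_modules"],
)
```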
`run_shell_command` would not work, because it cannot capture output. But `workspace_environment` + `shell_command` seems fairly clean. 🙂 One thing that makes using `cp` on `.git/modules` acceptable here is that the submodule is extremely small: its `.git/modules` directory is only around 200K. For a large repo, copying the entire `.git` directory could carry serious performance penalties.
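Before reusing the trick on another repo, it may help to measure how much would be copied. The minimal Python sketch below totals the apparent size of regular files under a directory without following symlinks. The result can differ from what `du` reports, since `du` counts disk blocks. The function name is hypothetical.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the apparent sizes of regular files under `root`.

    os.walk does not descend into symlinked directories by default, and
    symlinked files are skipped explicitly, so nothing is counted twice
    through links. A missing `root` yields 0.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Run from the repository root, `dir_size_bytes(".git/modules")` gives a quick estimate of what each `cp -r` would move into the sandbox.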
c
I am not sure whether to be happy or terrified at that solution! Glad you got something working.
p
My next thing to try was going to be a lot worse... A plugin that uses `open()` to grab all the files and put them in a digest, since I thought they were getting filtered out somewhere else... So, I'm happy with how succinct this solution is 😛 Terrifying and cringy, sure, but its purpose is hopefully clear within the BUILD files.
One gotcha with my solution is that caching won't be as clean as I thought. I thought `.gitmodules` stored the commit sha of the submodules, but no. The commit is recorded as a gitlink entry at the submodule's path in the superproject's tree, and in the checkout that path is a directory holding the cloned submodule repo. Sad, but oh well.
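Because `.gitmodules` does not carry the pinned commit, one way to read it is `git ls-tree HEAD <submodule path>`, which prints the gitlink entry with mode `160000` and type `commit`. Below is a small parser sketch for one line of that output. It assumes the default (non-`-z`) format, in which unusual path names may be quoted. The function name is hypothetical. Writing the resulting SHA to a file the test depends on might give a more precise cache key, though that idea was not tried in this thread.

```python
from typing import Optional, Tuple

GITLINK_MODE = "160000"  # git's tree-entry mode for submodule (gitlink) entries


def parse_gitlink(ls_tree_line: str) -> Optional[Tuple[str, str]]:
    """Parse one line of `git ls-tree` output.

    Lines have the form "<mode> <type> <object>\\t<path>". Returns
    (commit_sha, path) for a submodule entry, or None for anything else.
    """
    meta, sep, path = ls_tree_line.rstrip("\n").partition("\t")
    if not sep:
        return None  # no tab separator: not an ls-tree entry
    parts = meta.split(" ")
    if len(parts) != 3:
        return None
    mode, obj_type, sha = parts
    if mode != GITLINK_MODE or obj_type != "commit":
        return None  # regular blob/tree entries are not submodules
    return sha, path
```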