# general
f
does anyone have an example of a larger python repo using pants?
b
What constitutes “larger” to you / what kinds of examples are you looking for?
f
the docs have two .py files. So maybe 20+ would be sufficient. I'm trying to understand how others have structured their tests. In all the examples i've seen, tests are in the same folder as the code they are testing, but our repo is set up with tests in a different folder at top level. This means we put the conftest.py in that separate folder. If i move the tests into the same folders as the code, then i'm less sure where to put the conftest.py so it's picked up by pytest.
also for general purpose seeing how they might solve the issues i've been having (like my earlier question about how to use dependencies from above the current folder)
b
Ah, gotcha. In our repo we structure it so that the tests are located in a separate folder called `tests` underneath the package dir. We put the `conftest.py` in the `tests` directory as well
I think this is the same as your setup now - if you put the tests in the same directory as the source files then presumably you’d want to keep `conftest.py` in that same directory as well
f
just to make sure i understand you. I think you are doing this:
models
- tests
 L conftest.py
 L some_test.py
- some_code.py
and in my repo we're currently doing:
models
- tests
 L conftest.py
 L some_test.py

- code
 L some_code.py
so similar to you. However, when you try to use pants to test only the code that has changed, you need to have specific dependencies from the test to the specific code files being tested, i.e. some_test.py -> some_code.py. That way, when you change some_code.py it will identify that it needs to run the some_test.py file as part of the tests. To write that dependency, you have to put the test in the same folder as the code.
b
I don’t think that’s the case - here are the contents of my BUILD files in the two packages: `models/tests/BUILD`:
python_test_utils()

python_tests()
`models/BUILD`:
python_sources()
Pants does dependency inference: https://blog.pantsbuild.org/why-dependency-inference/ so you shouldn’t have to explicitly specify the dependency between `some_test.py` and `some_code.py`
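To make the idea of inference concrete, here is a minimal sketch of import-based dependency detection. It is not Pants's actual implementation; the `MODULE_TO_FILE` mapping, the `predict` name, and the test source are assumptions made up for illustration. It only shows the general principle: parse the imports of a test file and map them to the first-party files that provide those modules.

```python
import ast

# Hypothetical mapping from importable module name to the file providing it.
MODULE_TO_FILE = {"models.some_code": "models/some_code.py"}

# Hypothetical contents of models/tests/some_test.py.
TEST_SOURCE = "import pytest\nfrom models.some_code import predict\n"


def imported_modules(source: str) -> set[str]:
    """Return the dotted module names imported (absolutely) by the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


def infer_dependencies(source: str) -> set[str]:
    """Map imports to known first-party files; unknown modules are skipped."""
    return {
        MODULE_TO_FILE[module]
        for module in imported_modules(source)
        if module in MODULE_TO_FILE
    }
```

Under these assumptions, `infer_dependencies(TEST_SOURCE)` links the test to `models/some_code.py` without anything being written in a BUILD file, which is why the test and the code do not need to share a folder.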
h
Those dependencies are typically inferred automatically
In our large (but private) Pants-using repo we had the tests live alongside the code under test: `src/python/bar/foo_test.py` tests `src/python/bar/foo.py`, and so on
I like that because it makes it really easy to find the test for a piece of code
And since Pants understands deps it can make sure not to package the tests up with the code for deployment (which is historically why people have separate tests/ folders)
And TBH not that packaging up the tests is such a big deal either...
And the conftest.py goes in the `src/python/bar` or `src/python`, depending on what scope it's supposed to apply to
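A rough way to reason about that scope: pytest applies a `conftest.py` to tests in its own directory and in the directories below it. The sketch below is a simplified model of that rule (it ignores pytest's rootdir and `confcutdir` settings), and the helper names and the `baz/qux_test.py` path are hypothetical.

```python
from pathlib import PurePosixPath


def conftest_candidates(test_file: str) -> list[str]:
    """List the conftest.py paths that could apply to a test, outermost first."""
    test_dir = PurePosixPath(test_file).parent
    # The test's own directory plus every ancestor up to the repo root (".").
    dirs = [test_dir, *test_dir.parents]
    return [str(d / "conftest.py") for d in reversed(dirs)]


def applies_to(conftest: str, test_file: str) -> bool:
    """True if a conftest.py at this path is in scope for the given test."""
    return conftest in conftest_candidates(test_file)
```

So a `src/python/conftest.py` would be in scope for `src/python/bar/foo_test.py`, while a `src/python/bar/conftest.py` would not be in scope for a test under a sibling directory such as `src/python/baz/`.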
f
Oh yeah i definitely prefer having the tests in their own folder. hmm, ok i've made a misstep then. When i change foo.py and run the test-changed-since command, it tests the entire repo, and i only want it to run test_foo.py
h
Do you have an explicit dependency somewhere that could be causing that?
You can investigate your dependencies with `pants dependencies --transitive` or you can find a dependency path between two given targets with `pants paths`
f
Thank you, i will try `pants paths`.
the dependencies look not quite right so likely to be that, i'll investigate. Thanks!
f
I have a semi-toy project where I develop with Pants, please see https://github.com/AlexTereshenkov/cheeseshop-query/. It features most widely used Pants features, perhaps you can use this as a sandbox for your explorations.
f
Thanks Alexey, that is helpful. I'm now seeing the following command is always running all tests, even when no change to the repo has been made.
pants --changed-since=HEAD  --changed-dependents=transitive test
When i removed the `--changed-dependents` it runs nothing as i'd expect. I'll try looking through the dependencies to see if there's anything strange in there but they looked correct to me earlier.
f
interesting! What's your `git diff`, are you sure you don't have any changes in the working tree? 😕
f
(base3.10) matt@DGX-2:~/mlcore$ git diff
(base3.10) matt@DGX-2:~/mlcore$
f
I wonder if this is your first commit in the repo or something like that, so HEAD is always going to tell you there are changes (as it compares to "nothing"). But I am skeptical 😕
I assume all tests are listed with the `pants --changed-since=HEAD --changed-dependents=transitive list` goal?
f
this is on my repo rather than yours, so there are plenty of previous commits. Yes all tests appear to be listed (there are a lot but i have no reason to believe any have been missed)
I'm going to try on your cheeseshop repo to ensure it's not something wrong with my install of pants
f
FWIW may be worth running `pants dependents --transitive <your-python-test-module.py>` on random files to see what kind of things they depend on
f
Sorry Alexey, i misread your suggestion earlier. I have run the `pants --changed-since=HEAD --changed-dependents=transitive list` and i see pretty much every file in the repo. Now when i remove the transitive option, i see only one: the default lockfile.
(py39) matt@DGX-2:~/mlcore$ pants --changed-since=HEAD list
3rdparty/python/default.lock:_default-resolve_lockfile
I guess that's because the lockfile is younger than the commit perhaps? When i generate the lockfile on the cheeseshop repo and run the list or test again, it runs the tests every time. So i guess there is something about the lockfile's age?
To recreate on cheeseshop repo:
1. clone or get clean state of cheeseshop repo
2. run `pants --changed-since=HEAD --changed-dependents=transitive list` and you will see no targets
3. run `pants generate-lockfiles`
4. repeat step 2 but now you will see lots of dependencies. Which means it will try to run all the tests
Is there a flag i can set to ignore the lockfile creation time from the --changed-since logic?
f
well, fundamentally, since the lockfile is different, all targets that depend on it should be considered changed. I am puzzled why you would like to ignore this fact? 😕
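To illustrate why one changed lockfile can pull in nearly every test, here is a small sketch of a transitive-dependents walk over a toy graph. The graph, the target names, and the function are hypothetical and this is not how Pants is implemented internally; it only models the idea that anything depending (directly or indirectly) on a changed target is itself considered affected.

```python
from collections import deque

# Hypothetical toy graph: each target maps to the targets it depends on.
DEPENDENCIES = {
    "models/tests/some_test.py": ["models/some_code.py", "requirements:pytest"],
    "models/some_code.py": ["requirements:numpy"],
    "requirements:pytest": ["default.lock"],
    "requirements:numpy": ["default.lock"],
    "docs/notes.txt": [],
}


def transitive_dependents(changed: set[str], dependencies: dict[str, list[str]]) -> set[str]:
    """Return the changed targets plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a target to the targets that use it.
    dependents: dict[str, set[str]] = {}
    for target, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(target)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

In this toy graph, a change to `default.lock` reaches every requirement, every source file, and every test, whereas a change to `models/some_code.py` reaches only `models/tests/some_test.py`.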
f
the lockfile isn't different though
f
`pants generate-lockfiles`: it may not be obvious, but this command may produce different output on subsequent runs even with no source code changes, because it fetches data from external sources and pip may resolve the third-party dependencies differently
the lockfile isn't different though
oh I see, let me try to repro locally and explore!
f
i understand it might be different and i can see why you'd want to check it hadn't changed when using `--changed-since`. however if i was using this in CI for example, then i generate the lockfile before the tests, so it would always identify the lockfile as newer i think?
f
I wouldn't recommend following this approach. You would generate a lockfile once and keep it checked in. You may regenerate the lockfiles once in a while, at whatever cadence your organization finds appropriate (e.g. weekly, monthly). Regenerating a lockfile before every test run isn't going to be helpful 🙂
f
i thought the lockfile had to be generated after cloning the repo though? I thought the lockfile isn't committed to the repo, it has to be generated on each platform so the resolves are correct for that platform? So if i were using it in CI, then i'd have to run the generate-lockfile first? Or does the test command generate its own lockfile?
f
https://www.pantsbuild.org/docs/python-lockfiles there's no need to regenerate the lockfiles after cloning the repo. If you have conflicting requirements (you can't use a certain PyPI package on arm64, for instance), you could have multiple resolves, each having a separate lockfile. The `test` and other goals would normally use the lockfile that is already in the repo, they don't have to generate any.
does this help at all?
f
ok, so you would normally commit the lockfile to the repo?
f
100%. You want to make sure you run your tests against the transitively pinned dependencies and every user of the repo (CI, developers, deployment) does the same
f
ok great, for some reason i thought the lockfiles had to be generated each time. Great, that's helped immensely. Thank you! 😄
f
no problem at all! There are many moving parts and the Python ecosystem is very... evolved 😄 and Pants just adds more entities to reason about (albeit with the idea to streamline the build). If it helps, you can think of a lockfile as a `go.sum` file in the Golang world. So your `go.mod` is the Python requirements file with direct dependencies (optionally with exact versions, but most often not) and `go.sum` is the resolved list of requirements with transitively pinned dependencies (with checksums)
and https://github.com/jazzband/pip-tools#should-i-commit-requirementsin-and-requirementstxt-to-source-control if you are new to the Python ecosystem - this is how many other organizations that are not using Pants or a similar build system resolve their dependencies, with the lockfiles being very similar to what you see in the Pants lockfiles
f
Yeah, I'm familiar with python/requirements.txt. I was under the impression that the lockfiles were more bespoke to the system generating them. i.e. a lockfile generated on a mac, might not work for an ubuntu system etc, and therefore the lockfile should be generated on each new system cloning the repo. It sounds like i had the wrong impression though, so thanks for your help 👍
h
Ah no, the lockfile is cross-platform and is intended to be checked in. Possibly we should emphasize this more in the documentation.
f
Hi Alexey and Benjy. You were so helpful last time i thought it would be good to ask you directly: Is there any way to include a resource from above the directory of the current BUILD file? I have two use cases:
1. My repo needs to check git data, so needs access to the .git folder. The code that needs this is further down the directory structure, so i can't use resources or files targets.
2. I want to build a pex file that includes CUDA libraries to run torch with GPU support. I hoped i could use something like resources to pull those in as well but i know this is a huge task so probably not that simple even if i could get the directories included in the pex
h
A BUILD file can only own `sources=` that are below it in the filesystem tree, but you can have a separate BUILD file that has a `resources()` or `files()` target that owns those sources, and then the first target can have an explicit dependency on that target.
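As a rough sketch of that arrangement (two BUILD files shown in one block), assuming a hypothetical `config.yaml` at the repo root that code under `models/code` needs. The target name `shared_config`, the file name, and the paths are made up for illustration.

```python
# BUILD (at the repo root; hypothetical example)
# This target owns a file that lives above the code that needs it.
resources(
    name="shared_config",
    sources=["config.yaml"],
)

# models/code/BUILD (hypothetical path)
# Explicit dependency on the root-level target; "//:" addresses the repo root.
python_sources(
    dependencies=["//:shared_config"],
)
```

The explicit `dependencies` entry is needed here because a data file like this is not something an import statement would reveal.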
However Pants ignores `.git` (and all other root-level dirs starting with a dot) by default (see https://www.pantsbuild.org/docs/reference-global#pants_ignore), so you'll have to futz with that option
and I'm not sure what the implications are
Re pytorch in pex, that is a huge topic that has been discussed a LOT (search this slack for details)
Probably we should gather all that info into a documentation page