Curious on opinions about organizing dependencies ...
# general
h
Curious on opinions about organizing dependencies like files for testdata. Let's say I have a bunch of `*.py` and `*_test.py` modules under `src/app` that I'm working with. `tailor` will conveniently make me a BUILD file with `python_sources` and `python_tests` declared that expands everything for me. Let's say I have one test in there, `some_test.py`, that needs files stored in `src/app/testdata`. My suspicion is that if I were to add something like
files(name="testdata", sources=["testdata/**"])
to the BUILD file and change `python_tests()` to `python_tests(dependencies=[":testdata"])`, any change to what is in `testdata` would invalidate a bunch of caching and cause all the tests to get rerun even though I only needed it in `some_test.py`.
I can do
python_tests(overrides={'some_test.py': {'dependencies': [':testdata']}})
to isolate it, but it feels a little clunky after doing it a couple of times.
I don't think I can write my own `python_test` target for just that file, as it will cause a BUILD error for duplicate targets. Maybe I'm misremembering what I have already tried.
h
any change to what is in testdata would invalidate a bunch of caching and cause all the tests to get rerun even though I only needed it in some_test.py.
That is correct.
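To make the reasoning concrete, here is a toy model in Python. It is not how Pants actually computes cache keys; it only assumes that a test's cached result is keyed on the test's own source plus the contents of every declared dependency. The file names other than `some_test.py` and the `testdata` directory are made up for illustration.

```python
import hashlib
from typing import Dict, List


def cache_key(test_source: str, dep_contents: List[str]) -> str:
    """Toy cache key: hash of the test source plus each declared dependency's content."""
    digest = hashlib.sha256(test_source.encode())
    for content in dep_contents:
        digest.update(content.encode())
    return digest.hexdigest()


def all_keys(
    tests: Dict[str, str],
    deps_by_test: Dict[str, List[str]],
    file_contents: Dict[str, str],
) -> Dict[str, str]:
    """Compute one toy cache key per test file."""
    return {
        name: cache_key(src, [file_contents[d] for d in deps_by_test.get(name, [])])
        for name, src in tests.items()
    }


# Hypothetical test sources; other_test.py never reads testdata.
tests = {"some_test.py": "def test_a(): ...", "other_test.py": "def test_b(): ..."}

# The same data file before and after an edit.
testdata_v1 = {"testdata/data.json": "{}"}
testdata_v2 = {"testdata/data.json": '{"changed": true}'}

# dependencies=[":testdata"] on the whole python_tests target: every test depends on it.
broad = {name: ["testdata/data.json"] for name in tests}
# overrides for some_test.py only: just that test depends on it.
narrow = {"some_test.py": ["testdata/data.json"]}


def invalidated(deps_by_test: Dict[str, List[str]]) -> List[str]:
    """Return the tests whose toy cache key changes when the data file changes."""
    before = all_keys(tests, deps_by_test, testdata_v1)
    after = all_keys(tests, deps_by_test, testdata_v2)
    return sorted(name for name in tests if before[name] != after[name])


broad_invalidated = invalidated(broad)
narrow_invalidated = invalidated(narrow)
print(broad_invalidated, narrow_invalidated)
```

In this model the broad declaration changes the key of every test, while the narrow one changes only the key of `some_test.py`.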
python_tests(overrides={'some_test.py': {'dependencies': [':testdata']}})
This is indeed the most precise way to do things. If you need it for multiple files, you'd use `('some_test.py', 'another_test.py'):` as the key.
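A sketch of what the BUILD file in `src/app` might look like with a tuple key, assuming the `testdata` target from the question; `another_test.py` is only an illustrative second file:

```python
# src/app/BUILD (sketch, not a verified configuration)
files(name="testdata", sources=["testdata/**"])

python_sources()

python_tests(
    overrides={
        # A tuple key applies the same fields to each listed generated target.
        # Tests not listed here do not depend on :testdata.
        ("some_test.py", "another_test.py"): {"dependencies": [":testdata"]},
    },
)
```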
I don't think I can write my own python_test target for just that file as it will cause a BUILD error for duplicate targets.
You could, but you're right that you would have two targets. This shouldn't cause Pants to crash, but it will cause Pants to run the same tests twice, because it thinks you have two separate sets of metadata for the same file. The way to avoid that is to update the `sources` of the `python_tests` target with `'!some_test.py'`. But that's clunky to do and, imo, an antipattern, hence `overrides`.
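For comparison, the explicit-target alternative might look roughly like this. It is a sketch under assumptions: the `*_test.py` glob, the target name `some_test`, and the `source` field on `python_test` are not taken from the thread and may differ by Pants version.

```python
# src/app/BUILD (sketch of the approach described as an antipattern)
files(name="testdata", sources=["testdata/**"])

python_tests(
    # Assumed glob; "!some_test.py" excludes the file so only one target owns it.
    sources=["*_test.py", "!some_test.py"],
)

python_test(
    name="some_test",  # hypothetical target name
    source="some_test.py",
    dependencies=[":testdata"],
)
```

Every file handled this way needs both an exclusion and its own target, which is the clunkiness that `overrides` avoids.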
It's not "wrong" per se to apply the metadata to everything generated by the `python_tests` target... but it makes your caching less useful. We encourage liberally using `overrides`, but it's not a requirement.
h
Got it. I could see how trying to do more fancy introspection with `python_tests` based on `python_test` targets already written out would be difficult and cause a lot of annoying behavior. This seems like a good middle ground.
Yeah, we're in desperate need of the build caching. Our build system has been getting hammered having to rerun heavy `pip install` tasks followed by ~1000 unit tests every time.
Thanks for confirming my suspicions/ideas!
h
Oh!! https://github.com/pantsbuild/pants/pull/14049 will make you happy 🙂 We will hopefully very soon infer the `files`. And that inference will happen at a precise level: `some_test.py` will infer that it depends specifically on `data.json` but not `logo.png`, for example. That inference won't be perfect, but hopefully it could detect this case.
Btw, this thread captures very well the underlying motivation and philosophy for Pants v2! After 10 years with Pants v1, we realized there was a very strong tension between precise metadata and ergonomics & maintainability. With Pants v1 and (still) Bazel, you have to declare all dependencies explicitly. Also, there was no notion of generated targets: `python_tests` is the entire atomic thing, and you can't break it down into more precise subparts. So, to get full precision and accuracy, you would need an explicit `python_tests` target per file, each with its own dependencies hardcoded. Our mission was & still is to get you file-level precision without the boilerplate hell.