# development
h
I've been wondering about the following: Today if `python_library` X depends on a `python_distribution` Y then X effectively depends on all of the libraries Y depends on, transitively, as sources, and nothing special happens with Y itself. You could replace it with its direct deps with no change in behavior. But what the author of that dep probably intended is for X to depend on the wheel built from Y. An example of where this is useful is when you have a `python_distribution` that builds native code via a custom `setup.py` (this is now possible! See https://github.com/pantsbuild/pants/pull/12250). AFAICT today we handle this as a special case, only for running tests, via `runtime_package_dependencies`. But it seems like we should do this generically?
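For concreteness, here's roughly what that test-only special case looks like in BUILD files today (target names and version are made up):

```python
# A python_distribution that builds native code via a custom setup.py.
python_distribution(
    name="native-dist",
    dependencies=[":lib"],
    provides=python_artifact(name="example.native", version="0.0.1"),
)

# Today's special case: the tests run against the *built wheel* from
# native-dist, rather than against its transitive sources.
python_tests(
    name="tests",
    runtime_package_dependencies=[":native-dist"],
)
```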
@hundreds-father-404 I can't remember what the thought process was behind `runtime_package_dependencies` vs just `dependencies`
Well, there are two issues here: One is whether traversing deps should (at least by default) not traverse through binary deps. That problem goes away with `runtime_package_dependencies`. But in that case it seems like any target could have `runtime_package_dependencies` and depending on it means depending on the artifact it creates? So at the very least what we do today for tests we should do uniformly?
h
Implementation-wise, it's much easier to do via a separate field. NB that it is a semantic error to include the sources of a `runtime_package_dep`, and you should only include the built artifact. We would have to add lots of special casing to our generic dependencies code to filter like that. UX-wise, I also think it is worth the clarity to users that we are going to do something special here: build an artifact.
> But in that case it seems like any target could have `runtime_package_dependencies` and depending on it means depending on the artifact it creates?
What does it mean for a Pex to depend on a built PEX? Or some Python library code to depend on a built PEX?
a
Does that make `python_distribution`s a black hole for dep inference?
Or just something that needs some kind of special-casing?
e
I had weakly proposed dependencies that spelled out products. Bazel had this but I think it's now deprecated or removed. i.e.:
`dependencies=['a/target:address@whl']`
Where `@`'s would be registered by rules and dependencies on `@`'s would get the corresponding registered rule run against them with the product being a file(set).
Lots of handwaving there - but the idea is a general mechanism instead of an ad-hoc mechanism like we have now for python tests.
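To make the handwaving slightly more concrete, the registration side might look something like this — every name here is invented, and none of this API exists:

```python
# Entirely hypothetical plugin registration: a backend registers the "@whl"
# product suffix and points it at the rule that packages a
# python_distribution into a wheel. A dep like 'a/target:address@whl' would
# then run that rule and contribute the resulting fileset instead of the
# target's sources.
def rules():
    return [
        register_product_suffix("whl", produced_by=package_python_distribution),
    ]
```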
w
my understanding of bazel is that you would affect the “type” of your dep on something via putting it in either the `deps` or `data` list. @average-vr-56795 would know better about whether the syntax still exists.
but yea, the `@` syntax got co-opted into “variants”, which eventually became multiple Params, and the syntax was dropped (although we still reserve the character)
the target API could still implement parsing to flavor deps like that… one of the surviving potential use cases would be multiple types of codegen at the destination.
(things like “which resolve am i using” and “which JDK do i want” are properties that would be set for an entire consuming target, rather than on a dep-by-dep basis, so they don’t really need `BUILD` syntax)
a
I don't think bazel has ever offered that kind of syntax exactly - I know Buck does and calls them flavours. Bazel has a couple of approaches to this - the main one is the separate attribute thing (e.g. `deps` vs `runtime_deps`); another is that you can name explicit files in some contexts (e.g. in a pkg_zip rule you can reference `:foo` or `foo_deploy.jar` and the latter will be a fat jar of transitive deps) but that only really applies when you're depending on a literal file copy, rather than wanting metadata like a classpath. A key difference between Bazel and Pants here is that in Bazel each rule pushes providers up the graph (so the dependency decides in what ways it can be consumed, and the depending rule picks one of them), whereas Pants pulls rule classes up the graph (the depender can decide to do interesting transforms itself)
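To illustrate that second Bazel approach (a sketch from memory, so treat the load path as approximate):

```python
# Bazel BUILD sketch: java_binary implicitly offers a foo_deploy.jar output,
# a fat jar of transitive deps. Naming that file in another rule's srcs
# consumes it as a literal file, with no classpath-style metadata attached.
load("@rules_pkg//:pkg.bzl", "pkg_zip")

java_binary(
    name = "foo",
    srcs = ["Foo.java"],
    main_class = "com.example.Foo",
)

pkg_zip(
    name = "bundle",
    srcs = [":foo_deploy.jar"],  # the fat jar, not the :foo rule itself
)
```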
e
Yes - exactly, `_deploy.jar` - that's what I'm referring to @average-vr-56795. @witty-crayon-22786 the `@` is a distraction - I intend none of the cross meaning with the old `@` - just a disambiguator since I lazily assumed `.` may already be allowed in address names.
a
(FWIW "flavours" got kind of terribly overloaded in Buck - they ended up becoming both "I want this particular kind of output" and also "I want to reconfigure this target" - you would use flavours both to differentiate "I want a .a" vs "I want a .so" but also "I want an x86_64" vs "I want an arm64" - and while those are similar concepts, the combinatorial explosion of them in one "address space" got very confusing (and also inefficient)
w
re: push vs pull in the rule graph… the target API ended up being shaped somewhat similarly to bazel providers and/or the “multiple different output files” of a rule: you request a particular `Field` type for a target, and if it is not declared literally on the target, it can be computed for the target
so from a type perspective, there is already a facility to produce either X or Y for a target… maybe just not syntax to pull on one or the other…?
a
I quite like the idea of modelling a `python_distribution` as something which in some way pretends to be a `python_library` without any dependencies as far as things depending on it are concerned, but retains the dependency information for other contexts where that's needed... I'm not sure what exactly that would look like model-wise these days
w
but also, i think that `Fields` are still conceptually treated as inputs… even when they might be a `PythonSources` field generated from a `ProtobufSources` field (example from HEAD)
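Roughly what that looks like from plugin code today — API names recalled from memory, so treat them as approximate:

```python
from pants.backend.python.target_types import PythonSources
from pants.engine.rules import Get
from pants.engine.target import HydratedSources, HydrateSourcesRequest, Sources, Target


# Helper meant to be awaited from inside an @rule: asking for PythonSources
# succeeds whether the target declares them literally or declares
# ProtobufSources that codegen can turn into Python.
async def python_sources_of(tgt: Target) -> HydratedSources:
    return await Get(
        HydratedSources,
        HydrateSourcesRequest(
            tgt.get(Sources),
            for_sources_types=(PythonSources,),
            enable_codegen=True,
        ),
    )
```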
h
The same as depending on a `files` target containing the built artifact
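i.e. conceptually something like this, with the wheel filename invented:

```python
# A files() target carrying the prebuilt artifact as an opaque file,
# with no Python-specific metadata attached.
files(
    name="prebuilt-wheel",
    sources=["example.native-0.0.1-cp39-abi3-manylinux1_x86_64.whl"],
)
```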
a
Except you still want dep inference, right? Or is that a separate type that would get consumed a separate way?
w
@happy-kitchen-89482: in order for `python_library` consuming `python_distribution` to mean different things in different contexts, we either need new syntax or new attributes. we don’t have the `data` vs `deps` split, so we only have one list. if we had two attributes you’d put the `python_distribution` in your `data` list if you wanted loose files, or in `deps` if you wanted the built artifact.
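As a sketch of that hypothetical two-attribute world (no `data` field exists in Pants today):

```python
# Hypothetical: two different fields express the two meanings of depending
# on a python_distribution.
python_library(
    name="uses-wheel",
    dependencies=[":native-dist"],  # consume the built wheel
)

python_library(
    name="uses-sources",
    data=[":native-dist"],  # hypothetical field: consume the loose files
)
```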
h
I guess we have the dep inference issue today with the special case for tests
I don't think it needs to mean different things in different contexts
I think it can always mean one thing: I depend on the package produced by this package-producing thing.
What other reason could there be to depend on a package-producing thing?
Dep inference is an interesting question (and one we already fail to answer, I think)
a
I think the other reason for not wanting to treat it completely opaquely is things like classpath conflict checking
h
for the motivating use-case (building native code in a custom `setup.py`) we don't have to worry about dep inference creating conflicts
a
So the wheels we're talking about are pure native code, and if they vendor any Python they do some kind of shading-like thing?
h
Or the user accepts that there may be a syspath ambiguity
I mean, it may not be a real problem, they are the exact same .py files
they just exist twice, once in the wheel and once loose, as sources
and are loaded from whichever has priority (presumably the wheel)
in fact we would have to ensure that the wheel takes priority
to avoid weird importlib introspection edge cases
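For example, a quick way to check which copy wins (the module name here is invented):

```python
# Whichever sys.path entry comes first wins, so the wheel's site-packages
# entry must precede the loose-source root for the wheel copy to load.
import importlib.util

spec = importlib.util.find_spec("example_native")
print(spec.origin)  # the file that import would actually load
```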
a
I'm imagining e.g. some packages where there's a pure Python version, but if you're happy to pick up a native dep you can get a more optimised version, and I'm not sure how having two copies of that would go down. But I guess always picking the locally built wheel probably works fine. But I guess it could be cleaner to be able to explicitly notice the conflict. I guess this ends up looking like the `provides` attribute on v1 JVM rules? Tagging the explicit metadata so you can handle the conflict smoothly