# development
b
@ancient-vegetable-10556 (or anyone else familiar) Can someone help me understand what
src/python/pants/jvm/non_jvm_dependencies.py
is/does?
a
One moment
b
(take your time, not urgent at all, just pinged you because of the file history)
a
It’s for dependencies of JVM-related targets that do not produce JVM symbols
Because JVM targets have to produce build artifacts that get smooshed together into a JAR, but unrelated file dependencies should not.
b
Can you explain it to me a bit dumber? I haven't done Java since High School, and even then I only marginally knew programming in terms of coding (not the ecosystem at all). I'm asking because it's a bit "special" in the world of asset handling in the Pants codebase
Why does
NoopClasspathEntryRequest
need field sets at all if they aren't used? 🤔
a
one moment
I have to re-orient myself with the rest of the JVM code, it’s been a month or two
b
Much appreciated ❤️ Trying to orient myself for all the possible cases for the
asset
unification proposal
a
OK, so the trick is in
jvm/compile.py
— there’s a method
ClasspathEntryRequestFactory.for_targets
. This is used to find an implementation that fulfils a JVM-compatible dependency. It’s necessary because there are multiple implementations for fulfilling JVM dependencies (java compiler, scala compiler, 3rd-party package resolution, etc). It figures out which implementation to use based on the fieldsets on the dependency target, and each implementation specifies which fieldsets it’s compatible with
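A rough sketch of that kind of lookup, in plain Python. This is not the actual Pants code: `Target`, `EntryRequest`, `pick_impl`, the field names, and the "exactly one match" rule are all hypothetical simplifications used only to illustrate choosing an implementation from the fields on a target.

```python
from dataclasses import dataclass
from typing import ClassVar, FrozenSet, Sequence, Type


@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in for a dependency target."""
    address: str
    field_names: FrozenSet[str]


class EntryRequest:
    """Hypothetical base class: each implementation declares the fields it needs."""
    required_fields: ClassVar[FrozenSet[str]] = frozenset()

    @classmethod
    def is_applicable(cls, target: Target) -> bool:
        # An implementation is compatible when the target has all of its fields.
        return cls.required_fields <= target.field_names


class JavacRequest(EntryRequest):
    required_fields = frozenset({"java_source"})


class ScalacRequest(EntryRequest):
    required_fields = frozenset({"scala_source"})


IMPLS = (JavacRequest, ScalacRequest)


def pick_impl(
    target: Target, impls: Sequence[Type[EntryRequest]] = IMPLS
) -> Type[EntryRequest]:
    """Return the single implementation compatible with the target (sketch assumption)."""
    matches = [impl for impl in impls if impl.is_applicable(target)]
    if len(matches) != 1:
        raise ValueError(
            f"{target.address}: expected exactly one compatible "
            f"implementation, found {len(matches)}"
        )
    return matches[0]
```

With only these two implementations registered, a target that carries just a file field has no match, which is the failure mode a no-op implementation would avoid.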
b
And the reason this exists only for file fieldsets and not resource ones is that resources are actually consumed by the compiler (e.g. "embedded") and files are tossed away?
a
The problem was that if you had a
file
dependency, there’d be no implementation to fulfil its dependency, so the compile would fail. We made the decision at the time that it would be better to handle a no-op compile in a generic and extensible way than to special-case it
Yes, resource dependencies do get embedded in the final compile result.
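To make the no-op idea concrete, here is a small sketch under stated assumptions. The `HANDLERS` registry, `ClasspathEntry`, and the field names are hypothetical and much simpler than the real plugin structure. The point is only that the no-op is registered the same way as a real compiler and contributes nothing to the classpath.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ClasspathEntry:
    """Hypothetical result type: the jars a dependency contributes."""
    jars: Tuple[str, ...] = ()


def compile_java(address: str) -> ClasspathEntry:
    # Stand-in for a real compile step that would produce a jar.
    return ClasspathEntry((f"{address}.jar",))


def noop(address: str) -> ClasspathEntry:
    # A file dependency is valid but produces no JVM artifacts.
    return ClasspathEntry()


# Hypothetical registry keyed by the field a target carries.
HANDLERS: Dict[str, Callable[[str], ClasspathEntry]] = {
    "java_source": compile_java,
    "file_source": noop,
}


def classpath_entry_for(address: str, field_name: str) -> ClasspathEntry:
    try:
        handler = HANDLERS[field_name]
    except KeyError:
        raise ValueError(f"{address}: nothing can fulfil a {field_name!r} dependency")
    return handler(address)
```

Removing the `"file_source"` entry from the registry reproduces the original problem: the lookup fails instead of returning an empty entry.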
b
Making more sense 🙂
a
see
resources.py:assemble_resources_jar
b
Secondly, why did we need to support the "generator" sources? I'm assuming in the cases where someone depended on the generator directly (same for the relocated files tgt)?
a
yup
it’s true of every
ClasspathEntryRequest
implementation
and
JvmResourcesRequest
is the special-case implementation for resource, jfyi
b
😕 That is... unfortunate. It doesn't leave much room for plugin-implemented asset generators, does it? 😕
a
It does, it’s completely pluggable! (one moment, again)
See
compile.py:calculate_jvm_request_types
— each
ClasspathEntryRequest
subclass needs some way of generating a
.jar
(actually just a zip with a specific directory structure). In JVM land, generating a
.jar
is unavoidable*, so you’ll need some way to do that anyway
(*not technically true, but more than true enough for our purposes)
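As an illustration of "just a zip with a specific directory structure", here is a minimal Python sketch that writes a jar-shaped archive with the standard-library `zipfile` module. `build_jar` is a hypothetical helper, not part of Pants, and the manifest shown is the bare minimum.

```python
import io
import zipfile
from typing import Dict


def build_jar(entries: Dict[str, bytes]) -> bytes:
    """Return the bytes of a zip archive laid out like a jar (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        # Jars conventionally carry a manifest at this fixed path.
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        for name, data in entries.items():
            # Entry names mirror the package/resource path inside the jar.
            jar.writestr(name, data)
    return buf.getvalue()
```

Reading the result back with `zipfile.ZipFile(io.BytesIO(build_jar({...}))).namelist()` lists the manifest followed by the entries that were passed in.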
b
I see! OK I think I have what I need. I really appreciate the time ❤️
a
Yup! The thing you originally found was just the easiest way to make a no-op that fit within the existing plugin structure. If our design is inflexible, do let me know, we don’t want to get in the way of things being useful
b
I think in this case, that code will disappear because all assets that end up as a dependency will become resources (within reason). If the user wants to opt-out, they use dependency excludes. One question I'll have in the future will be if this PR will be useful for that resource collection: https://github.com/pantsbuild/pants/pull/15749
Also feel free to check out the asset proposal and leave comments if you got 'em: https://docs.google.com/document/d/1gdJjhfAiVymTTcV6sRmsHTvQrzgaFmXdY8-fkdFaqxc/edit?usp=sharing
a
Nope! We need support for file dependencies for JVM tests.
So that’s why there’s a distinction between
file
and
resource
dependencies already
b
Nope! We need support for file dependencies for JVM tests.
The proposal tosses (or "unifies" depending on how you wanna look at it) the distinction between
file
and
resource
and leaves the user with only one asset type:
asset
.
a
I will need to stew on that proposal for a while; the semantic difference between
file
and
resource
targets is extremely significant in JVM land.
which mostly boils down to how you access a
resource
from inside a
.jar
at runtime vs how you access a file from the filesystem at runtime
I’d be happy to talk ideas with you at some point, but the assumption that resources and files are interchangeable really doesn’t hold in JVM land
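A Python analogy for that runtime difference, as a sketch only. In Java the resource side would typically go through the class loader (for example `ClassLoader.getResourceAsStream`); here a zip archive stands in for the jar, and `read_resource` and `read_file` are hypothetical helper names.

```python
import zipfile


def read_resource(jar_path: str, name: str) -> bytes:
    """Read an entry that lives inside the archive; it has no path on disk."""
    with zipfile.ZipFile(jar_path) as jar, jar.open(name) as stream:
        return stream.read()


def read_file(path: str) -> bytes:
    """Read a loose file that exists on the filesystem."""
    with open(path, "rb") as f:
        return f.read()
```

The resource is addressed by its name within the archive, while the file is addressed by a filesystem path, so code written for one access style does not automatically work with the other.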
b
The one-liner takeaway is "put assets in every relevant sandbox"; it's up to whatever is running in the sandbox to determine what to do with the assets
a
Yeah, I don’t think that works, because both can be necessary, and they are not semantically comparable in JVM-land
to be clear: I don’t think there is a need for two different target types, per se, but there will need to be some sort of way to distinguish between the two in
BUILD
files (possibly in how you link them to the target that needs them)
I think most of the confusion that you highlight in the doc arises from most languages not having a significant semantic difference between files and resources, but it is significant in JVM land
(I suspect the reason we have both
file
and
resource
in the first place is a holdover from V1 when Pants was much more JVM focused)
b
Yeah, the current behavior is "I, the user, am giving you a metadata hint about how I expect this to be consumed" but unfortunately that really isn't great at the "node" metadata source. As you point out, it'd be better to model this at the consumption site, however even that can mostly be avoided if we simply continually manifest assets in their relevant sandboxes (the compilation sandbox, the test sandbox, etc...)
We currently support this "transitive hint" (but not well, lol) where a
foo_source
depending on a
file
is a hint to the
foo_test
to maybe materialize it as loose in the test sandbox, but don't include it as a resource when packaging. Instead, let's avoid these little hints and treat everyone equally.
a
Yeah, I agree entirely there. I think the difference is that a
resource
sometimes needs to be considered as a true dependency
I’m imagining the case where you provide an “asset” to a JVM test — the question is whether the code under test should be opening the asset as a capital-R
Resource
, or whether it should be opening it as an item from the filesystem, or if both could be necessary
I don’t think that’s something you can correctly figure out from context
and it’s definitely not correct to include every filesystem file as a JVM resource
b
So forgive my Java ignorance. Are tests also compiled before running? A la any other compiled language
a
Yes. Tests are compiled before running. As part of that, the entire dependency tree is also compiled, and the
Resource
targets are specified as part of the java classpath (i.e. the packaged items available to the JVM). The
File
targets are made available for opening using filesystem commands.
JVM compilers need “all” of the dependencies present at compile time, and
resource
targets are targets that are semantically significant to the compiler, the packager and to the runtime. Files are just things that code may need to be aware of or able to access at some point.
b
In this case, as it stands (and we can mold the targets to adjust if needed) all assets will act both like resources (in that they will be packaged) and files (in that they will be loose on the filesystem in the sandbox)
a
Right, so it is frequently not appropriate to include files as resources — tests are the first case that comes to mind — and that is where the consumption issue becomes relevant. To be clear, I’m not against most of the proposal, just highlighting that the conceptual modelling that Pants already has is based on how the JVM sees the world, and that doesn’t hold for basically every other language. It clearly creates confusion for implementers as well as users. However, because the modelling is already good for the JVM, trying to bias towards being sensible for everything that’s not the JVM is likely to make the JVM usage less clear
b
Right, so it is frequently not appropriate to include files as resources
Why is that?
a
One use case is that if you’re inside a corporation releasing a proprietary JAR for external consumption, and you have sample data files for use with
pants test
or
pants run
which contain internal data, then you definitely don’t want that included as part of the JAR
b
In JVM-land are all resources packaged? No metadata or anything in the middle to self-select which dependencies get packaged?
(E.g. in Python you might have
MANIFEST.in
)
a
Yes. Resources are intrinsically things that are part of the final packaged product.
And they are accessible in JVM code by asking the JVM for a handle to a resource — they are treated intrinsically differently to files.
b
Ah, then I suspect we'd need to add a field to the
java_source
target to allow for such selection. @witty-crayon-22786 sound about right? One olive branch to extend here is if consuming the resource is relatively rigid in the code, we can add JVM "asset inference" like we do in python, which will automatically determine resource v not resource
w
Ah, then I suspect we’d need to add a field to the
java_source
target to allow for such selection. @witty-crayon-22786 sound about right?
i think where we had left it in the last discussion was that we needed a
data_dependencies
field on
*_tests
to signal that you wanted loose files, but we hadn’t decided whether it needed to be transitive. haven’t caught up on the design doc, sorry
which… doesn’t make sense in this case probably. what would it mean for it to be transitive. so yea, you’d still need to be able to mark it on a
java_source
so… yea, probably.
a
I don’t think it needs to be transitive per se
and being able to mark a data dependency on any jvm source file would probably solve the problem (for tests or otherwise)
w
… yea, sorry. i guess the “is it transitive” question is instead: “should
java_sources
, which cannot themselves consume files, be able to declare a dependency on files… or is that only `*_tests`”
b
...and piggy-backing off Stu's. The reason we "allow" transitive file deps is the convenience of that metadata being applied to all tests of my source. Right?
w
yea.
a
There’s probably a want of describing what files might be accessed by a given
jar
when you
pants run
it, but that’s hard to track, and would make more sense to be declared at the source
and yes, you’re right about tests
b
I think JVM is special here, so I'm going to call it out in the doc. But yeah I think we want the field on
java_source
and let's call a spade a spade.
data_dependencies
is too vague. Depending on whether we think files or resources are more prevalent it'd either be (semi-bikeshed)
package-asset-dependencies
(a list of asset dependencies to package) or the converse:
dont-package-asset-dependencies
(treat these as we did "files" before)
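One way to picture the "package these" variant is the sketch below. It is purely illustrative: no such field exists in Pants, `place_assets` and its arguments are hypothetical, and it assumes every asset is materialized loose in the sandbox while only the listed ones are also embedded in the jar.

```python
from typing import Dict, Iterable, List, Set


def place_assets(asset_deps: Iterable[str], packaged: Set[str]) -> Dict[str, List[str]]:
    """Decide where each asset goes, given a hypothetical 'package these' selection."""
    assets = list(asset_deps)
    unknown = packaged - set(assets)
    if unknown:
        # Selecting something that is not a dependency is treated as a mistake here.
        raise ValueError(f"not declared as dependencies: {sorted(unknown)}")
    return {
        # Assumption: every asset is available loose in the sandbox.
        "sandbox": assets,
        # Only the selected assets are embedded, like resources today.
        "jar": [a for a in assets if a in packaged],
    }
```

The converse field would invert the selection: everything is embedded unless it is listed.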
a
describing what files are consumed by the implementing code makes a lot more sense than describing on the test that may not directly consume it
w
@bitter-ability-32190: it’s the same as Go and Python (under PyOxidizer or zipapp)
“embedded” resources vs loose files
b
@witty-crayon-22786 I think in other ecosystems there's a middleman picking and choosing which resources to embed out of the sandbox or not. We only need this for places there isn't that middleman
w
python doesn’t have that middleman either, in that we don’t require a setup.py
b
describing what files are consumed by the implementing code makes a lot more sense than describing on the test that may not directly consume it
Yes and no. The source code isn't actually consuming it though, it's just expecting it to exist in the environment it's being run in. Another way of putting it is the source code has a dependency on the runtime environment, which Pants can't control. It's just convenient to declare the dependency on the source as a hint to downstream environments we ourselves create.
python doesn’t have that middleman either, in that we don’t require a setup.py
Not for PEX, but for
python_distribution
we do*
w
and being able to mark a data dependency on any jvm source file would probably solve the problem (for tests or otherwise)
relatedly, Alonso responded with his use case: https://github.com/pantsbuild/pants/issues/15500#issuecomment-1144003041 … haven’t correlated it with the design yet.
this line:
The normal
file
behaviour was enough for that (couldn’t use
resources
because you can not run binaries from inside an archive file).
refers to the fact that a resource starts out embedded in a jar, so you need to extract it somewhere to use it as a file… that’s a fairly rare case, because 90% of the time you can open a stream and ignore the fact that it isn’t a file
so… test helper code with loose file dependencies
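The "extract it somewhere to use it as a file" step can be sketched in Python with a zip archive standing in for the jar. `extract_resource` is a hypothetical helper; the executable bit is set because the quoted use case was about running a binary, and on non-POSIX systems that `chmod` call has limited effect.

```python
import os
import stat
import zipfile


def extract_resource(jar_path: str, name: str, dest_dir: str) -> str:
    """Copy one archive entry out to disk and return the path of the real file."""
    with zipfile.ZipFile(jar_path) as jar:
        # ZipFile.extract writes the entry under dest_dir and returns its path.
        path = jar.extract(name, dest_dir)
    # Mark the extracted file as executable by its owner.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path
```

A loose file dependency avoids this step entirely because the file already exists on disk in the sandbox.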