I’ve just had a chat with <@U0N6C2Q9F> about Java ...
# development
a
I’ve just had a chat with @fast-nail-55400 about Java dependency inference, and I think we’ve got a way forward on inference for fully-qualified type names (FQTNs); I’ll be documenting in the thread
Problem: FQTNs are easily detectable by regexp, but are ambiguous with long dereference chains, e.g.
Copy code
public void beep(ToolSetup com) {
  System.out.println(com.toolchain.you.Get.the.picture);
}
You can figure out if
com.toolchain.you.Get
is a real type if you know the real list of classes that are available on the classpath, but Pants’ approach is to not specify the entire dependency list, especially if we’re going to make Fat JARs. Proposed Solution: When resolving the lockfile for a project, we walk the
zip
index of each dependency JAR we pull down. At this point, we can note the packages that are included with each locked JAR, and save that to a metadata file. At dependency scan time, we can search for strings of the form
x.y…z
and look for prefixes that match a known package name, and include the JARs that provide that package on the classpath. This will produce an overspecified classpath (if there are package prefixes that are ambiguous with variable names, we may specify more dependency classes than are actually there), but AIUI, overspecifying a classpath is not a correctness issue, but a file size issue. Any thoughts before I commit this to a GH issue? @witty-crayon-22786 @bored-art-40741
w
using either the classpath or JDK 9 modules to do this sounds reasonable, yea. i expect that Patrick had already put some thought into it based on the discussion in https://pantsbuild.slack.com/archives/C0D7TNJHL/p1631728289220300
as mentioned there: i think that “relative imports” / FQTs are a lower priority than code that actually imports things, and would be totally fine not catching inline imports in a first version if we agree that it encourages good code hygiene to prefer imports
a
I think the key observation is that we’re probably fine with an overspecified dependency spec as long as it captures everything
w
we’re probably fine with an overspecified dependency spec as long as it captures everything
hmm… what do you mean by that?
oh, got it. referring to your comment
when in doubt we should bias toward under specified: although we do have syntax for excluding dependencies that have been inferred, i think that it is much better to get an error from a compiler that something isn’t present, then no warning at all that you are pulling in something that you don’t actually depend on
i think that it’s really important to minimize magic, and avoiding approaches that have false positives is an important part of that
1
b
I agree with Stu: false positives are much worse than false negatives
1
Explicit dependencies provide an easy to understand escape hatch for fixing a dep that dep inference missed. There is no such escape hatch for fixing a dep that inference incorrectly inferred, and it's really difficult to even detect such a situation
w
There is no such escape hatch for fixing a dep that inference incorrectly inferred
there is: we support a
!
syntax in the dependency list. but the other part is true: it’s hard to detect.
a
Right, the hard to detect bit is definitely a thing.
w
but yea: agreed.
b
Also, those false positives in my experience have a tendency to go off into really extreme directions really fast (e.g. pulling in absolutely everything as a dependency), and in that scenario it's actually pretty damaging (best case situation is you just have unnecessarily giant deploy jars)
👍 1
In that scenario too, even the
!
escape hatch is infeasible
There are many more dependencies in the world that you might need to explicitly cut than there are dependencies that you might need to explicitly add
👍 1
I'm also leaning strongly toward ignoring inline type references that aren't explicitly imported, and potentially heuristically ignoring imports that appear to be "implicit" (e.g.
import Foo.blah
where it looks like it's probably importing from a type already in scope based on the name)
There's a bunch of incremental work we could do with phased parse rounds where we add in known immediate dependencies and sources from the same package to get a really strong handle on implicit imports, but I think it's worth seeing how much demand there really is for that (weighed against the difficulty of just adding some explicit deps)
👍 1
a
So the one concern I have is with the legitimate case of importing two classes with the same name from different packages; I’ll get off this particular horse if we can come up with a good way find errors and tell people how to fix them
b
Use the FQT inline and add an explicit dep
w
right. the compiler will tell you
b
It's the same solution you'd otherwise use, and probably already have used, you just have to help dep inference a little
a
The compiler will tell you that it’s missing, but it’s the compiler and not pants that’s giving you the error, but the fix lives in Pants
w
yes, but this is a case where we degrade to “the status quo”
this isn’t a new failure mode: folks who use build systems know this one.
…which i guess is the whole reasoning behind preferring false negatives: the consequences are relatively low.
b
To be clear, I think there's a solid path forward for narrowly supporting the common case of an inline FQT using the existing infrastructure and just a bit of extra logic. I just think we should draw the line with what we have now because that doesn't seem as high priority as other items
👍 1
a
fair enough
b
Basically, if you can unambiguously tell that a given reference looks exactly like a FQT (and not just a package) that is exported by another source, then you go ahead and add that dep too
That's much more likely now that I have full type export working in the existing code
a
I think the ambiguity thing is interesting; I feel like most cases where there’ll be ambiguity between an FQT and a package name or dereference chain will tend to be degenerate, given how consistently Java people name things
b
I agree, but degenerate cases have a tendency to ruin your day when the failure mode isn't graceful
I think it's doable, just harder, and probably should be introduced as an experimental option with lots of instrumentation
a
yeah sure
b
For that matter, making sure there's good instrumentation for users for the existing dep inference I'd argue is a higher priority on its own
I'm actually more worried right now about the sort of opposite scenario that I expect to be more common and more difficult to detect: unqualified references to types in the same package
The best I have now is: you should still import those for hygiene reasons, and because we need that import for fine grained dep analysis
Or perhaps we should have a flag that automatically hairballs all package sources together for people who really want implicit access to types in the same package
a
Is there a world in which we don’t include all the types in the current level of a given package?
b
Sorry, I don't follow the question
"current level"?
a
Oh. If I have package
com.foo
, I can’t implicitly refer to types in
com.foo.bar
without importing them, right?
b
Correct, it's best to ignore the apparently hierarchy in Java packages. It's a lie
a
Yeah, I wasn’t sure if “package” was sufficiently descriptive as a term
b
I'm talking about this scenario
a
yeah, so is there a world in which we don’t pull in an entire package?
b
Yeah, that's what I have written right now
What I have now is type-level inference
So it'll depend on fine-grained sources within a package
Although it also has the analysis available to do package-level dependencies if we want
And we're still going to need to special case tests somehow there
a
ok
b
So, you have `B.java`:
Copy code
package org.pantsbuild.example;

public class B {}

class C {}
and you have `A.java`:
Copy code
package org.pantsbuild.example;

import org.pantsbuild.example.C;

public class A {
	public static void main(String[] args) throws Exception {
		C c = new C();
	}
}
These both compile. Note that A is referring to a package private type in B.java
And in fact you don't even need to import C there
a
Right
b
But if you do, the current analysis does the Right Thing, and A.java depends on B.java but not vice versa
I think this mostly solves the test issue organically, if you're willing to import types from the same package in your test
a
I don’t recall if that’s how I’d normally have written java code
b
I'm pretty sure I've written more Java for analyzing Java source than anything else, so same
My first pass at this treated Java packages as hierarchical, and while I got a pretty clean implementation, I realized the false positive failure modes were too bad
e.g. if I saw an import for
foo.bar.Baz
but nothing had the package
foo.bar
, I'd end up depending on everything that provided a package with a prefix of
foo. ...
Which when applied to
com
or
org
is disasterous
The new approach is better, and it allows for the sort of "monolithic package" approach you're suggesting if we want to go that route. In fact, it allows for that approach as a user-configurable option, but I haven't implemented that