# development
w
@ancient-vegetable-10556: i expect that the native parser in https://github.com/pantsbuild/pants/pull/12890 will be much faster
to your question about matching against known packages on the classpath though: inference cannot consume the classpath
a
Sure, I haven’t gone in and looked at what symbols that parser pulls out for (a) type declarations, and (b) method calls
w
(except for 3rdparty deps, presumably)
a
Knowing that we can’t consume the classpath, I feel like we can, at the very least, maintain a list of known package prefixes
w
yea, that is true.
https://github.com/pantsbuild/pants/pull/12890 relies on that to some degree by extracting `package` statements. i'm about to comment about it.
a
If we have a list of known package prefixes, then matching FQTs against that should be possible to a certain extent
w
possibly. after you’ve actually extracted them you can implement resolution.
but… i’m also 100% fine with those not being inferred in the medium term.
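For concreteness, here's a minimal sketch of what that resolution step could look like, assuming the known prefixes are collected from extracted `package` statements. The class and method names are hypothetical, not Pants APIs:

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch, not Pants code: resolve a candidate fully-qualified
// type against package prefixes extracted from first-party `package` statements.
final class PrefixResolver {
    // known package -> the source file that declares it,
    // e.g. "foo.bar" -> "src/foo/bar/A.java"
    private final TreeMap<String, String> knownPackages = new TreeMap<>();

    void addPackage(String packageName, String sourceFile) {
        knownPackages.put(packageName, sourceFile);
    }

    // Longest-prefix match: "foo.bar.Baz" resolves if "foo.bar" (or "foo") is known.
    Optional<String> resolve(String candidateFqt) {
        String prefix = candidateFqt;
        int dot;
        while ((dot = prefix.lastIndexOf('.')) >= 0) {
            prefix = prefix.substring(0, dot);
            String owner = knownPackages.get(prefix);
            if (owner != null) {
                return Optional.of(owner);
            }
        }
        return Optional.empty(); // unknown prefix: fall back to explicit deps
    }
}
```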
a
I believe that Maven metadata has a list of packages exported by a dependency… but if we have a lockfile in place, then we can resolve that lockfile and map out the packages in each dependency before doing source analysis
w
> I believe that Maven metadata has a list of packages exported by a dependency
yea, JDK9 module mappings do (too?)
which might ease 3rdparty “export” extraction.
cc @bored-art-40741 for later
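For reference, the JDK 9+ module mappings mentioned above live in a module descriptor, whose `exports` directives enumerate exactly which packages an artifact exposes. The module and package names below are made up:

```java
// module-info.java for a hypothetical 3rdparty artifact. A tool could read
// the `exports` directives to map package prefixes back to the owning dep.
module com.example.widgets {
    requires java.sql;
    exports com.example.widgets.api;
    exports com.example.widgets.model;
}
```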
b
Yeah, so I've been thinking about this a bunch lately and have some fragments of opinions
Or at least, a bunch of rakes I'm worried we might step on
Just as an observation from spending a bunch of time trying to get all FQT references out of a single Java source: it's not easy. I believe at some point it's inherently ambiguous if you don't have the real classpath available. Prefix matching seems like it should do the job, but I also observed both Spoon and Javaparser giving me back the root prefix of a package as a potential symbol (when the actual usage was a FQT like `java.util.Date` or similar), which obviously isn't any good for dep inference, but it also isn't trivial to distinguish between that case and a "real" use like `somepackage.Foo`.
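Here's a contrived but compilable illustration of that ambiguity (all names invented): the exact same token sequence can be a type reference or a field-access chain, and only real resolution against the classpath can tell them apart.

```java
class Obscured {
    static class Inner { int Date = 42; }
    static class Holder { Inner util = new Inner(); }
    static Holder java = new Holder(); // a field literally named `java`

    void demo() {
        // Type context: resolves to the JDK class java.util.Date.
        java.util.Date d = new java.util.Date();
        // Expression context: the same tokens resolve to the field chain
        // Obscured.java -> util -> Date, an int. Without symbol resolution,
        // a parser cannot tell these apart.
        int n = java.util.Date;
        System.out.println(d + " " + n);
    }
}
```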
There are also a lot of directions we can go on this in terms of how opinionated we are, what we consider to be the source of truth, whether we assume our dep inference is always the full truth, etc., and the right direction probably depends heavily on the specific types of users we're targeting. My general inclination is that there is way too much code out there doing weird stuff for us to ever claim 100% coverage with our dep analysis, and we're therefore always going to have to maintain an escape hatch in the form of explicitly provided dependencies. So we should also design against that and think about when a particular case is rare enough that it doesn't justify going down a deep rabbit hole, and we instead point the user at explicitly provided deps. I'm kind of leaning right now toward doing that with FQTs
Or possibly we can cover like 95% of FQTs and limit our failures to false negatives, which is even better (but we have to be careful to not generate false positives, which don't have an escape hatch as far as I'm aware)
Another direction we could potentially go is to declare modules as The Future ™️ and just say that's the cut: if you don't use modules, you need to manually manage your Java deps in Pants; if you do, we take care of it implicitly.
I actually think this is a pretty viable approach and it means effort spent toiling on language specific parsing can instead be spent on fighting our common enemy: third party dep resolution
The code that doesn't use modules will shrink over time. That said, I have no real sense of how widely adopted modules are in the Java ecosystem right now
Another thing to keep in mind about dependency analysis that I keep having to remind myself: you always depend on the mapping of exported symbols to targets. That mapping is inherently global, so it needs to be computed quickly and cached aggressively
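As a sketch of that mapping's shape (hypothetical names, not Pants internals; in Pants this would be computed by rules and memoized by the engine):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the global mapping from exported symbols (or package
// prefixes) to the targets that provide them. Every dep-inference query reads
// it, so it must be built once over all sources and cached aggressively.
final class SymbolToTargetIndex {
    private final Map<String, Set<String>> providers = new HashMap<>();

    void record(String symbol, String targetAddress) {
        providers.computeIfAbsent(symbol, k -> new TreeSet<>()).add(targetAddress);
    }

    Set<String> providersOf(String symbol) {
        return providers.getOrDefault(symbol, Set.of());
    }

    // Two targets exporting the same symbol is an ambiguity to surface to
    // the user rather than guess at.
    boolean isAmbiguous(String symbol) {
        return providersOf(symbol).size() > 1;
    }
}
```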
Also, tests: IIRC, a common pattern in Java is for test source to have the same package as the code being tested. If we don't get down to the type/symbol level of dep analysis, we're going to need to special case the source/target type of test code to make sure it doesn't get yanked into a package hairball
w
Re: tests: You don't think file-level is sufficient for that?
b
Not if you hairball to package level for dependency analysis, right?
Like if you have `A.java`, `B.java`, and `ATest.java`, all with `package foo.bar`, dep analysis with package-prefix level granularity will say that all 3 are in the same coarsened hairball
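Spelled out, that layout looks like this (contents trimmed to the relevant lines):

```java
// src/foo/bar/A.java
package foo.bar;
public class A {}

// src/foo/bar/B.java
package foo.bar;
public class B {}

// src/foo/bar/ATest.java -- test source, same package by common convention
package foo.bar;
public class ATest {}

// Package-prefix granularity maps all three files to "foo.bar", so they land
// in one coarsened unit; file/type-level analysis could instead record only
// ATest -> A.
```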
w
> My general inclination is that there is way too much code out there doing weird stuff for us to ever claim 100% coverage with our dep analysis, and we're therefore always going to have to maintain an escape hatch in the form of explicitly provided dependencies. So we should also design against that and think about when a particular case is rare enough that it doesn't justify going down a deep rabbit hole, and we instead point the user at explicitly provided deps. I'm kind of leaning right now toward doing that with FQTs
Yeah explicitly provided deps are absolutely a valid way to avoid potentially ambiguous situations and we shouldn't be afraid of them.
I'm less sure that modules are a panacea, although we should use them if they're available. If we can, I'd sort of rather generate modules than require them for first-party code.
Having said that, explicitly specifying your exports is good and probably necessary.
Re: tests again: I'm not sure what hairball means as a verb, heh. But I don't know why we would need to coarsen to the package level? Unless there was a legitimate cycle between library and test code...
b
Well, we need to know which sources export which types, and how to properly infer FQTs (not just imports). Which is beyond what I've implemented so far, though the former is likely not too difficult
So far, we only know which package a source declares
w
got it. right.
But yea: investing lots of time in relative imports is a lower priority than figuring out per-file declared types to avoid pulling in entire packages. Not to mention the fact that it should be much easier!