# development
a
I found this JAR-managing utility in the Pants 1 tree as I was looking for prior art on fat JARs. Can someone help me understand how it fit into Pants 1? (https://github.com/pantsbuild/pants/tree/1.30.x/src/java/org/pantsbuild/tools/jar)
e
IIRC this was for creating fat jars (deploy jars / executable jars). The main trick that was important for large binaries was contributed by Eric Ayers (from Square, IIRC) and is here: https://github.com/pantsbuild/pants/blob/e5987065372b1a617fc13ae53b0cab5ef9bbf098/src/java/org/pantsbuild/tools/jar/JarBuilder.java#L1043-L1045
The other tool that will be important is the jar shader, which we used for shading tool jars. That, though, used a clone/fork of jarjar that ran on an old version of ASM, so it would need a bit of work to resurrect.
In particular, tools like the JUnit runner, the ScalaTest runner, etc. This was needed to avoid classpath conflicts with the underlying code being tested, linted, etc. There were lots of mismatched deps, a common one being Guava: the tool ran on X, the code used Y, and nothing passed over API boundaries, so there was no need for the conflict. Shading did the trick.
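A minimal sketch of the renaming half of shading, assuming jarjar-style rules where `*` matches within one package segment, `**` matches across segments, and `@1`, `@2`, … refer to the captured wildcards. `compile_rule`, `apply_rule`, and the `shaded.guava` prefix are hypothetical; real shading must also rewrite every class reference inside the bytecode, which is the part the ASM-based jarjar fork handled.

```python
import re


def compile_rule(pattern: str) -> "re.Pattern[str]":
    """Translate a jarjar-style class pattern into a regex (assumed semantics).

    '**' matches any run of characters, dots included; '*' matches
    within a single package segment (no dots).
    """
    regex = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            regex.append("(.+)")
            i += 2
        elif pattern[i] == "*":
            regex.append("([^.]+)")
            i += 1
        else:
            regex.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(regex))


def apply_rule(pattern: str, result: str, class_name: str) -> str:
    """Rename class_name if it fully matches pattern; otherwise return it unchanged."""
    match = compile_rule(pattern).fullmatch(class_name)
    if match is None:
        return class_name
    # Substitute each @N with the text captured by the N-th wildcard.
    return re.sub(r"@(\d+)", lambda m: match.group(int(m.group(1))), result)


# Relocate the tool's copy of Guava so it cannot clash with the user's Guava.
print(apply_rule("com.google.common.**", "shaded.guava.@1",
                 "com.google.common.collect.ImmutableList"))
# prints shaded.guava.collect.ImmutableList
```

Classes outside the pattern (for example the code under test) keep their names, which is why the tool's Guava and the user's Guava can coexist on one classpath.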
a
@enough-analyst-54434 the good news is that I’m actually looking into fat JARs at the moment
e
OK. Yeah, we probably can/should resurrect the Java tools wholesale. I think there were just 3: the JUnit runner, the fat jar maker, and the shader.
a
I think our aim was to at least get a tool in place in Pants 2, and focus on shading when we see the need
e
Although, I did see JUnit 5 fly by, so maybe the JUnit runner isn't needed.
a
yeah, we’re using JUnit 5's console runner at the moment in P2, but we obviously haven’t tested that in the extreme yet
e
IIRC that tool just offered parallelism which we should get from the v2 engine just fine.
a
The fat JAR maker is definitely something I don’t see a lot of obvious prior art for, but I haven’t been able to find the corresponding hooks into Pants itself (mostly due to not knowing how Pants 1 is laid out, and GitHub’s code search being useless on non-default branches)
e
Yeah, I suspect the shading need will be about a couple months after people actually start using things. That was how long it took to hit classpath conflicts in the past.
Well, the hooks probably won't be too relevant, since it was just a main to run even back then. So the hooks are the CLI args it supports.
It's out on Maven Central, so you can run it quickly and see. Maybe even use it as-is?
a
oh huh
e
Pants dogfooded itself to publish its java tools.
a
I see
org.pantsbuild.jarjar
Oh huh!
well that’s handy
e
The jarjar will almost certainly not work since it can only handle X bytecode, but worth a spin I guess.
a
X?
w
also, fwiw: i never saw a side-by-side comparison of the performance difference of the optimization in https://github.com/pantsbuild/pants/blob/e5987065372b1a617fc13ae53b0cab5ef9bbf098/src/java/org/pantsbuild/tools/jar/JarBuilder.java#L1043-L1045 … but apparently `zip` has native support for concatenation:
```
cat input.zip.* > temp.zip
zip -FF temp.zip --out full.zip
```
…and it would be nice to do something dumb until we know we need the custom code
e
Can't remember the version
a
oh right
that makes sense
w
(an issue with the concatenation approach in the medium term is that it wouldn’t support anything more clever than “last file with a particular name wins”… which is generally a fine default)
a
thanks @witty-crayon-22786! Yeah, concatenation is a thing, as long as there are no class name conflicts in there
e
Although, to be fair, that was deemed not a fine silent default at Twitter, where a lot of work went into logging duplicate warnings, etc.
w
well, even then: last item wins is fine in general. it’s nice to be able to warn/error for it though
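A minimal in-memory sketch of "last file with a given name wins" merging that also reports duplicates, so a caller can choose to warn or error. `merge_jars` and `build_jar` are hypothetical helpers; unlike the `cat` + `zip -FF` approach, this reads every entry and recompresses, so it only pins down the semantics and says nothing about speed.

```python
import io
import zipfile
from typing import Dict, List, Tuple


def build_jar(entries: Dict[str, bytes]) -> bytes:
    """Demo helper: build an in-memory jar (a zip) from {entry name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buf.getvalue()


def merge_jars(jars: List[Tuple[str, bytes]]) -> Tuple[bytes, Dict[str, List[str]]]:
    """Merge jars in classpath order; a later entry with the same name wins.

    Returns the merged jar and {duplicated entry name: [jars containing it, in order]}.
    """
    winners: Dict[str, bytes] = {}
    owners: Dict[str, List[str]] = {}
    for label, data in jars:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                winners[info.filename] = zf.read(info)  # overwrite: last one wins
                owners.setdefault(info.filename, []).append(label)

    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as merged:
        for name, content in winners.items():
            merged.writestr(name, content)
    duplicates = {name: who for name, who in owners.items() if len(who) > 1}
    return out.getvalue(), duplicates


merged, duplicates = merge_jars([
    ("a.jar", build_jar({"com/x/A.class": b"from a", "com/x/Only.class": b"1"})),
    ("b.jar", build_jar({"com/x/A.class": b"from b"})),
])
for name, who in duplicates.items():
    print(f"warning: {name} found in {who}; keeping the copy from {who[-1]}")
```

Returning the duplicates rather than printing them inside `merge_jars` leaves the warn-versus-error policy to the caller.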
a
Right, I suspect that’s what we’ll do until we support shading
e
We never supported shading for this purpose, fat jars, just for tools as a whole.
w
but warning/erroring during concatenation doesn’t really make the most sense to me, since the conflict is a potential issue anytime you consume that classpath, not just when building a fat jar.
a
I’ve been looking around the internets to see what the state of fat jar assembly has been, and there’s been a bunch of “make jars that contain jars” suggestions
so seeing that we have an in-house tool seemed handy
e
If the tools we wrote then don't make sense now, we were deluded then... ~roughly. So spending at least a few hours working the old tools hard seems in order.
w
@enough-analyst-54434: yea, worth trying it out probably. lots of stuff lands without benchmarks.
i’m not sure whether resuming maintaining our own forks of JVM tools is inevitable, but… it would be great not to if we don’t need to.
e
We forked 1 JVM tool, jarjar (it actually died). The other 2 were tools that did not exist except buried in other build tools, like Maven.
But agreed: I'd hope we all agree on that for all tools in all languages.
This is the most useful place to look for the whys: https://github.com/twitter-archive/commons/blame/4f26f742c997c64758d172aa203873b105d13860/src/java/com/twitter/common/jar/tool/JarBuilder.java That reminded me that CONCAT was a thing for service files, which was common enough. You can't just pick one service file; you have to merge them to get all the registered services provided by your N jars, for any code using JDK services for plugins.
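A sketch of why service files need CONCAT rather than a pick: each jar may register providers under the same `META-INF/services/<interface>` name, and keeping only one copy drops the others. The helper names are hypothetical; the parsing follows the `java.util.ServiceLoader` file format (UTF-8, one class name per line, `#` starts a comment, duplicate names ignored).

```python
import io
import zipfile
from typing import Dict, List

SERVICES = "META-INF/services/"


def build_jar(entries: Dict[str, bytes]) -> bytes:
    """Demo helper: build an in-memory jar (a zip) from {entry name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buf.getvalue()


def registered_providers(jars: List[bytes]) -> Dict[str, List[str]]:
    """Collect provider class names per service interface across all jars."""
    providers: Dict[str, List[str]] = {}
    for data in jars:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                if not name.startswith(SERVICES) or name.endswith("/"):
                    continue
                found = providers.setdefault(name[len(SERVICES):], [])
                for line in zf.read(name).decode("utf-8").splitlines():
                    cls = line.split("#", 1)[0].strip()  # drop comments and whitespace
                    if cls and cls not in found:  # ServiceLoader ignores duplicates
                        found.append(cls)
    return providers


jars = [
    build_jar({SERVICES + "com.example.Plugin": b"a.ImplA\n# registered by jar a\n"}),
    build_jar({SERVICES + "com.example.Plugin": b"b.ImplB  # from jar b\n"}),
]
# Keeping only one copy of the file would register just one of these providers.
print(registered_providers(jars))  # prints {'com.example.Plugin': ['a.ImplA', 'b.ImplB']}
```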
a
that makes sense
e
It's massive.
w
Wow, yea. Comparing to `zip -FF` would be the most interesting bit... because if the `-FF` pass is mostly copying and just appending a new index, it could be pretty snappy too.
Although how soon we need CONCAT is also a factor.
f
I’m catching up. We should punt on jar shading until some months from now. It doesn’t need to be in v1 of fat jar packaging.
the same goes for some of the other v1 features: custom manifest files, etc., except to the extent needed to be able to use the packaged fat jar
b
Hey, we weren't deluded back then! It's not like any of us were crazy enough to write a custom classloader, right?
a
NARRATOR: …………
OK, I was able to make the `jartool` successfully produce a fat JAR on the command line for a project with nontrivial dependencies
w
nice. if you have time to compare to `zip -FF`, that would be handy. because i can imagine how to cobble this together (even CONCAT) purely with unix tools, and it wouldn’t really be that bad
a
one moment
@witty-crayon-22786 `cat`ting together all of the JAR files and then running the unix `zip -FF` tool worked just fine. The jar that Gradle popped out didn’t include a `Main-Class` attribute in the manifest, so the jar wasn’t runnable, but it was possible to specify the fat jar on the classpath and then invoke the main class by name
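To make a concatenated jar runnable with `java -jar`, its manifest needs a `Main-Class` attribute. A minimal sketch under stated assumptions: `manifest_bytes`, `set_main_class`, and `com.example.Main` are hypothetical, class names are assumed to be ASCII, and long lines are wrapped at the JAR spec's 72-byte limit using single-space continuation lines. Rewriting the jar this way recompresses every entry.

```python
import io
import zipfile

MANIFEST = "META-INF/MANIFEST.MF"


def manifest_bytes(main_class: str) -> bytes:
    """Minimal manifest declaring Main-Class (assumes an ASCII class name)."""
    lines = []
    for header in ("Manifest-Version: 1.0", f"Main-Class: {main_class}"):
        lines.append(header[:72])
        rest = header[72:]
        while rest:
            lines.append(" " + rest[:71])  # continuation: 1 space + 71 bytes = 72
            rest = rest[71:]
    # The last line must end with a newline; a blank line closes the main section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def set_main_class(jar: bytes, main_class: str) -> bytes:
    """Return a copy of jar with a fresh manifest written as its first entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        dst.writestr(MANIFEST, manifest_bytes(main_class))
        for info in src.infolist():
            if info.filename != MANIFEST:  # drop whatever manifest was there
                dst.writestr(info.filename, src.read(info))
    return out.getvalue()


# An empty zip is just the 22-byte end-of-central-directory record.
EMPTY_JAR = b"PK\x05\x06" + b"\x00" * 18
runnable = set_main_class(EMPTY_JAR, "com.example.Main")
print(zipfile.ZipFile(io.BytesIO(runnable)).read(MANIFEST).decode())
```

The alternative used in the thread, leaving the manifest alone and running `java -cp full.jar <main class>`, needs no rewrite at all.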
For real-world use, we’d want to test for cases where there are filename collisions
(I remember having `zip` files that span multiple floppy disks, so re-using this functionality is mildly amusing)
w
Benchmarking them side by side might be good, but might be challenging without nailgun
a
My guess is that `cat; zip` will be faster, if only because it doesn’t recompress: it’s just reading files and outputting a new, correct index at the end of the file
w
the point of the `jartool` optimization at the head of this thread was to do the exact same thing, i think
a
oh right
So do we have a preference absent benchmarking? The existing `jartool` almost certainly has the better per-file behaviour; `zip -FF` has quite verbose output and doesn’t seem to detect collisions
w
um… i think that my hope is that we don’t have to maintain custom JVM code, although as mentioned above, it may be inevitable.
i could imagine landing a first version that used `zip -FF`, with a note on switching to `jartool` if needed… @fast-nail-55400 likely has the best sense of what we need in a first version.
a
reasonable
w
(but CONCAT support could be done by literally extracting all copies of the colliding file into a new file, and tacking that on in a single-entry zip to the `zip -FF`)
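A sketch of that CONCAT-by-overlay idea: collect every copy of each colliding `META-INF/services/` file, concatenate them, and write the results into one small zip to append after the other inputs, so that "last file wins" picks the merged copy. `concat_overlay` and `build_jar` are hypothetical, and the sketch assumes that whatever reads the final archive really does resolve duplicate names to the last entry. With several colliding files, the overlay has one entry per file rather than a single entry.

```python
import io
import zipfile
from typing import Dict, List

SERVICES = "META-INF/services/"


def build_jar(entries: Dict[str, bytes]) -> bytes:
    """Demo helper: build an in-memory jar (a zip) from {entry name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buf.getvalue()


def concat_overlay(jars: List[bytes], prefix: str = SERVICES) -> bytes:
    """Build a small zip whose entries concatenate every copy of each colliding file."""
    copies: Dict[str, List[bytes]] = {}
    for data in jars:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                if name.startswith(prefix) and not name.endswith("/"):
                    copies.setdefault(name, []).append(zf.read(name))

    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as overlay:
        for name, parts in copies.items():
            if len(parts) < 2:
                continue  # no collision: the original copy is already correct
            # End each part with a newline so one jar's last line cannot fuse
            # with the next jar's first line.
            body = b"".join(p if p.endswith(b"\n") else p + b"\n" for p in parts)
            overlay.writestr(name, body)
    return out.getvalue()


jars = [
    build_jar({SERVICES + "com.example.Plugin": b"a.ImplA\n", "a/A.class": b"..."}),
    build_jar({SERVICES + "com.example.Plugin": b"b.ImplB"}),
]
overlay = concat_overlay(jars)
print(zipfile.ZipFile(io.BytesIO(overlay)).namelist())
```

The overlay bytes would then be written out and passed as the last input to `cat`, ahead of the `zip -FF` pass.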
a
that makes sense
as long as we have the zip indices in advance to know where the clashes are (which we can do in Python!)
w
re: Python: sort of… not without slurping the file into memory. but you could do it in the sandbox with a python process. probably better to do it with `unzip $file $innerfile`
a
Python’s `zipfile` module doesn’t seek to the end of the file like a normal zip utility?
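On a path or a seekable file object, Python's `zipfile` does locate the end-of-central-directory record at the end of the archive and reads the central directory from there; `namelist()` needs nothing more, so no entry is decompressed. A sketch of clash detection on that basis, with hypothetical helper names. Per the thread, inside Pants this would still have to run as a process in a sandbox rather than directly in an `@rule`.

```python
import io
import zipfile
from typing import BinaryIO, Dict, List, Tuple, Union


def build_jar(entries: Dict[str, bytes]) -> bytes:
    """Demo helper: build an in-memory jar (a zip) from {entry name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buf.getvalue()


def find_clashes(jars: List[Tuple[str, Union[str, BinaryIO]]]) -> Dict[str, List[str]]:
    """Map each file name present in more than one jar to the jars containing it.

    Only the central directory (the zip index) is read for each jar.
    """
    owners: Dict[str, List[str]] = {}
    for label, source in jars:
        with zipfile.ZipFile(source) as zf:
            for name in zf.namelist():
                if not name.endswith("/"):  # directory entries are expected to repeat
                    owners.setdefault(name, []).append(label)
    return {name: labels for name, labels in owners.items() if len(labels) > 1}


clashes = find_clashes([
    ("a.jar", io.BytesIO(build_jar({"com/x/A.class": b"1", "com/x/B.class": b"2"}))),
    ("b.jar", io.BytesIO(build_jar({"com/x/A.class": b"3"}))),
])
print(clashes)  # prints {'com/x/A.class': ['a.jar', 'b.jar']}
```

Real jars almost all carry `META-INF/MANIFEST.MF`, so a practical version would likely ignore that name, or treat it specially, before reporting.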
w
`@rule`s don’t have access to loose files on disk… because rather than being loose, they’re in a database.
a
oh sure, and I presume it’s difficult to seek through those files in a random-access fashion?
(zip indices are pretty easy to spot)
w
sort of? stuff will be sequential in the DB. it’s more that `@rule`s are not intended to directly access large files. you can load files into memory with `DigestContents`, but for large ones you’d want to put them in a sandbox and then run an external process on them instead.
a
OK sure
w
but there is not an `@rule` API for checking a `Digest` out somewhere on disk, for example… only via a `Process`
a
ok