# pex
a
we also have the jarjar tool that shades JVM tools for Pants. if shading is itself a useful feature, we could consider breaking it out as a separate project, especially if it would work to deduplicate directory trees as well as the contents of pex files.
e
The current collection of packaging / pyparsing distributions vendored in pex is not uniform; they are all different versions. There would be no sane way to dedup them.
I'm confused, I guess, by your comment Danny. jarjar is a Google tool, it's not ours; we just cloned their repo when they yanked it from Google Code. Using jarjar on Python is not doable in any way that reuses existing jarjar code (hopefully you didn't mean this). The pex shading equivalent, its vendoring system, could be broken out as a separate, generally useful project, however (I think you meant this). That, I think, makes sense to do iff Pants consumes the tool to provide shaded Python binary tools. I've advocated for this in the past.
a
(I think you meant this)
that's correct
e
OK. If you look at the pex vendored code, the issue here with the ~dups is recursive vendoring. ~All the pypa projects vendor themselves.
a
that's what i was thinking as well
if the deduping occurred (in my head), it would very simply scan for setup.py files etc. and require a specific dist to be pointed to whenever it finds a duplicate of any package
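A minimal sketch of that kind of scan might look like the following. The function name and the duplicate heuristic (grouping by the directory that contains a `setup.py`) are my own assumptions for illustration, not anything in pex; a real tool would have to parse package metadata rather than rely on directory names:

```python
import os
from collections import defaultdict

def find_vendored_dupes(root):
    """Walk a tree, treat every directory holding a setup.py as a
    vendored package, and report directory names seen more than once.
    This is a rough heuristic sketch, not a pex feature."""
    seen = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        if "setup.py" in filenames:
            seen[os.path.basename(dirpath)].append(dirpath)
    # Only names that occur in two or more places are candidate dupes.
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

A dedup pass would then have to pick one dist per duplicated name and point the others at it, which is exactly where the symbol and version problems discussed below come in.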
e
But then why did you bring up dedup? These are all different versions, and there is no sane way to even know that in the vendoring code that I can tell. You could only look from the outside, notice the recursive vendoring, and have the end product use the other vendored code's recursively vendored packages directly.
and require a specific dist to be pointed to whenever it finds a duplicate of any package
How do you know the symbols work? You're now just ignoring requested versions.
a
hm, yes. i was only thinking so far as "dedup the directories" not "patch the symbols"
e
OK - well the symbols are important - these are imported after all.
a
getting the versions right could be addressed by scanning versions from setup.py metadata as well. but if people vendor code, change it, and don't bump the version number in their vendored directory, that strategy falls apart. it seems hard to solve this problem at this level.
e
Yes, exactly.
The easier way to avoid all these dup imports is to avoid the fake reason for them in this case: stop pkg_resources.working_set from scanning the full sys.path.
a
oh, oops. i seem to have missed that part lol
yes, i agree with that
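The restriction discussed above can be sketched with the public `pkg_resources` API: instead of relying on the module-level `working_set`, which is built from the full `sys.path` at import time, construct a `WorkingSet` from an explicit, narrow list of entries. The path below is a placeholder assumption, not a real pex layout:

```python
import pkg_resources

# The default pkg_resources.working_set scans every entry on sys.path,
# which is what drags in recursively vendored copies of packages.
# Building a WorkingSet from an explicit entries list sidesteps that:
# only the listed directories are ever scanned for distributions.
narrow = pkg_resources.WorkingSet(entries=["/app/site-packages"])
```

Code that resolves requirements against `narrow` instead of the default `working_set` never sees distributions that live elsewhere on `sys.path`, so the duplicate vendored copies simply stop mattering.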