Proposal: Elide `file` and just use `resource` e...
# development
b
Proposal: Elide
file
and just use
resource
everywhere. Google Doc: https://docs.google.com/document/d/1pGib2ZJiyltfij0aZMm81S2XbATa8s6_ghuLnf2Xr_k/edit?usp=sharing
🚀 2
➕ 1
❤️ 2
Shoutout to @hundreds-father-404 for the help.
❤️ 1
I'm working on the code changes now to serve as a PoC
The PoC is yielding all sorts of fruit. Lots of improvements in behavior
h
@happy-kitchen-89482 @witty-crayon-22786 this has my unequivocal blessing, and Josh has a proof of concept working. But needs to be split up into prefactors to derisk review Before doing that, would be helpful to get your blessing when either of you get a chance
w
yea, will take a look this afternoon
✅ 1
🙌 2
thank you!
h
Looking now
🙌 1
✅ 1
b
I think @enough-analyst-54434 had thoughts from our thread as well
OK I think I'm going to revamp this one into a V2 to better capture the intent, and change the proposal a smidge. Prose isn't my strong suit, so thanks for indulging me 🙂
👍 2
OK V2: https://docs.google.com/document/d/1gdJjhfAiVymTTcV6sRmsHTvQrzgaFmXdY8-fkdFaqxc Now primarily focused less on the technical difficulties with 2 assets types, and focused on the core of the issue: there's only one type of asset: assets. Trying to force a choice between two types based on runtime behavior of code has failed us and our users. Also, both
file
and
resource
will be deprecated for a single unambiguous type
asset
.
💯 1
@witty-crayon-22786 @happy-kitchen-89482 @hundreds-father-404 @enough-analyst-54434 please take another look. Also @curved-television-6568 you might have an opinion too 😉
🙂 1
👀 1
h
Thanks! Proposal looks good to me, I like the edit to call it
asset
and clarify what's going in: this is about not forcing specific functionality into a target type
✅ 1
🙌 1
b
I actually have a real-world use-case of relocating a
file
sitting outside our main code tree into our main code tree as a
resource
. It's impossible today, but would be trivially supported by
relocated_assets
in the propsal.
➕ 1
c
Oh boy. I feel like a single
assets
target ought to be able to cover the use cases for
files
and
resources
, but I see all the nuances presented in the comments in that google doc, it makes my (granted, somewhat tired 🥱 ) head spin. 😛
w
talked with Eric briefly on another topic, and one thing that occurred to us was that one potential option would be to say “we don’t support distinguishing between a resource/pkg_data/Classloader loaded file and a loose file: you’ll always get both”
h
Concrete example with a
go_package
depending on an `asset`: • at compilation time, we include the asset in the sandbox. Go will need it if the user is using
embed
. Otherwise, it will just be ignored by the compiler • at test time, we include the asset in the sandbox. The user might access it via Filesystem APIs. Otherwise, ignored
b
That's kind of another way of slicing the proposal. An asset is simply a declaration of a thing, and you specify dependencies on that thing. How that thing gets packaged or copied or materialized is in the details of the action taking place, not in the details of how it is declared
w
looses flexibility for test targets (i can’t avoid both packing a huge file into my app and placing it in the sandbox), but probably wouldn’t have any non-performance implications
➕ 1
h
Interesting point w/ performance of huge assets. Would immutable_digests make any difference?
w
i don’t think so…? if you’re testing a pyoxidizer or go binary, then you would be embedding an X GB file into a binary
b
i can’t avoid both packing a huge file into my app and placing it in the sandbox
Is that true? would the asset not be a dependency of the test, and therefore be place din the sandbox?
w
@bitter-ability-32190: if you don’t have a distinction between types, and always place it in both places via https://pantsbuild.slack.com/archives/C0D7TNJHL/p1654107679405549?thread_ts=1653583739.420959&cid=C0D7TNJHL then we wouldn’t know to “only” put it in the sandbox
b
WHy would it be in "both" though if it isn't a dependency of the foo_source? Just the test?
w
either way probably… depends whether a test pulls in transitive or only direct dependencies. but can trigger the case either way.
but i guess you’re saying that a test would only be able to get a direct dependency on an
asset
as a loose file? i.e., a test target depending on an
asset
would always get it as loose?
h
foo_source
Go only has a single target type:
go_package
. So when you add a dep to it, it might be because the "source" code uses the embed (resources) API at compilation time. Or it could be because tests are reading it as a loose file w/ Filesystem API. By getting rid of
file
vs
resource
, Pants loses the ability to distinguish intent Other than the performance concern, I think That Is Just Fine though
b
Or it could be because tests are reading it as a loose file w/ Filesystem API.
Help me understand why it'd be a dependency of the
go_package
if it is a test-time dependency
h
Because the
go_package
includes both the source files & test files. There is no
go_tests
, it is just
go_package
w
can embed up to 2gb in a go binary: https://groups.google.com/g/golang-nuts/c/jFKGLbTv2XQ
👍 1
b
Ahhh ok so there IS a case where one asset can be eithe rpackaged or sandbox filed. In that case I think both proposals make sense! 1. Only have one asset type: assets. Because to hit the point home, we shouldn't be defining how an asset is used at runtime by forcing the user to classify the asset at metadata time 2. In places where the usage is ambiguous (should hopefully be be rare. I can only think of tests which themselves must be compiled). We employ Stu's "edge metadata" idea. Allow users to choose which asset is loose vs packaged in the test binary
Which we can tackle 2 separately
w
with the difference from bazel being that it would only be in cases where it was ambiguous?
b
coke
w
part of the reason why i think that bazel might have recursive
data
dependencies is for requests like this: https://github.com/pantsbuild/pants/issues/15500 … i need to better understand why Alonso wanted transitive
file
dependencies there.
The only alternative now is making the
file
dependencies being direct, which is not optimal as it forces downstream targets knowing about all of their upstream potential
file
dependencies.
will comment.
b
Yeah transitive assets is an interesting case. Perhaps its the case that we dont inherit transitive assets in tests, as those should be packaged with their sources (in the case of Python we inherit them in the sandbox but not the target?)
For that ticket it seems like a cheating shortcut to include a dependency on the source so that all test environments inherit the asset. But thats my 2c
w
For that ticket it seems like a cheating shortcut to include a dependency on the source so that all test environments inherit the asset.
i mean, that’s what the distinction between
resources
and
files
is
but i don’t think i understand what syntax you’re thinking of there.
b
It's arguably very nuanced to say "all environments you test me in requires this specific loose file in the sandbox, therefore I'll add it to the source and expect the test environments to inherit the dep transitvely"
w
it can be a thing if its library code (rather than the very top level of your app) which loads the loose file. although libraries expecting files in particular places is smelly
👃 1
b
Exactly! Pants has some control over the environment we test your code in, but no control over the environment your code is deployed in. Having metadata provide weak hints is confusing and misleading. Then we end up throwing away your dependency when packaged, or not knowing how to transitively consume it because your weak hint doesn't jive with the current action
w
sure. but the symmetry is supposed to be between a test and prod… it’s just not between
deploy_jar
,
go_binary
, or
pex_binary
and a test … rather, between an `archive`/`docker_image` and a test. but i think that that is orthogonal to the transitive-or-not discussion
b
Ohh you're saying that if we have an archive which depends on a
pex_binary
, when we build the archive we throw away the files for the
pex_binary
but then transitvely inherit them back for the archive itself?! Talk about nuanced 😵
And now I think I understand how you see the world. I do think transitive dependencies is root of the evil here and is extremely relevant to the discussion. The distinction between the two asset types breaks down when you think about a singular target because most singular targets can only consume one asset "type". But when you blend in transitive dependencies it gets messy because a target which can only logically consume asset type A might be depended on by another which supports B, or A&B. Then what? We need a hint to know what the expectation is.
👍 1
w
transitive dependencies are one aspect, yea. the other is some consuming targets actually caring about the distinction between the types
so those seem like the two big open questions. Alonso is in UTC i think, so should get more info on that one tomorrow.
oh, related to:
the other is some consuming targets actually caring about the distinction between the types
&&
In places where the usage is ambiguous (should hopefully be be rare. I can only think of tests which themselves must be compiled).
…we have https://www.pantsbuild.org/docs/reference-python_tests#coderuntime_package_dependenciescode already as an “edge metadata” case.
…which basically amounts to a “give me `files`” dependency.
b
The world is clearer. I think one asset type is ideal, as it is a better representation of the intent of the user (here is an asset, how it is consumed will be called out at the consumption site(*) as trying to tell you that here is not always ideal or possible). Then the problem is truly, "how do we let the user tell us their intent". This also simplifies Pants code a lot because on the asset-generation side there's no duplication. On the asset consumption side theres no ambiguity on what the user desired because they told us. We no longer have to guess or make decisions for them
w
to be clear:
On the asset consumption side theres no ambiguity on what the user desired because they told us.
…is already supposed to be the case. to the extent that we currently have unclear docs on this subject, we probably still will… but perhaps being contextualized on a target by target basis would help.
b
I think it's ambiguous because being told a
python_source
depends on a
file
is meaningless (especially when packaging) unless you consider a
python_test
or an
archive
which is depended on by a
pex_binary
.
w
maybe… depends whether the
data_dependency
equivalent is transitive
h
(this convo might be better as a video conference btw. so much nuance here)
b
Well for the most part I got what I needed 😛 which was to understand better where that nuance is and why it isn't accounted for by using one asset type
How we solve that going forward might be worth a call though 😉
OK I think we can roll this forward with Eric's way of slicing it ("it's always in the sandbox"). It's a re-lensing of the proposal so I'll work with them on re-phrasing the docs as such. Hopefully the nuance of
runtime_dependencies
can be avoided and we can enjoy a simple target harmony. I have thoughts on our rollout/deprecation too, so stay tuned
👍 1
OK, I've responded to comments (highlighting that there's some misconceptions about how files get consumed in docker_image and archive), added a section about another way of thinking about this being that the asset simply is in all sandboxes where file/resource were. I also changed the deprecation/rollout plan to us adding asset as a new type, and not touching file/resource. It's gonna be painful for a release to support 3 asset types, but we can pull the plug soon after. This ensures we don't' knock over the current house of cards, and makes incremental PRs easy. I also moved it into our shared folder.
👍 2
Gahh, I keep forgetting about JVM's Pants-metadata-is-the-only-metadata case. We'll maybe wanna have a short meeting to find which idea we like best (or hate least)
👍 1
h
Stu was open to the idea of a call together
w
yea, still think this might be worthwhile. can get a Doodle out to schedule it if you’re interested.
…sounds like folks are. i’ll do that.
✅ 1
🙌 1
b
Much appreciated!
w
@hundreds-father-404, @happy-kitchen-89482, @ancient-vegetable-10556, @bitter-ability-32190: https://doodle.com/meeting/participate/id/dNx9l8La (and anyone else who is interested)
✅ 1
!!! … this is potentially related to scoped dependencies. https://github.com/pantsbuild/pants/issues/12794
i had forgotten that that was the last time “producer decides” vs “consumer decides” (aka node vs edge) metadata had come up.
h
It might be good to have @enough-analyst-54434 participate
👍 1
w
and @curved-television-6568: no pressure to attend. but you had had thoughts on the scoped dependencies ticket.
👍 1
booked this for tomorrow at noon PT: let me know if you’d like an invite
h
Any chance we could do it earlier, like 11 am PT? I would like to join, but will have to leave for the airport 15 minutes in. (I can always do voice call from Uber)
w
i can move my other meeting if an hour earlier works for @happy-kitchen-89482 and @bitter-ability-32190
b
👍
h
WFM
b
Gonna be a smidge late. Also sorry ahead of time, gonna be slurping on lunch 😂