# development
@witty-crayon-22786 (and cc @enough-analyst-54434), curious your thoughts on
for building a PEX x lockfiles:
• If no lockfile, we use # of requirements. You point out this isn't perfect because we miss transitive deps.
• If lockfile, we use # of lines. But I realize that's not good either, because one req might have 10 lines due to
• W/ Pex JSON lockfiles, we actually could parse the JSON and get an exact count. But John has discouraged that because the format is not stable.

I'm wondering if, for lockfiles, we switch to the imperfect no-lockfile heuristic of # of input requirements? Altho gr, even that doesn't work well: we only know the req_strings of the current context, which can be a subset. We'd need to parse the lockfile header to get the # of input requirements, but there's no guarantee a lockfile header is present, due to manual lockfile generation.

So maybe this?
1. Keep requirements.txt the same.
2. Parse the PEX JSON, but make this fail-safe if the format changes on us. Fall back to the line-count heuristic, even tho that's imperfect.
> Altho gr, even that doesn't work well, we only know the req_strings of the current context, which can be a subset.
consuming the lockfile for a subset should only actually build the subset, right? if so, that should be fine
…or maybe that is followup work, and we always build the whole lockfile when we consume it right now?
i don’t think that this matters a whole lot: but biasing toward heuristics that overcount is better than those that undercount. so #lines remains decent.
For 2, I'd still highly discourage siloing. Why not add a pex3 tool to do this, since the code is already all there? The LockedResolve.resolve code, which just spits out the list of artifacts to download, takes ~17ms in my medium-size jupyter-server tests.
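If such a tool existed, consumers could shell out to it instead of parsing the JSON themselves. The sketch below is purely hypothetical: the `pex3 lock artifacts` subcommand name and its one-URL-per-line output are invented here to illustrate the shape of the integration; no such subcommand is known to exist today, so the fallback path does the real work.

```python
import subprocess


def artifact_count_via_pex3(lockfile_path: str, fallback_count: int) -> int:
    """Count lockfile artifacts via a HYPOTHETICAL pex3 subcommand.

    `pex3 lock artifacts` is an invented name standing in for the tool
    proposed above (one artifact URL printed per line). If pex3 is absent
    or the subcommand doesn't exist, fall back to the caller's heuristic.
    """
    try:
        result = subprocess.run(
            ["pex3", "lock", "artifacts", lockfile_path],
            capture_output=True,
            text=True,
            check=True,
        )
        return len(result.stdout.splitlines())
    except (OSError, subprocess.CalledProcessError):
        return fallback_count
```

The upside of this shape is that the lockfile format stays an implementation detail of Pex; the downside, as noted below, is that other resolvers in other verticals would have no equivalent.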
On the other hand, we simply won't have this option with other resolvers in other verticals, so it is perhaps better not to obsess over this optimization generally, or even just right now.
👍 1
yea, i think that this isn’t worth worrying too much about. bias toward overcounting, and leave as a TODO.
Okay, +1 to not spending too much effort on this. Any suggestions for a heuristic that over-counts? A super simple one is to still use line counts. Note that we use
when generating, which causes entries to have new lines. It will definitely overcount.
yea, just line count should be fine.
👍 1