https://pantsbuild.org/ logo
#development
Title
# development
h

hundreds-father-404

04/06/2021, 12:50 AM
Anyone enjoy regex and interested in helping with this? I want to improve our dep inference for Protobuf imports so that it handles
//
comments:
The simplified regex is currently
import\s*\"(.+?\.proto)\"\s*;
, which captures the filename in
import "foo.proto";
for example I want to improve this so that if there's a preceeding
//
before
import
, then the regex does not match.
^(?!\/\/).*import\s*\"(.+?\.proto)\"\s*;
does the trick, but it does not handle this edge case:
Copy code
some proto code; // import "foo.proto";
So, I was thinking to tweak it to put
^[^\/](?!\/\/).*
before the
import
part, with the intent that you can have any characters except for
/
before that negative lookahead. But that fails if there's a single
/
in the prefix, and I only want to do that if there are two
(I'm using https://regex101.com to play with this)
a

average-vr-56795

04/06/2021, 1:27 AM
I suspect you may want to look for "only any amount of whitespace before the token import" (something like
^\s*import\s*\"(.+?\.proto)\"\s*;
) rather than "look for an import not after //"
h

hundreds-father-404

04/06/2021, 1:30 AM
So handle the case of
Copy code
// import "foo.proto";
But not these
Copy code
some code // import "foo.proto";
    // this is now stale code: import "foo.proto";
That sounds sensible, it is an improvement from the status quo even though it's not perfect. Which on that note, I'm making zero attempt to handle
/* */
multiline comments - in the worst case, you can use an
!
ignore to fix false positives
(btw my real motivation is adding dep inference for the upcoming Shell backend by looking at
source
statements: https://github.com/pantsbuild/pants/pull/11844 They share lots of similar code)
e

enough-analyst-54434

04/06/2021, 1:42 AM
There are lots of bash parsers out there, maybe use a real parser instead?: https://github.com/idank/bashlex
Ditto protobuf, it has this built in IIRC.
For protobuf I think you can just write a plugin. Those get handed the protobuf parsed and in a ... protobuf format ... that presumably includes includes.
Actually
shellcheck --format=json
gives you everything you need if you look for error code 1091 which is "can''t follow source", It has the line number, start and offset of the token being sourced.
👀 1
🚀 1
So you'd just need to run this against each file in isolation so other sourced files can;t possibly be found.
Fear the regexes when you can.
1
h

hundreds-father-404

04/06/2021, 1:54 AM
Huh, thanks! Then with Protobuf, ack on the plugin but low priority to improve. False positives aren't the end of the world
Fear the regexes when you can.
I hate regex, so works for me
e

enough-analyst-54434

04/06/2021, 1:58 AM
It seems to me the more we can lean on tools we subprocess to, the better unless perf gets in the way,
2
h

happy-kitchen-89482

04/06/2021, 4:23 AM
FWIW I think @average-vr-56795's suggestion of "detect an import only if it's preceded by nothing but whitespace to the beginning of the line" is easy even with a regex, and I assume correct, and covers all your edge cases.
I don't recall proto syntax offhand, but can a legitimate import ever be preceded by anything other than whitespace?
In fact can it even be preceded by whitespace, or does it have to be at the beginning of the line?
h

hundreds-father-404

04/06/2021, 4:25 AM
"detect an import only if it's preceded by nothing but whitespace to the beginning of the line" is easy even with a regex,
It is easy to implement, indeed. I'll land that as an improvement
I don't recall proto syntax offhand, but can a legitimate import ever be preceded by anything other than whitespace?
Yep.
Copy code
syntax = "proto3";  // We don't use proto2 anymore
h

happy-kitchen-89482

04/06/2021, 4:25 AM
Is this a real problem we've encountered in the wild though?
h

hundreds-father-404

04/06/2021, 4:26 AM
Not with Protobuf because few people are using Protobuf dep inference fwict. This is more in anticipation of adding dep inference for Shell, which was originally going to use very similar regex to do parsing. In particular, this problem messes up the workflow of commenting out an import when debugging dep inference
That worked like a charm to use Shellcheck for dep inference. Thanks John!! Only 45 lines of code for the parsing part 😎 Cool side effect that the directive
# shellcheck source=build-support/common.sh
is understood by our dep inference
😆 1