#development

hundreds-father-404

11/18/2020, 10:49 PM
Hello, regex + unicode help appreciated if y’all have ideas. Simplified problem: I want to extract out `dir/foo.proto` and `ábč.proto` from this string:
import "dir/foo.proto"; import "ábč.proto";
NB the unicode: iiuc, it’s important that the file name can contain any unicode character. This regex does not work if there is more than one import:
import "(.+\.proto)";
The `.+` is too permissive and results in the first group being `dir/foo.proto"; import "ábč.proto`. But I don’t want to enumerate all valid chars - that’s not feasible with unicode. In Python, `\w` works with unicode chars, so maybe I use something like `\w+` and some of the other valid symbols like `-` and `/`?
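
For comparison, here is a minimal sketch of the greedy pattern from the message next to three possible alternatives. The variable names are made up for illustration, and the sample string is the simplified one above (with the accented letters written as precomposed code points). It is not a full `.proto` import parser.

```python
import re

# Precomposed code points, so every letter counts as a \w character.
unicode_name = "\u00e1b\u010d.proto"  # renders as ábč.proto
source = f'import "dir/foo.proto"; import "{unicode_name}";'

# Pattern from the message: greedy .+ runs on to the last '.proto";'.
greedy = re.findall(r'import "(.+\.proto)";', source)

# Option 1: exclude only the closing quote; every other character is allowed.
negated = re.findall(r'import "([^"]+\.proto)";', source)

# Option 2: lazy quantifier, which stops at the first '.proto";'.
lazy = re.findall(r'import "(.+?\.proto)";', source)

# Option 3: the \w idea from the message, plus '/', '.' and '-'.
word_class = re.findall(r'import "([\w/.-]+\.proto)";', source)

print(greedy)   # one match spanning both imports
print(negated)  # two separate file names
print(lazy)
print(word_class)
```

On this input, options 1 to 3 each return the two file names separately. The negated class `[^"]+` is probably the closest fit for "any unicode character", since it only names the one character that cannot appear instead of listing the valid ones. The `\w` variant would skip names containing spaces or combining marks, and the lazy variant can still run across a closing quote if an earlier import does not end in `.proto`.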