# development
Hello, regex + unicode help appreciated if y’all have ideas. Simplified problem: I want to extract `dir/foo.proto` and `ábč.proto` from this string:
import "dir/foo.proto"; import "ábč.proto";
NB the unicode: iiuc the file name can contain any unicode character, so that part is important. This regex does not work when there is more than one target:
import "(.+\.proto)";
The `.+` is too permissive and results in the first group being `dir/foo.proto"; import "ábč.proto`. But I don’t want to enumerate all valid chars - that’s not feasible with unicode. In Python, `\w` works with unicode chars, so maybe I use something like `\w+` plus some of the other valid symbols like `-` and `/`?
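For reference, a minimal sketch of the options in Python 3, using the sample string from above. The variable names are just illustrative. It compares the original greedy pattern with two possible alternatives: a negated character class that excludes only the closing quote (so nothing has to be enumerated), and the `\w`-based class floated above, extended with `.`, `/` and `-`. Which one fits depends on what counts as a valid file name, which is an assumption here.

```python
import re

text = 'import "dir/foo.proto"; import "ábč.proto";'

# Original pattern: the greedy .+ runs through the first closing quote
# and only stops at the last `.proto";` in the string.
greedy = re.findall(r'import "(.+\.proto)";', text)

# Option 1: allow any character except a double quote.
# No enumeration of valid characters is needed.
negated = re.findall(r'import "([^"]+\.proto)";', text)

# Option 2: \w (Unicode-aware for str patterns in Python 3)
# plus a few explicitly allowed symbols: dot, slash, hyphen.
word_class = re.findall(r'import "([\w./-]+\.proto)";', text)

print(greedy)
print(negated)
print(word_class)
```

Option 1 is the more permissive of the two: it accepts spaces and any other symbol inside the quotes. Option 2 rejects anything outside the listed class, which may or may not be what you want.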