I have a look-ahead regex [^a-z0-9%*][a-z0-9%]{3,}(?=[^a-z0-9%*]). In my test it extracts 4 substrings from @@||imasdk.googleapis.com/js/core/bridge*.html:
|imasdk
.googleapis
.com
/core
I need to rewrite it as two plain regexes because I can't use look-aheads (the regex engine does not support them). I've split it into [^a-z0-9%*][a-z0-9%]{3,} and [^a-z0-9%*]; for each match of the first regex, the second one is searched for in the substring that follows the match.
For some reason it also extracts /bridge, because . matches [^a-z0-9%*] and is found somewhere after /bridge. So how does a look-ahead work: does it have to be a full match, a substring match (a find result), or something else? Does it mean that the character right after each match must not be in the set a-z0-9%* in this case?
In Rust the code looks as follows:
lazy_static! {
    // WARNING: the original regex is `"[^a-z0-9%*][a-z0-9%]{3,}(?=[^a-z0-9%*])"` but Rust's regex
    // does not support look-around, so we have to check it programmatically for the last match
    static ref REGEX: Regex = Regex::new(r###"[^a-z0-9%*][a-z0-9%]{3,}"###).unwrap();
    static ref LOOKAHEAD_REGEX: Regex = Regex::new(r###"[^a-z0-9%*]"###).unwrap();
}
let pattern_lowercase = pattern.to_lowercase();
let results = REGEX.find_iter(&pattern_lowercase);
for (is_last, each_candidate) in results.identify_last() {
    let mut candidate = each_candidate.as_str();
    if !is_last {
        // have to simulate positive-ahead check programmatically
        let ending = &pattern_lowercase[each_candidate.end()..]; // substr after the match
        println!("searching in {:?}", ending);
        let lookahead_match = LOOKAHEAD_REGEX.find(ending);
        if lookahead_match.is_none() {
            // did not find anything => look-ahead is NOT positive
            println!("NO look-ahead match!");
            break;
        } else {
            println!("found look-ahead match: {:?}", lookahead_match.unwrap().as_str());
        }
    }
    ...
Test output:
"|imasdk":
searching in ".googleapis.com/js/core/bridge*.html"
found look-ahead match: "."
".googleapis":
searching in ".com/js/core/bridge*.html"
found look-ahead match: "."
".com":
searching in "/js/core/bridge*.html"
found look-ahead match: "/"
"/core":
searching in "/bridge*.html"
found look-ahead match: "/"
"/bridge":
searching in "*.html"
found look-ahead match: "."
^ Here you can see that /bridge is accepted because of the . that appears later in the remaining string (after the *), which is incorrect.
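
For comparison, here is a minimal std-only sketch of what the original look-ahead regex appears to do. A positive look-ahead is a zero-width assertion that is tested at the position immediately after the main match, so only the single next character matters, and it fails at end of input because there is no character to test. The helper names (is_token_char, is_separator, lookahead_ok, extract) are made up for illustration and are not part of the regex crate; the sketch also assumes the input has already been lowercased, as in the code above.

```rust
/// True if `c` is in the token class `[a-z0-9%]`.
fn is_token_char(c: char) -> bool {
    c.is_ascii_lowercase() || c.is_ascii_digit() || c == '%'
}

/// True if `c` matches `[^a-z0-9%*]`, the class used both for the
/// leading separator and inside the look-ahead.
fn is_separator(c: char) -> bool {
    !is_token_char(c) && c != '*'
}

/// Emulates `(?=[^a-z0-9%*])` at byte offset `end` (which must be a char
/// boundary, e.g. a match end offset). Only the one character at `end`
/// is inspected; an empty remainder means the look-ahead fails.
fn lookahead_ok(text: &str, end: usize) -> bool {
    text[end..].chars().next().map_or(false, is_separator)
}

/// Hand-rolled stand-in for `[^a-z0-9%*][a-z0-9%]{3,}(?=[^a-z0-9%*])`,
/// returning non-overlapping matches from left to right.
fn extract(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let mut rest = text[pos..].chars();
        let first = rest.next().unwrap(); // safe: pos < text.len()
        let after_first = pos + first.len_utf8();
        if !is_separator(first) {
            pos = after_first;
            continue;
        }
        // Greedy run of token characters; they are all ASCII, so the
        // character count equals the byte count.
        let run = rest.take_while(|&c| is_token_char(c)).count();
        let end = after_first + run;
        if run >= 3 && lookahead_ok(text, end) {
            out.push(&text[pos..end]);
            pos = end; // the look-ahead consumes nothing
        } else {
            // A shorter run would be followed by a token character,
            // so backtracking cannot rescue this start position.
            pos = after_first;
        }
    }
    out
}
```

Calling extract on @@||imasdk.googleapis.com/js/core/bridge*.html yields |imasdk, .googleapis, .com and /core; /bridge is rejected because the character right after it is *. In the loop above, the same single-character check could replace the LOOKAHEAD_REGEX.find(ending) call, or the second pattern could be anchored as ^[^a-z0-9%*] so that it only matches at the start of ending.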