The Regular Expression is an invaluable tool for programming.
However, try to use it in earnest, exploiting most if not all of its various features, and you may occasionally run into browser differences and other traps for the unwary.
lastIndex property
You will be familiar with the exec function (ECMAScript specification, Section 15.10.6.2 RegExp.prototype.exec(string)):-
Performs a regular expression match of string against the regular expression and returns an Array object containing the results of the match, or null if the string did not match.
The Array object returned has the following properties:-
0 | The matched sub-string. |
1 .. | The values of the parenthesised sub-matches, if any. |
length | The number of array entries. |
index | The start of the matched sub-string. |
input | The string against which the exec was run. |
[lastIndex] | The index of the character next following the match sub-string. This is non-standard and only supported by Internet Explorer: so do not use this. |
The RegExp object instance (re in the example below) is also updated and has the following properties:-
source | The regular expression pattern. |
global | True if "g" was used in the pattern's switches /ab/g |
ignoreCase | True if "i" was used in the pattern's switches /ab/i |
multiline | True if "m" was used in the pattern's switches /ab/m |
lastIndex | The string position at which to start the search. If the "g" (global) switch is set then this will be updated with the string position after the last match. Unlike the lastIndex property on the Array returned above this property is supported by all browsers. |
var str = "abcabcabc";
var re = /(ab)(ca)/g;
var match = re.exec(str);
for (var propertyName in match)
{
alert(propertyName + " " + match[propertyName]);
}
alert("source" + " " + re.source);
alert("global" + " " + re.global);
alert("ignoreCase" + " " + re.ignoreCase);
alert("multiline" + " " + re.multiline);
alert("lastIndex" + " " + re.lastIndex);
As noted above, lastIndex is only updated when the "g" flag is set.
If you are going to use the lastIndex property, then it must be the one on the RegExp instance and not the one on the Array returned from exec.
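As a minimal sketch of the point above, the loop below reads lastIndex from the RegExp instance after each exec call. The helper name findAll is hypothetical and exists only for this illustration.

```javascript
// findAll is a hypothetical helper, not part of any library.
// It records where each match starts and ends by reading lastIndex
// from the RegExp instance (never from the Array returned by exec).
function findAll(re, input) {
    var positions = [];
    var match;
    if (!re.global) {
        // Without "g", lastIndex is not advanced and the loop would never end.
        throw new Error("findAll requires the \"g\" flag");
    }
    re.lastIndex = 0; // start from the beginning of the input
    while ((match = re.exec(input)) !== null) {
        positions.push({ index: match.index, lastIndex: re.lastIndex });
        if (match[0].length === 0) {
            re.lastIndex++; // step past an empty match to avoid looping forever
        }
    }
    return positions;
}

var positions = findAll(/ab/g, "abcabcabc");
```

In a browser, each entry of positions could be passed to alert in the same way as the examples above.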
Note that the RegExp.$1 ... RegExp.$9 properties are being deprecated, so try to stop using these.
Object Reuse
What do you think the following code will do? Will it alert 1, 4, 7 or 1, 1, 1?
var str = "1ab2ab3ab";
find(str);
find(str);
find(str);
function find(
input /*: String*/
)
{
var re = /ab/gi;
// re.lastIndex = 0;
var match = re.exec(input);
if (!match)
{
alert("Not Found");
return; // match is null here, so reading match.index below would throw
}
alert(match.index);
}
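For Mozilla and Firefox: 1, 4, 7
For Internet Explorer: 1, 1, 1
It seems that in Mozilla and Firefox the RegExp literal is created only once, so every call to find reuses the same object and its lastIndex is not reset. (ECMAScript 3 specifies this behaviour; ECMAScript 5 changed it so that a new object is created each time the literal is evaluated.)
The solution is to use the line commented out above, re.lastIndex = 0, if you want the find always to start from the beginning.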
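The effect can be reproduced in any engine by sharing one RegExp object deliberately. The following is only a sketch: sharedRe, findNext and findFirst are hypothetical names, and sharedRe stands in for a literal that the engine creates once and reuses.

```javascript
// sharedRe stands in for a RegExp literal that is created only once.
var sharedRe = /ab/gi;

// Continues from wherever the previous search left sharedRe.lastIndex.
function findNext(input) {
    var match = sharedRe.exec(input);
    return match ? match.index : -1;
}

// Resets lastIndex first, so the search always starts at the beginning.
function findFirst(input) {
    sharedRe.lastIndex = 0;
    var match = sharedRe.exec(input);
    return match ? match.index : -1;
}

var str = "1ab2ab3ab";
var withoutReset = [findNext(str), findNext(str), findNext(str)];
var withReset = [findFirst(str), findFirst(str), findFirst(str)];
```

Here withoutReset holds 1, 4, 7 and withReset holds 1, 1, 1, which mirrors the two browser behaviours described above.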
This is not a problem with the String.prototype.replace function, where the lastIndex property is ignored:-
var str = "1ab2ab3ab";
replaceAB(str);
replaceAB(str);
replaceAB(str);
function replaceAB(
input /*: String*/
)
{
var re = /ab/gi;
re.lastIndex = 2;
alert(input.replace(re, "cd"));
}
Alerts 1cd2cd3cd for all browsers.
Parenthesis Matches
As you will be aware, you can capture sub-matches in parentheses.
Thus:-
var str = "abcabcabc";
var re = /(ab)(ca)/g;
var match = re.exec(str);
alert(match[1]);
alert(match[2]);
Will alert ab and then ca.
But what if a parenthesis has a * quantifier, so that it can match 0 or more times, and it matches 0 times?
For Internet Explorer the Array will contain the empty string "", but for Opera and Mozilla, the value undefined.
Thus:-
var str = "b";
var re = /(b+)([^a])*/;
var match = re.exec(str);
alert("'" + match[1] + "' " + typeof match[1]);
alert("'" + match[2] + "' " + typeof match[2]);
// COMPARED TO
var str = "b";
var re = /(b+)([^a]*)/;
var match = re.exec(str);
alert("'" + match[1] + "' " + typeof match[1]);
alert("'" + match[2] + "' " + typeof match[2]);
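One way to make calling code independent of this difference is to normalise the capture before using it. The sketch below assumes a hypothetical helper named captureOrEmpty; it is an illustration rather than a standard function.

```javascript
// captureOrEmpty is a hypothetical helper: it returns capture n of a match,
// converting the value undefined (group did not participate) to "".
function captureOrEmpty(match, n) {
    return typeof match[n] === "undefined" ? "" : match[n];
}

// Quantifier outside the parenthesis: the group itself matches 0 times.
var quantifiedGroup = /(b+)([^a])*/.exec("b");
// Quantifier inside the parenthesis: the group matches the empty string.
var quantifiedInside = /(b+)([^a]*)/.exec("b");

var second1 = captureOrEmpty(quantifiedGroup, 2);
var second2 = captureOrEmpty(quantifiedInside, 2);
```

Both second1 and second2 are then strings, whichever value the browser put in the Array.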
Replacement Function
15.5.4.11 String.prototype.replace (searchValue, replaceValue) allows a function to be provided as the replaceValue:-
If replaceValue is a function, then for each matched substring, call the function with the following m + 3 arguments. Argument 1 is the substring that matched. If searchValue is a regular expression, the next m arguments are all of the captures in the MatchResult (see 15.10.2.1). Argument m + 2 is the offset within string where the match occurred, and argument m + 3 is string. The result is a string value derived from the original input by replacing each matched substring with the corresponding return value of the function call, converted to a string if need be.
Thus:-
var str = "abcde|abcde";
var re = /(a)(bc)(de)/g;
alert(str.replace(re, replacementFunction));
function replacementFunction(
matchedSubstring /*: String*/,
paren1 /*: String*/,
paren2 /*: String*/,
paren3 /*: String*/,
matchOffset /*: Number*/,
inputString /*: String*/
) /*: String*/
{
return paren3 + paren2 + paren1 + matchOffset;
}
Versions of Opera older than 9 do not support this.
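Because the number of arguments depends on the number of captures (m + 3), a replacement function can also read them from the arguments object instead of naming each one. The following is a sketch only; reverseCaptures is a hypothetical name, and it assumes the offset is the first argument of type number.

```javascript
// reverseCaptures is a hypothetical replacement function that works for
// any number of captures: it joins them in reverse order and appends
// the match offset.
function reverseCaptures(matchedSubstring) {
    var captures = [];
    var i = 1;
    // Captures occupy arguments 1..m; the next argument is the numeric offset.
    while (i < arguments.length && typeof arguments[i] !== "number") {
        // A capture that did not participate may be undefined; treat it as "".
        captures.unshift(typeof arguments[i] === "undefined" ? "" : arguments[i]);
        i++;
    }
    var matchOffset = arguments[i];
    return captures.join("") + matchOffset;
}

var swapped = "abcde|abcde".replace(/(a)(bc)(de)/g, reverseCaptures);
```

For this input, swapped is "debca0|debca6", the same result the named-parameter version above produces.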
Safari Bug
I have recently noted the following entry concerning a Safari bug:-
It seems to involve a quantifier on a parenthesis:-
var a = [];
for (var i = 0; i < 1000; i++)
{
a[i] = "abcdefghi";
}
var string = a.join("");
alert(/(.)+/.test(string));
According to the entry, this crashes Safari when string's length is bigger than about 7000 characters.
My Exec Function
To give myself some extra control, I use the following function:-
var input /*: String*/ = "123abc123def123";
var re /*: RegExp*/ = /(abc)*(123)/;
var result /*: Array*/ = [];
var overflow /*: int*/ = 10;
var str /*: String*/ = "";
while (overflow--)
{
result = RegExp_Exec(re, input, result.lastIndex);
if (result === null)
{
break;
}
str = "";
str += "0 :\t\t" + result[0] + "\r\n";
str += "1 :\t\t" + result[1] + " " + typeof result[1] + "\r\n";
str += "2 :\t\t" + result[2] + " " + typeof result[2] + "\r\n";
str += "index :\t\t" + result.index + "\r\n";
str += "lastIndex :\t" + result.lastIndex + "\r\n";
alert(str);
}
/*
*
* (JavaScript Function)
*
* RegExp_Exec
*
* June 2007
*
* A function to enhance the RegExp.prototype.exec method
*
*
* Written by Julian Turner 2007
* Copyright Free
*
* @param (RegExp) re
* RegExp instance
*
* @param (String) input
* The string to be searched
*
* @param (Number) startIndex OPTIONAL
* input character to start from (and including).
*
* @return (Array)
* The array returned by RegExp.prototype.exec
* but with 1..9 "undefined" converted to empty string
* and a "lastIndex" property, even if "g" switch not set
*/
function RegExp_Exec(
re /*: RegExp*/,
input /*: String*/,
startIndex /*: int*/
) /*: Array*/
{
if (typeof startIndex == "undefined")
{
startIndex = 0;
}
if (re.global)
{
re.lastIndex = startIndex;
}
else
{
input = input.substring(startIndex, input.length);
}
var result /*: Array*/ = re.exec(input);
if (result === null)
{
return result;
}
for (var i /*: int*/ = 0; i < result.length; i++)
{
if (typeof result[i] === "undefined")
{
result[i] = "";
}
}
if (re.global)
{
result.lastIndex = re.lastIndex;
}
else
{
result.index = startIndex + result.index;
result.lastIndex = result.index + result[0].length;
}
return result;
}
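One consequence of the substring approach in the non-global branch is that the pattern is run against a shortened string, so an anchor such as ^ is tested against the start of the substring rather than the start of the original input. A possible variation, sketched here under the assumption that copying the RegExp on each call is acceptable, searches with a global copy and lastIndex instead. The name execFrom is hypothetical, and only the i and m flags are carried over.

```javascript
// execFrom is a hypothetical variation on RegExp_Exec: it always searches
// with a global copy of the pattern, so the input is never shortened.
function execFrom(re, input, startIndex) {
    var flags = "g" + (re.ignoreCase ? "i" : "") + (re.multiline ? "m" : "");
    var copy = new RegExp(re.source, flags);
    copy.lastIndex = startIndex || 0;
    var result = copy.exec(input);
    if (result === null) {
        return null;
    }
    for (var i = 1; i < result.length; i++) {
        if (typeof result[i] === "undefined") {
            result[i] = ""; // same normalisation as RegExp_Exec
        }
    }
    result.lastIndex = copy.lastIndex;
    return result;
}

// Walk through all matches, as in the loop above.
var found = [];
var last = 0;
var r;
while ((r = execFrom(/(abc)*(123)/, "123abc123def123", last)) !== null) {
    found.push(r.index + "-" + r.lastIndex);
    // Guard against an empty match, which would not advance lastIndex.
    last = r.lastIndex > r.index ? r.lastIndex : r.lastIndex + 1;
}
```

The original RegExp instance is left untouched, including its lastIndex, at the cost of constructing a new object for every call.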