Regular
Expressions are about as elegant as a pig on a bicycle. Using a regular
expression feels like resorting to machine code when all those patterns
we're taught to love just aren't up to the job. Which, I suppose, is
also a reason to like them. They have a brute force directness, free
from pattern politics and endless analysis.
And they work. Eventually.
If the JavaScript Regular Expressions API makes your head spin then
this might be for you. I'll document the basics and demonstrate how you
might use them to full effect.
For the sake of brevity (not to mention my own lack of regex
proficiency) I won't discuss the syntax of the expressions themselves.
Suffice it to say, JavaScript regex syntax is Perl-based. There are many
excellent online resources for this, as well as some nice online regex testers.
The RegExp object
RegExp is a global object which serves three purposes:-
1) It's a constructor function for creating new instances of Regular Expressions...
It takes a regex literal (or a pattern string) as its argument. As with
strings, you can drop the constructor syntax and just specify
the literal on its own. Regex literals are delimited by the / symbol
instead of quotes.
var a = new RegExp(/\b\w{4}\w\b/g);
2) It aggregates a set of global (static) properties reflecting the most recent regex match...
leftContext, the text to the left of the most recent match
rightContext, text to the right of the most recent match
lastMatch, the most recently matched text
lastParen, the text matched by the last parenthesized subexpression
$n, the text matched by the nth parenthesized group (up to n==9)
"(penalty)Lampard, Frank(1-0)".match(/\b([\w]+),\s?([\w]+)/g);
...and 2 variables that will be applied to the next regex match...
input, the string that exec and test fall back on when they are called with no argument
multiline, a boolean specifying whether the string used for the next match should be treated as single or multiline (equivalent to the m attribute)
(Both are legacy behaviours that current engines may no longer honour.)
var a = /\b[a-z]{10,}\b/i;
RegExp.input = document.body.innerHTML;
3) Each instance stores additional properties
source, the full text of the pattern
global, search for all matches (the expression's g attribute is present)
ignoreCase, the search ignores case (the expression's i attribute is present)
lastIndex, index to begin the next search
(lastIndex is writeable, the other three properties are not)
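To make the three roles concrete, here is a small sketch that builds the same pattern both ways, reads the per-instance properties, and then inspects the static properties straight after a match. The variable names are purely illustrative, and the static properties are a legacy feature, so their availability is an assumption about your engine.

```javascript
// Two ways to create the same pattern: constructor (string form) and literal.
// In the string form every backslash has to be doubled.
var fiveLettersA = new RegExp("\\b\\w{4}\\w\\b", "g");
var fiveLettersB = /\b\w{4}\w\b/g;

// Per-instance properties
var sameSource = fiveLettersA.source === fiveLettersB.source; // true
var isGlobal = fiveLettersB.global;                           // true
var isIgnoreCase = fiveLettersB.ignoreCase;                   // false
var startIndex = fiveLettersB.lastIndex;                      // 0 until a global search advances it

// Static properties reflect the most recent successful match
// (legacy feature, assumed to be supported by the engine).
"(penalty)Lampard, Frank(1-0)".match(/\b([\w]+),\s?([\w]+)/g);
var statics = {
  leftContext: RegExp.leftContext,   // "(penalty)"
  rightContext: RegExp.rightContext, // "(1-0)"
  lastMatch: RegExp.lastMatch,       // "Lampard, Frank"
  lastParen: RegExp.lastParen,       // "Frank"
  $1: RegExp.$1,                     // "Lampard"
  $2: RegExp.$2                      // "Frank"
};
```

Because the static properties are overwritten by every successful match, the sketch copies them into an object immediately rather than reading them later.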
The RegExp prototype also defines 3 methods:-
test
Was the match successful? (see example above)
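A short sketch of test in use. The pattern is the ten-letter-word expression from the earlier example; the sample strings are made up for illustration. One behaviour worth knowing: on a global pattern, test advances lastIndex just as exec does, so repeated calls against the same string can give different answers.

```javascript
var longWord = /\b[a-z]{10,}\b/i;
var hasLongWord = longWord.test("A thoroughly unremarkable sentence"); // true
var noLongWord = longWord.test("short words only");                    // false

// With the g attribute, test remembers where it stopped.
var digits = /\d/g;
var firstCall = digits.test("a1");  // true, lastIndex is now 2
var secondCall = digits.test("a1"); // false, the search resumed at index 2
var afterMiss = digits.lastIndex;   // 0, a failed search resets lastIndex
```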
exec
When a match is found it returns an array of results where element 0 is the matched text and elements 1 to n represent the matched groups in sequence (equivalent to the RegExp.$n values); when there is no match it returns null. If the expression includes the global (g) attribute, the lastIndex property is updated after each call so that repeated calls to exec will loop through each match in the string.
Here's a method to return the first n cards from the "pack", such
that their total value does not exceed 21. Notice we define an optional
group 2 to match the numeric value of cards with non-numeric names (e.g.
King).
var expr = /\b([^@\(]+)\(?(\d*)\)?@([^\s]+)\s?/g;
var theString = '3@Clubs King(10)@Hearts 3@Spades 5@Diamonds 7@Clubs 2@Hearts 9@Spades Jack(10)@Clubs 4@Diamonds 9@Hearts';
var result = [], total = 0, matching = true;
while (true) {
    matching = expr.exec(theString);
    // group 2 holds the numeric value of named cards; otherwise group 1 is the value
    var value = parseInt(RegExp.$2 ? RegExp.$2 : RegExp.$1, 10);
    if (!matching || (total += value) > 21) {
        break;
    }
    alert('&' + RegExp.$1);
    result.push(RegExp.$1 + " of " + RegExp.$3);
}
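The same loop can also be written against the array that exec returns, rather than the static RegExp.$n properties. This is only a sketch: the function name firstCardsUpTo and its parameters are hypothetical and not part of the original example.

```javascript
// Hypothetical helper: collect cards from the pack until adding
// the next one would push the running total past the limit.
function firstCardsUpTo(pack, limit) {
  // A fresh instance per call, so lastIndex always starts at 0.
  var cardExpr = /\b([^@\(]+)\(?(\d*)\)?@([^\s]+)\s?/g;
  var picked = [], sum = 0, m;
  while ((m = cardExpr.exec(pack)) !== null) {
    // m[0] is the whole match; m[1] the name, m[2] the optional
    // numeric value, m[3] the suit.
    var cardValue = parseInt(m[2] || m[1], 10);
    if (sum + cardValue > limit) {
      break;
    }
    sum += cardValue;
    picked.push(m[1] + " of " + m[3]);
  }
  return picked;
}

var hand = firstCardsUpTo("3@Clubs King(10)@Hearts 3@Spades 5@Diamonds 7@Clubs", 21);
// hand is ["3 of Clubs", "King of Hearts", "3 of Spades", "5 of Diamonds"]
```

Reading the groups from the returned array keeps the function independent of any other regex work that might overwrite the static properties in between.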
compile
Edit this RegExp instance. If you're neurotic about the overhead of
creating a new RegExp instance every time then this is for you, though
the method is now deprecated. Enough said.
The String methods
Three string methods accept regular expressions as arguments. They
differ from the RegExp methods in that they ignore the RegExp's lastIndex
property (more accurately they set it to zero) and if the pattern is
global they return all matches in one pass, rather than one match for
each call. RegExp static properties (e.g. RegExp.$1) are set with each
call.
match
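Returns an array of the pattern's matches in a string. If the pattern is global the array holds every match; otherwise it holds only the first match followed by its captured groups. When nothing matches the result is null.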
var a = /(-[\d*\.\d*]{2,})|(-\d+)/g;
"74 -5.6 9 -.5 -2 49".match(a); // ["-5.6", "-.5", "-2"]

var queryExpr = new RegExp(/\?/);
var getQueryString = function (url) {
    url.match(queryExpr); // leaves everything after the "?" in RegExp.rightContext
    return RegExp.rightContext;
};
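The difference between global and non-global match is easiest to see side by side. The sketch below reuses the sample string from above with simplified patterns of my own, so treat the expressions as illustrative rather than as the ones the article intends.

```javascript
var sample = "74 -5.6 9 -.5 -2 49";

// Global: every match, but no group details
var allNegatives = sample.match(/-\d*\.?\d+/g);     // ["-5.6", "-.5", "-2"]

// Non-global: the first match followed by its captured groups
var firstNegative = sample.match(/-(\d*)\.(\d+)/);  // ["-5.6", "5", "6"]
var foundAt = firstNegative.index;                  // 3

// No match at all gives null rather than an empty array
var nothing = sample.match(/[a-z]/);                // null
```

Since a failed match returns null, it is worth checking the result before reading its length or elements.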
split
Converts a string to an array according to the supplied delimiter. Optionally takes a regular expression as the delimiter.
var names = "Smith%20O'Shea%20Cameron%44Brown".split(/[^a-z\']+/gi);
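Nick Fitzgerald points out that IE is out on a limb when it comes to splitting on grouped expressions: other browsers include the text captured by the group in the resulting array, while IE leaves it out.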
var time = "Two o'clock PM".split(/(o'clock)/);
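As a rough illustration of what the grouping changes, the sketch below splits the same string with and without parentheses around the delimiter. The results shown are what a standards-following engine produces; the surrounding \s* is my own addition to trim the spaces.

```javascript
// Without a group the delimiter is discarded
var withoutGroup = "Two o'clock PM".split(/\s*o'clock\s*/);  // ["Two", "PM"]

// With a group the captured delimiter is kept in the result
var withGroup = "Two o'clock PM".split(/\s*(o'clock)\s*/);   // ["Two", "o'clock", "PM"]
```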
replace
Replaces argument 1 with argument 2. Argument 1 can be a regular
expression and if it's a global pattern, all matches will be replaced.
Additionally, replace comes with two little-used but very nice features.
First, you can use $1...$n in the second argument (representing 1...n matched groups)
var a = "Smith, Bob; Raman, Ravi; Jones, Mary";
a.replace(/([\w]+), ([\w]+)/g, "$2 $1");

var a = "California, San Francisco, O'Rourke, Gerry";
a.replace(/([\w '\s]+), ([\w' \s]+), ([\w '\s]+), ([\w' \s]+)/, "$4 $3 lives in $2, $1");
Second, you can also use a function as the second argument. This
function will get passed the entire match followed by each matched
group ($1...$n) as arguments.
var chars = "72 101 108 108 111 87 111 114 108 100 33";
chars.replace(/(\d+)(\s?)/gi, function (all, $1) { return String.fromCharCode($1); });
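Two more sketches of the function form, using made-up input strings. The parameter names (all, letter, item, count) are arbitrary; only their positions matter, with the full match first and the groups after it.

```javascript
// One group: turn a hyphenated name into camelCase
var camel = "border-top-left-radius".replace(/-([a-z])/g, function (all, letter) {
  return letter.toUpperCase();
}); // "borderTopLeftRadius"

// Two groups: compute the replacement from the captured values
var doubled = "apple 3, pear 12".replace(/(\w+) (\d+)/g, function (all, item, count) {
  return item + " x" + (parseInt(count, 10) * 2);
}); // "apple x6, pear x24"
```

The function is called once per match, so with a global pattern it can produce a different replacement each time.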