World Wide Web Programming 17 - More on Regular Expressions

Reusing groups of characters

A regular expression can refer back to a group of characters it has already matched. For example, you may want to check that a list of values never contains the same value twice in a row.

• 009, 007, 001, 002, 004, 003 would be accepted, but 007, 007, 001, 002, 002, 003 would contain 2 errors

Any pattern written between parentheses can be referenced by a \ followed by the position of that group. For example, /(\d+), \1/ matches a number of one or more digits, followed by a comma, a space and the same number again. Ex:

• var myString = "007, 007, 1, 02, 02, 003, 002, 04";
  var myRegExp = /(\d+), \1/g;
  myString = myString.replace(myRegExp, "ERROR");
• This would change the string into ERROR, 1, ERROR, 003, 002, 04
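The back-reference example can be wrapped in a small function to try other inputs. This is only a sketch: the name flagRepeats and the two \b anchors are assumptions added here, not part of the pattern on the slide. Without the anchors, a value that merely starts with the previous one (12 followed by 123) would be partly replaced.

```javascript
// Flag values that are immediately repeated in a comma-separated list.
// The \b anchors are an addition to the slide's pattern: they force the
// back-reference \1 to cover a whole number, not just a prefix of it.
function flagRepeats(list) {
  return list.replace(/\b(\d+), \1\b/g, "ERROR");
}

// Same input as on the slide.
console.log(flagRepeats("007, 007, 1, 02, 02, 003, 002, 04"));

// Unanchored pattern from the slide: "12, 12" is matched inside "12, 123".
console.log("12, 123".replace(/(\d+), \1/g, "ERROR"));

// Anchored version leaves the list alone.
console.log(flagRepeats("12, 123"));
```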

Using the "or" character

The | character lets a single expression try two alternative patterns. For example, to replace ' by " but only when it surrounds a word, you would use

• /\B'|'\B/g
• This finds either a non-word boundary followed by a ', OR a ' followed by a non-word boundary
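A minimal sketch of that replacement, assuming straight single quotes in the input. The function name toDoubleQuotes is hypothetical. An apostrophe inside a word, as in don't, has a word boundary on both sides, so neither alternative matches it.

```javascript
// Replace ' by " only where the quote sits at the outer edge of a word.
function toDoubleQuotes(text) {
  return text.replace(/\B'|'\B/g, '"');
}

// The quotes around hello are replaced; the apostrophe in don't is kept.
console.log(toDoubleQuotes("He said 'hello' but don't go"));
```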

Another example of OR

Try to match tags and words into an array. We'll use the myString.match(myRegExp) function, which returns an array.

The regular expression should first try to find a tag, which

• Starts with <
• Has at least one character other than >, \r or \n
• Ends with >
• This gives us <[^>\r\n]+>

It has to find either a tag OR a word, represented as a non-tag (no <, >, \r or \n)

• [^<>\r\n]+

The total regular expression becomes

• /<[^>\r\n]+>|[^<>\r\n]+/g
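A short sketch of the tag-or-text matching, assuming the pattern described by the bullets (a tag is <, then one or more characters other than >, \r, \n, then >). The function name tokenize and the sample string are made up for illustration.

```javascript
// Split a fragment of markup into an array of tags and text runs.
// match() returns null when nothing matches, hence the "|| []".
function tokenize(html) {
  return html.match(/<[^>\r\n]+>|[^<>\r\n]+/g) || [];
}

// Tags and text runs come out as separate array elements, in order.
console.log(tokenize("<p>Hello <b>world</b></p>"));
```

Note that text runs keep their whitespace, so "Hello " comes back with its trailing space.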

Regular Expressions in PHP

It is more or less like in JavaScript, though we more generally use the string version "abc" rather than /abc/. The basics are the same, we can do

• "xyz" "abc|xyz"
• "[xyz]" "[0-9]" "[^xyz]"
• "x+" "x*" "x?"
• "ab{3}" "ab{3,}" "ab{3,5}" "x(yz)*"
• "." "^ab" "ab$"

You can also use "[[:alnum:]]", "[[:digit:]]" and "[[:alpha:]]"
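Since the basics are shared with JavaScript, a few of these patterns can be checked there. The table below is a sketch with made-up sample strings; the ^ and $ anchors are added so that each check tests the whole string. One assumption to keep in mind: the POSIX classes such as [[:digit:]] are not understood by JavaScript, where [0-9] or \d is used instead.

```javascript
// Each row: pattern, sample string, whether a match is expected.
const checks = [
  [/abc|xyz/, "xyz", true],          // alternation
  [/^ab{3}$/, "abbb", true],         // exactly three b
  [/^ab{3}$/, "abb", false],         // only two b
  [/^ab{3,}$/, "abbbbbb", true],     // three or more b
  [/^ab{3,5}$/, "abbbbbb", false],   // six b is more than five
  [/^x(yz)*$/, "xyzyz", true],       // x followed by any number of yz
  [/^[^xyz]$/, "a", true],           // one character that is not x, y or z
  [/^[0-9]+$/, "2024", true],        // digits only
];

// Returns true when every pattern behaves as the table expects.
function runChecks(table) {
  return table.every(([re, sample, expected]) => re.test(sample) === expected);
}

console.log(runChecks(checks));
```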

Regular expressions examples

To find a currency (0; 1000; 10,000; 10000.00; 10,000.00; …)

• We have either 0 or a number not starting with 0
• Can have up to 2 digits after the decimal point
• Can be negative
• Can have commas…

We start with the string 0 or another number not starting with 0

• "^0$"
• Or "^[1-9][0-9]*$"
• Combined: "^(0|[1-9][0-9]*)$"

We add the - support

• "^(0|-?[1-9][0-9]*)$"

To find an optional decimal value

• "(\.[0-9]{1,2})?"
• Which gives us "^(0|-?[1-9][0-9]*)(\.[0-9]{1,2})?$"

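To add commas support

• "^(0|-?[1-9](([0-9]*)|([0-9]{0,2}(,[0-9]{3})*)))(\.[0-9]{1,2})?$"
• After the first digit, [0-9]{0,2} allows up to two more digits before the first comma, so 1,000, 10,000 and 100,000 are all accepted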
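The finished currency pattern can be exercised in JavaScript, where the syntax used here is the same. This sketch assumes the leading group is written [0-9]{0,2}, so that a value with a single digit before the first comma (1,000) is accepted. The function name isCurrency and the sample lists are illustrative.

```javascript
// 0, or an optionally negative number not starting with 0, written either
// without commas or with commas every three digits, plus an optional
// decimal part of one or two digits.
const currency = /^(0|-?[1-9](([0-9]*)|([0-9]{0,2}(,[0-9]{3})*)))(\.[0-9]{1,2})?$/;

function isCurrency(value) {
  return currency.test(value);
}

const accepted = ["0", "1000", "10,000", "10000.00", "10,000.00", "1,000", "-42.5"];
const rejected = ["007", "1,00", "10.000", "1000,000", ""];

console.log(accepted.every(isCurrency));          // all of these should pass
console.log(rejected.some(isCurrency) === false); // none of these should pass
```

The pattern is still a teaching example: it rejects forms such as -0.50, which a production validator might want to allow.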

Validating an email address

We want to match something of the form name@domain. We have a name and a domain separated by @

• The user name can be "[-a-zA-Z0-9._]+"
• The domain name can also have a whole set of subdomains: "(\.[-a-zA-Z0-9]+)"

It becomes

• "^[-a-zA-Z0-9._]+@[-a-zA-Z0-9.]+(\.[-a-zA-Z0-9]+)+$"
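A quick way to try the email pattern, sketched in JavaScript with the character class written as a-zA-Z0-9 throughout. The function name looksLikeEmail and the sample addresses are hypothetical. The pattern is deliberately simple and only checks the overall shape of an address.

```javascript
// name, then @, then a domain with at least one dot-separated part.
const email = /^[-a-zA-Z0-9._]+@[-a-zA-Z0-9.]+(\.[-a-zA-Z0-9]+)+$/;

function looksLikeEmail(address) {
  return email.test(address);
}

console.log(looksLikeEmail("john.doe@mail.example.com")); // subdomains allowed
console.log(looksLikeEmail("john@localhost"));            // no dot in the domain
console.log(looksLikeEmail("john doe@example.com"));      // space in the name
```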

Using PHP functions

int ereg(string pattern, string str [, array regs])

For example, to convert a date from the format MM-DD-YYYY into DD-MM-YYYY you can do

  if (ereg("([0-9]{1,2})-([0-9]{1,2})-([0-9]{4})", $date, $regs))
    echo $regs[2]."-".$regs[1]."-".$regs[3];
  else
    echo "Invalid date format: ".$date;

You can also use eregi(…), which ignores the character case

Note that the ereg family of functions was deprecated in PHP 5.3 and removed in PHP 7; the PCRE functions presented below are the ones to use in current code.
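The same capture-and-reorder idea can be sketched in JavaScript, where match() plays the role of ereg() and returns the captured groups in an array. The function name swapDayMonth is hypothetical, and the ^ and $ anchors are an addition so that trailing garbage is rejected.

```javascript
// Turn MM-DD-YYYY into DD-MM-YYYY, or report an invalid format.
function swapDayMonth(date) {
  const regs = date.match(/^([0-9]{1,2})-([0-9]{1,2})-([0-9]{4})$/);
  if (regs === null) {
    return "Invalid date format: " + date;
  }
  // regs[0] is the whole match, regs[1..3] are the three groups.
  return regs[2] + "-" + regs[1] + "-" + regs[3];
}

console.log(swapDayMonth("12-25-2023"));
console.log(swapDayMonth("2023/12/25"));
```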

More PHP functions

string ereg_replace(string pattern, string replacement, string str)

• Replaces pattern by replacement in str

string eregi_replace(…)

array split(string pattern, string str [, int limit])

• Returns an array of the pieces of str found between matches of pattern
• limit can be set to decide the maximum number of elements in the array

array spliti(…)

string sql_regcase(string str)
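Splitting on a pattern works in much the same way in JavaScript, which makes for an easy experiment. This is a sketch with a hypothetical splitList function; note as an assumption to verify that JavaScript's limit simply truncates the result, which may differ from how PHP treats the remainder of the string.

```javascript
// Split on commas or semicolons, swallowing the whitespace around them.
// Passing undefined as limit means "no limit".
function splitList(text, limit) {
  return text.split(/\s*[,;]\s*/, limit);
}

console.log(splitList("red, green;blue ,  black"));    // four pieces
console.log(splitList("red, green;blue ,  black", 2)); // first two pieces only
```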

Pleasing Perl lovers…

PHP has a set of PCRE functions. You can use the same regexps as with JavaScript, e.g. "/php/", and you can use any of the following modifiers

• i for case-insensitive matching
• x to ignore whitespace in the pattern, and also anything written between # and the end of the line, which allows patterns like

  /       # begin pattern
  \b      # find a word boundary
  web     # "web" is to be matched
  \b      # followed by another word boundary
  /xi

• e, used only by preg_replace(), evaluates the replacement string as PHP code after the \\1, \\2, … references have been substituted (deprecated in PHP 5.5 and removed in PHP 7, where preg_replace_callback() takes its place)

You can also use any of the following characters

• \d \D \s \S \w \W \b \B \A \Z \z
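To see the i modifier and \b working together, here is a sketch in JavaScript. JavaScript has no x modifier, so the comments are kept outside the pattern by building it from commented string pieces; the names webWord and mentionsWeb are hypothetical.

```javascript
// Whole-word, case-insensitive search for "web".
const webWord = new RegExp(
  "\\b" +   // find a word boundary
  "web" +   // "web" is to be matched
  "\\b",    // followed by another word boundary
  "i"       // case-insensitive
);

function mentionsWeb(text) {
  return webWord.test(text);
}

console.log(mentionsWeb("World Wide Web Programming")); // whole word, any case
console.log(mentionsWeb("A new website"));              // "web" inside a longer word
```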


For more details on PCRE… http://www.pcre.org

Example of the e modifier

  $html_string = "<b>Bold Text</b> and <u>underlined text</u>";
  $new_html = preg_replace("/(<\/?)(\w+)([^>]*>)/e", "'\\1'.strtoupper('\\2').'\\3'", $html_string);

The new string would hold "<B>Bold Text</B> and <U>underlined text</U>"
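The usual alternative to evaluating a replacement string is to pass a function that receives the captured groups. A sketch in JavaScript, assuming the goal of the example is to upper-case tag names; the pattern (tag opening, tag name, rest of the tag) and the function name upperCaseTags are assumptions made for this illustration.

```javascript
// Upper-case every tag name, leaving attributes and text untouched.
function upperCaseTags(html) {
  return html.replace(
    /(<\/?)(\w+)([^>]*>)/g,
    // The callback receives the whole match, then each captured group.
    (match, open, name, rest) => open + name.toUpperCase() + rest
  );
}

console.log(upperCaseTags("<b>Bold Text</b> and <u>underlined text</u>"));
```

Because the groups arrive as ordinary arguments, no code has to be assembled from strings.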

PCRE related PHP functions (1/4)

int preg_match(string pattern, string str [, array match])

• $match[0] would hold the text that matched the full pattern
• $match[1] would hold the text that matched the first captured parenthesized subpattern
• and so on
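The layout of the match array is the same in JavaScript's match(), which gives a quick way to see it. A sketch with a hypothetical parseTime function and sample sentence:

```javascript
// Extract a time of day and expose the pieces of the match array.
function parseTime(text) {
  const match = text.match(/(\d{1,2}):(\d{2})/);
  if (match === null) {
    return null;
  }
  return {
    full: match[0],    // text matched by the full pattern
    hours: match[1],   // first parenthesized subpattern
    minutes: match[2], // second parenthesized subpattern
  };
}

console.log(parseTime("Meeting at 09:30 today"));
```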

PCRE related PHP functions (2/4)

int preg_match_all(string pattern, string str, array matches [, int order])

Order can be

• PREG_PATTERN_ORDER: $matches[0] is an array of full pattern matches, $matches[1] is an array of the strings matched by the first parenthesized sub-pattern, and so on
• PREG_SET_ORDER: $matches[0] is an array holding the first set of matches, $matches[1] an array holding the second set of matches, and so on…

I recommend you use PREG_PATTERN_ORDER
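The difference between the two orders is easiest to see side by side. In this JavaScript sketch, matchAll() yields one array per match (the "set" layout), and a small transpose produces the "pattern" layout. The helper names matchSets and toPatternOrder are hypothetical, and the sketch only illustrates the two shapes, not PHP's exact return values.

```javascript
// One array per match: [full match, group 1, group 2, ...].
function matchSets(re, text) {
  return [...text.matchAll(re)].map((m) => [...m]);
}

// Transpose: one array per group, collected across all matches.
function toPatternOrder(sets) {
  if (sets.length === 0) {
    return [];
  }
  return sets[0].map((_, i) => sets.map((set) => set[i]));
}

const sets = matchSets(/(\w+)=(\d+)/g, "a=1, b=22");
console.log(sets);                 // set layout
console.log(toPatternOrder(sets)); // pattern layout
```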

PCRE related PHP functions (3/4)

string preg_replace(string pattern, string replacement, string str [, int limit])

• If limit is omitted or equal to -1, all occurrences are replaced
• In the replacement you can use \\0, \\1, …: \\0 stands for the text matched by the whole pattern and \\1 to \\99 for the captured substrings
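For comparison, a sketch of the same two ideas in JavaScript, where references in the replacement are written $1, $2, … instead of \\1, \\2, …, and where leaving out the g flag replaces only the first occurrence. The function names and sample strings are made up.

```javascript
// Reorder captured groups: YYYY-MM-DD becomes DD/MM/YYYY everywhere.
function isoToEuropean(text) {
  return text.replace(/(\d{4})-(\d{2})-(\d{2})/g, "$3/$2/$1");
}

// Without the g flag only the first match is replaced.
function replaceFirstDash(text) {
  return text.replace(/-/, "+");
}

console.log(isoToEuropean("Due 2023-12-25 or 2024-01-05"));
console.log(replaceFirstDash("a-b-c"));
```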

PCRE related PHP functions (4/4)

array preg_split(string pattern, string subject [, int limit [, int flags]])

• flags can be PREG_SPLIT_NO_EMPTY, which returns only the non-empty pieces…

string preg_quote(string s [, string delimiter])

• Puts a backslash in front of every character which is part of the regular expression syntax ( . \ + * ? [ ^ ] $ ( ) { } = ! < > | : )
• If you add a delimiter, it will also be escaped
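Both ideas, quoting special characters and dropping empty pieces, can be sketched in JavaScript. Neither function below is built in: escapeRegExp and splitNonEmpty are hypothetical helpers, and the set of characters escaped is a common choice for JavaScript rather than the exact list used by preg_quote().

```javascript
// Put a backslash in front of characters with a special meaning in a regexp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split on a literal separator and keep only the non-empty pieces.
function splitNonEmpty(text, separator) {
  const re = new RegExp(escapeRegExp(separator));
  return text.split(re).filter((piece) => piece !== "");
}

console.log(escapeRegExp("1+1=2?"));
console.log(splitNonEmpty("a..b.", "."));
```

Escaping matters here because an unescaped "." used as a separator would match every character.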