Introduction |
---|
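We often need to process many files with a single command,
or to search for and replace a whole class of strings.
For such tasks, regular expressions are an extremely powerful notation
that lets the user describe a class of strings conveniently and precisely.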
|
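For example, suppose we want to use an ex editing command to give every
table in an HTML file a background color, which means turning the table tag <table> into
<table bgcolor=white>.
We do not want to track down and edit each table tag one by one;
we want a single editing command to do the job.
Suppose the HTML file contains two spellings of the table tag, <table> and <Table>.
Without regular expressions, the ex editor needs two commands to achieve this:
With a regular expression, a single editing command is enough. Its effect is: from the first line to the last, replace the strings "<Table" and "<table" with the original string followed by "bgcolor=white". The command also preserves any attributes already defined in the table tag. And if the file contains the table tag in many other capitalizations, a regexp can still handle them with ease. |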
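As an illustration in Python rather than ex, the substitution described above can be sketched as follows. The sample HTML and the variable names are assumptions made for this example; `\g<0>` is Python's way of re-inserting the matched text, so the original capitalization and any existing attributes are kept.

```python
import re

# Hypothetical sample input with both spellings of the tag.
html = "<table>\n<Table border=1>\n<p>not a table</p>\n"

# [Tt] covers both spellings; \g<0> re-inserts whatever was matched,
# so "<Table" stays "<Table" and "<table" stays "<table".
result = re.sub(r"<[Tt]able", r"\g<0> bgcolor=white", html)

print(result)
```

Because only the opening `<table` or `<Table` is matched, everything that follows it in the tag is left untouched.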
In fact, the wildcards we routinely use in file names are also a kind of
regular expression.
|
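Regular Expression (regexp for short) goes by many names in Chinese:
正規表示式、規律表達式、正規表達式、正規表示法、規則運算式、常規表示法, and so on.
Many Unix tools support regexps.
There are roughly thirty to forty special symbols, but learning just the few common ones
is enough to be highly effective at searching and replacing strings.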
|
History of Regular Expressions |
---|
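Regular expressions are a basic concept in automata theory. If a language is
regular, that is, if it can be described by a regular expression,
then its strings can be recognized by a finite state automaton
(or by any more powerful class of automaton).
To make string matching convenient, Ken Thompson defined a notation for the
search and replace functions of the QED editor.
This notation is a language for writing regular expressions and was never formally named;
we will call it KRE (Ken's Regular Expression).
The Unix editors ed, ex, and sed, as well as grep, all adopted KRE.
KRE greatly increased the power of Unix and helped drive its adoption.
KRE has since been used widely in tools on Unix and Unix-like systems.
Later, POSIX added a few symbols to KRE and defined BRE (Basic Regular Expression),
and then a further extended form, ERE (Extended Regular Expression).
Perl, which came later, extended BRE/ERE substantially; its syntax is the model for
PCRE (Perl Compatible Regular Expressions) and is more powerful still.
Today, a programming language that offers string processing but no regular
expression support has little chance of survival.
Fortunately, the symbols used in KRE carry over to every programming language with string processing,
so anyone familiar with KRE does not need to learn a new notation in each language, only the extensions.
Today, fluency with regexps
has become an important measure of a software engineer.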
|
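The website https://regex101.com/ provides a platform for practicing regexps.
It lets you test all kinds of regexps against strings you enter,
explains the rules you typed, and shows which strings they match,
so it is a good place to practice and test.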
|
Regular Expressions for File Names |
---|
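The wildcard symbols used in file names can also be classed as regular expression syntax:

Symbol | Meaning
---|---
* | Any sequence of characters, of any length
? | Any single character
[ ] | Any one of the characters inside the brackets

Examples:

Pattern | Meaning
---|---
a* | All file names beginning with a
a? | All two-character file names beginning with a
a[1-9] | All two-character file names beginning with a whose second character is a digit from 1 to 9

These should all be familiar, right? Not necessarily. Consider the following two commands: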
|
ls a[1-9]
echo a[1-9] |
---|
Do these two commands produce the same result?
|
Most people know what the first command prints,
but believe the second one prints the following:
|
a1 a2 a3 a4 a5 a6 a7 a8 a9 |
---|
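In fact, the two commands produce exactly the same result.
|
When the second command runs, the shell first sees the string a[1-9]
and treats it as a file name pattern. It finds every file name in the current directory
that matches the regular expression a[1-9], then invokes echo
and passes it the names it found. echo simply prints those names,
as strings, to STDOUT.
|
So keep this firmly in mind: whenever the shell treats a string as a file name pattern,
it looks in the current directory for all matching file names and expands the pattern into those names.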
|
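The expansion step described above can be imitated in Python with the standard `fnmatch` module. This is only a sketch of the idea, not how any shell is implemented: the `expand` helper and the directory listing are hypothetical, and the fallback of returning the pattern unchanged when nothing matches reflects the usual default of POSIX-style shells.

```python
import fnmatch

def expand(pattern, directory_listing):
    """Mimic the shell: return the matching names, or the pattern itself if none match."""
    matches = sorted(n for n in directory_listing if fnmatch.fnmatchcase(n, pattern))
    return matches if matches else [pattern]

# Hypothetical directory contents.
listing = ["a1", "a3", "a10", "ab", "b2", "notes.txt"]

# These are the arguments that echo (or ls) would actually receive.
args_for_echo = expand("a[1-9]", listing)
print(" ".join(args_for_echo))
```

With this listing only a1 and a3 are passed on, which is why echo does not print a1 through a9.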
To print the string a1 a2 a3 a4 a5 a6 a7 a8 a9 itself,
use a Range Generator, as follows:
|
Regular Expressions for Strings |
---|
Regular expressions are even more useful when searching strings;
the author can hardly go a day without them:
|
Regular Expression used in Ed Line Editor
|
Symbol | Meaning
---|---
. | Any single character
* | The preceding character repeated any number of times (including zero)
^ | The beginning of a line
$ | The end of a line
[...] | Any one of the characters inside the brackets
[abcd] | a, b, c, or d
[a-d] | a, b, c, or d
[0-9] | Same as [0123456789]
[0-9a-fA-F] | Same as [0123456789abcdefABCDEF]
[-abcd] | '-', 'a', 'b', 'c', or 'd'
[]abcd] | ']', 'a', 'b', 'c', or 'd'
[]abcd-] | ']', '-', 'a', 'b', 'c', or 'd'
[^a] | Any character other than a
[^abcd] | Any character other than a, b, c, or d
\{n,m\} | The preceding character repeated at least n and at most m times
\{n\} | The preceding character repeated exactly n times
\{n,\} | The preceding character repeated at least n times
\ | Escape: removes the special meaning of the special character that follows. Exceptions: '\{', '\}', '\(', '\)', '\<', '\>', '\b', '\B', '\w', '\W', '\`', '\'', '\+', and '\?'
\( \) | Saves the string matched between \( and \) for reuse later (Back Reference)
\+ | The preceding character repeated one or more times
\? | The preceding character repeated zero or one time
\w | A word character (matches a character within a word)
\W | A non-word character (matches a character which is not within a word)
\< | The beginning of a word
\> | The end of a word
\b | A word boundary
\B | Not a word boundary (matches where there is no word boundary)
\` | The beginning of the whole input
\' | The end of the whole input
[:class:] | Inside a bracket expression, any one character of the named class. The classes are [:alpha:], [:upper:], [:lower:], [:alnum:], [:blank:], [:space:], [:digit:], [:xdigit:], [:cntrl:], [:print:], [:graph:], and [:punct:]
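A few of the symbols in the table can be tried out in Python, as in the sketch below. Note the assumptions: Python's `re` module uses Perl-style syntax, so the interval \{2,3\} is written {2,3} without backslashes, and the sample strings and the `hits` helper are made up for this illustration.

```python
import re

lines = ["abc", "aXc", "ac", "abbbc", "xabc", "abcd"]

def hits(pattern):
    """Return the sample strings in which the pattern is found."""
    return [s for s in lines if re.search(pattern, s)]

any_char = hits(r"^a.c$")       # . is exactly one character
star = hits(r"^ab*c$")          # * repeats the preceding b zero or more times
not_b = hits(r"^a[^b]c$")       # [^b] is any single character except b
interval = hits(r"^ab{2,3}c$")  # BRE \{2,3\} is written {2,3} in Python

print(any_char, star, not_b, interval)
```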
POSIX Basic and Extended Regular Expression (BRE and ERE)
|
Character | BRE / ERE | Meaning in a pattern
---|---|---
\ | Both | Usually, turn off the special meaning of the following character. Occasionally, enable a special meaning for the following character, such as for \(...\) and \{...\}.
. | Both | Match any single character except NUL. Individual programs may also disallow matching newline.
* | Both | Match any number (or none) of the single character that immediately precedes it. For EREs, the preceding character can instead be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character." For BREs, * is not special if it's the first character of a regular expression.
^ | Both | Match the following regular expression at the beginning of the line or string. BRE: special only at the beginning of a regular expression. ERE: special everywhere.
$ | Both | Match the preceding regular expression at the end of the line or string. BRE: special only at the end of a regular expression. ERE: special everywhere.
[...] | Both | Termed a bracket expression, this matches any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. (Caution: ranges are locale-sensitive, and thus not portable.) A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally). Bracket expressions may contain collating symbols, equivalence classes, and character classes.
\{n,m\} | BRE | Termed an interval expression, this matches a range of occurrences of the single character that immediately precedes it. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m. n and m must be between 0 and RE_DUP_MAX (minimum value: 255), inclusive. "exactly five occurrences of a" and "between 10 and 42 instances of q" are written a\{5\} and q\{10,42\}, respectively.
{n,m} | ERE | Just like the BRE \{n,m\} earlier, but without the backslashes in front of the braces: {n}, {n,}, {n,m}; for example, a{5} and q{10,42}.
\( \) | BRE | Save the pattern enclosed between \( and \) in a special holding space. Up to nine subpatterns can be saved on a single pattern. The text matched by the subpatterns can be reused later in the same pattern, by the escape sequences \1 to \9. For example, \(ab\).*\1 matches two occurrences of ab, with any number of characters in between.
( ) | ERE | Apply a match to the enclosed group of regular expressions.
\n | BRE | Replay the nth subpattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left.
+ | ERE | Match one or more instances of the preceding regular expression (not supported by KRE).
? | ERE | Match zero or one instances of the preceding regular expression.
| (vertical bar) | ERE | Match the regular expression specified before or after.
Perl Extended Regular Expression
|
Symbol | Meaning
---|---
\r | Carriage Return
\t | Horizontal Tab
\f | Form Feed
\n | New Line
\N | not \n
\s | Whitespace: space, \t, \r, \f, \n
\S | not \s
\w | a-z, A-Z, 0-9, and '_' (underscore)
\W | not \w
\d | 0-9
\D | not \d
\b | Word boundary
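Python's `re` module adopted most of these Perl shorthands, so they can be tried directly. The sketch below assumes a made-up sample string and passes `re.ASCII` so that \d and \w are limited to the ASCII ranges listed in the table (by default Python also accepts other Unicode digits and letters). Perl's \N is left out because Python's `re` does not provide it with that meaning.

```python
import re

# Hypothetical sample text.
text = "Order 66:\tship 3 crates_of tea\n"

digits = re.findall(r"\d+", text, flags=re.ASCII)   # runs of 0-9
words = re.findall(r"\w+", text, flags=re.ASCII)    # runs of letters, digits, underscore
gaps = re.findall(r"\s", text)                      # individual whitespace characters
tea_as_word = re.search(r"\btea\b", text)           # "tea" delimited by word boundaries

print(digits, words, gaps, tea_as_word.group(0))
```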
Simple Regular Expression Examples |
---|
Script ID | Script | Description
---|---|---
Advanced Regular Expressions |
---|
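In shell scripts, string processing with regexps usually needs only the simple symbols and their combinations,
and KRE is sufficient in most cases. Moreover, when modifying strings (replacing or deleting, for example), the regexp must be very precise,
so as not to replace or delete something that should be left alone. For example, the sed substitution command 's/[Tt]he/THE/'
can wrongly alter many other English words: there, for instance, becomes
THEre. We strongly advise against using overly complex regexps for replacement or deletion.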
|
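The pitfall can be reproduced in Python. The sketch below is a Python counterpart of the sed command, not sed itself; it replaces every occurrence (as sed would with the g flag) so that the collateral damage is easy to see, and the sample sentence is made up. Adding word boundaries (\b) is one way to make the pattern more precise.

```python
import re

# Hypothetical sample sentence.
text = "The other day the cat sat there, then left."

# Loose pattern: also hits "other", "there", and "then".
loose = re.sub(r"[Tt]he", "THE", text)

# Requiring a word boundary on both sides restricts the match to the whole word.
strict = re.sub(r"\b[Tt]he\b", "THE", text)

print(loose)
print(strict)
```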
Regular Expressions in Big Data Applications
|
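Today's information world relies more and more on big data, and pulling the data a user cares about out of a sea of information
requires precise searching. If the search condition is too tight, needed information is missed;
if it is too loose, too much irrelevant information comes back and buries what is needed.
Precise searching is therefore very important in big data applications. Take phone numbers: extracting
every phone number from a pile of files is a real headache, because phone numbers are written in so many ways.
There are at least the following common formats: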
|
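Using Regular Expressions to Validate Input Formats
|
Another place where regexps are used all the time is input validation in web-based information systems, for example
email addresses, dates, national ID numbers, and passwords. Each person's data is different, but it must follow a fixed format,
and a regexp that describes the valid format precisely makes the check easy. Take the email format as an example: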
|
Email Format Validation Condition
|
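No. | Condition
---|---
(1) | An @ must appear in the middle
(2) | The address must begin with one or more letters or digits
(3) | Before the @, combinations of one or more letters or digits with "-" may appear, e.g. -abc-
(4) | Before the @, combinations of one or more letters or digits with "." may appear, e.g. .abc.
(5) | Before the @, the two items above are combined with "or" and may occur zero or more times
(6) | After the @, one or more upper- or lowercase letters and digits appear
(7) | After the @, only "." or "-" may appear as separators, and these two characters may not occur consecutively
(8) | After the @, zero or more groups of "." or "-" followed by upper- or lowercase letters and digits appear
(9) | After the @, one or more groups of "." followed by upper- or lowercase letters and digits appear, and the address must end with upper- or lowercase letters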
PCRE for email Format Validation
|
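One possible reading of the nine conditions, written as a Python pattern, is sketched below. It is an interpretation made for illustration, not the original PCRE: the names `EMAIL` and `is_valid_email` are hypothetical, "letters or digits" is taken to mean ASCII letters and digits, and the pattern is far stricter than what real-world mail systems accept.

```python
import re

EMAIL = re.compile(
    r"[A-Za-z0-9]+"                       # (2) starts with letters or digits
    r"(?:-[A-Za-z0-9]+|\.[A-Za-z0-9]+)*"  # (3)-(5) "-xxx" or ".xxx", zero or more times
    r"@"                                  # (1) the @ sign
    r"[A-Za-z0-9]+"                       # (6) letters or digits right after @
    r"(?:[.-][A-Za-z0-9]+)*"              # (7)-(8) single "." or "-" separators
    r"\.[A-Za-z]+"                        # (9) final ".xxx" made of letters only
)

def is_valid_email(address):
    """Return True if the whole address fits the pattern above."""
    return EMAIL.fullmatch(address) is not None

for candidate in ["john.doe@example.com", "john..doe@example.com", "john@example"]:
    print(candidate, is_valid_email(candidate))
```

Using `fullmatch` plays the role of the ^ and $ anchors: the entire input has to fit, not just a substring of it.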
SSN (Social Security Number) Format Validation Condition
|
No. | Condition
---|---
(1) | It should have exactly 9 digits.
(2) | It should be divided into 3 parts by hyphens (-).
(3) | The first part should have 3 digits and should not be 000, 666, or between 900 and 999.
(4) | The second part should have 2 digits and it should be from 01 to 99.
(5) | The third part should have 4 digits and it should be from 0001 to 9999.
PCRE for SSN Format Validation
|
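A sketch of a pattern that enforces the five conditions, using negative lookahead (a Perl-style feature that Python's `re` also supports). This is one possible formulation rather than the original PCRE; the names `SSN` and `is_valid_ssn` are hypothetical.

```python
import re

SSN = re.compile(
    r"(?!000|666|9\d\d)\d{3}"  # first part: 3 digits, not 000, 666, or 900-999
    r"-(?!00)\d{2}"            # second part: 2 digits, 01-99
    r"-(?!0000)\d{4}",         # third part: 4 digits, 0001-9999
    re.ASCII,                  # keep \d limited to 0-9
)

def is_valid_ssn(value):
    """Return True if the whole string fits the format described above."""
    return SSN.fullmatch(value) is not None

for candidate in ["123-45-6789", "666-45-6789", "123-00-6789"]:
    print(candidate, is_valid_ssn(candidate))
```

Each lookahead rejects a forbidden value before the digits are consumed, which avoids enumerating every permitted number.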
POSIX Regular Expression |
---|
POSIX BRE (Basic RE) and ERE (Extended RE) metacharacters
|
Bracket Regular Expression |
---|
Character classes represent classes of characters, such as digits, lower- and uppercase
letters, punctuation, whitespace, and so on.
|
They are written by enclosing the name of the class in [: and :].
|
The pre-POSIX range expressions for decimal and hexadecimal digits can
(and should) be expressed portably, by using character classes: [[:digit:]] and
[[:xdigit:]].
|
POSIX character classes
|
Class | Matching characters
---|---
[:digit:] | Numeric characters
[:xdigit:] | Hexadecimal digits
[:alnum:] | Alphanumeric characters
[:alpha:] | Alphabetic characters
[:lower:] | Lowercase characters
[:upper:] | Uppercase characters
[:cntrl:] | Control characters
[:print:] | Printable characters
[:punct:] | Punctuation characters
[:space:] | Whitespace characters (space, tab, and control characters such as \r and \f)
[:blank:] | Space and tab characters
[:graph:] | Nonspace characters (all visible characters, excluding [:space:] and [:cntrl:])
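Python's `re` module does not recognize the [:class:] notation, so scripts ported from grep or sed need explicit sets instead. The sketch below shows ASCII-only stand-ins for some of the classes. The `POSIX_ASCII` table and the `posix_class` helper are hypothetical names introduced here, and the sets ignore locale, which real POSIX classes do not.

```python
import re
import string

# Hypothetical lookup table (not part of Python's re module): ASCII-only
# stand-ins for a few POSIX classes, written as text to place inside [...].
POSIX_ASCII = {
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "blank": " \\t",
    "punct": re.escape(string.punctuation),
}

def posix_class(name):
    """Compile a pattern that matches one character of the named class."""
    return re.compile("[" + POSIX_ASCII[name] + "]")

print(posix_class("xdigit").findall("0x1F-g"))
print(posix_class("punct").findall("a.b-c]"))
```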
Bracket Regular Expression |
---|
Collating is the act of giving an ordering to some group or set of items.
|
A POSIX collating element consists of the name of the element in the current
locale, enclosed by [. and .].
|
For example, in Czech and Spanish, the two characters ch are kept
together and are treated as a single unit for comparison purposes.
|
Thus, [ab[.ch.]de] matches any of the characters a, b, d, or
e, or the pair ch.
It does not match a standalone c or h character. |
Bracket Regular Expression |
---|
An equivalence class is used to represent different
characters that should be treated the same when matching.
|
Equivalence classes enclose the name of the
class between [= and =].
|
For example, in a French locale, there might be an
[=e=] equivalence class. If it exists, then the
regular expression
[a[=e=]iouy] would match all the lowercase English
vowels, as well as the letters è, é, and so on.
|
Bracket Regular Expression |
---|
Collating elements, equivalence classes, and character
classes are only
recognized inside the square brackets of a bracket
expression. Writing a
standalone regular expression such as [:alpha:]
matches the characters a, l,
p, h, and :. The correct way to write it is
[[:alpha:]].
|
Within bracket expressions, all other metacharacters
lose their special
meanings. Thus, [*\.] matches a literal asterisk, a
literal backslash, or a
literal period. To get a ] into the set, place it
first in the list: []*\.]
adds the ] to the list. To get a minus character into
the set, place it first
in the list: [-*\.]. If you need both a right bracket
and a minus, make the
right bracket the first character, and make the minus
the last one in the
list: []*\.-].
|
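The placement rules for ] and - can be checked in Python, with one caveat: unlike POSIX bracket expressions, Python's `re` treats the backslash as an escape character inside a set, so a literal backslash has to be doubled. The sample string below is made up for this sketch.

```python
import re

# Sample containing ], *, ., -, and one backslash.
sample = "a]b*c.d-e\\f"

# ']' placed first and '-' placed last are literal set members in Python too.
found = re.findall(r"[]*.-]", sample)

# Unlike POSIX, Python needs the backslash doubled to include it in the set.
with_backslash = re.findall(r"[]*.\\-]", sample)

print(found)
print(with_backslash)
```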
Finally, POSIX explicitly states that the NUL
character (numeric value zero)
need not be matchable. This character is used in the C
language to indicate
the end of a string, and the POSIX standard wanted to
make it straightforward
to implement its features using regular C strings. In
addition, individual
utilities may disallow matching of the newline
character by the . (dot)
metacharacter or by bracket expressions.
|
Regular Expression Examples |
---|
Expression | Matches
---|---
clinton | The seven letters clinton, anywhere on a line
^clinton | The seven letters clinton, at the beginning of a line
clinton$ | The seven letters clinton, at the end of a line
^clinton$ | A line containing exactly the seven letters clinton, and nothing else
[Cc]linton | Either the seven letters Clinton, or the seven letters clinton, anywhere on a line
cli.ton | The three letters cli, any character, and the three letters ton, anywhere on a line
cli.*ton | The three letters cli, any sequence of zero or more characters, and the three letters ton, anywhere on a line (e.g., cliton, clinton, cliBILLton, and so on)
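These expressions happen to be written identically in Python's `re` syntax, so the table can be checked directly. The sample lines and the `matching_lines` helper in this sketch are made up for the purpose.

```python
import re

# Hypothetical sample lines.
lines = ["clinton", "Clinton ran", "bill clinton", "cliton", "cliBILLton", "clinton wins"]

def matching_lines(pattern):
    """Return the sample lines in which the pattern is found."""
    return [line for line in lines if re.search(pattern, line)]

for pattern in ["clinton", "^clinton", "clinton$", "^clinton$",
                "[Cc]linton", "cli.ton", "cli.*ton"]:
    print(pattern, matching_lines(pattern))
```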
Examples: {} |
---|
Pattern | Matches
---|---
\{n\} | Exactly n occurrences of the preceding regular expression
\{n,\} | At least n occurrences of the preceding regular expression
\{n,m\} | Between n and m occurrences of the preceding regular expression
Examples: Anchoring text matches |
---|
Text to be matched: abcABCdefDEF
|
Pattern | Matches? | Text matched / Reason match fails
---|---|---
ABC | Yes | Characters 4, 5, and 6, in the middle: abcABCdefDEF
^ABC | No | Match is restricted to beginning of string
def | Yes | Characters 7, 8, and 9, in the middle: abcABCdefDEF
def$ | No | Match is restricted to end of string
[[:upper:]]\{3\} | Yes | Characters 4, 5, and 6, in the middle: abcABCdefDEF
[[:upper:]]\{3\}$ | Yes | Characters 10, 11, and 12, at the end: abcABCdefDEF
^[[:alpha:]]\{3\} | Yes | Characters 1, 2, and 3, at the beginning: abcABCdefDEF
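The rows of this table can be verified with a short Python sketch. Two substitutions are assumptions of the sketch: Python has no [[:upper:]] or [[:alpha:]], so [A-Z] and [A-Za-z] stand in for them, and the interval is written {3} rather than \{3\}. Python reports positions as zero-based, end-exclusive spans, so characters 4 to 6 appear as (3, 6).

```python
import re

text = "abcABCdefDEF"

def span(pattern):
    """Return the (start, end) span of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.span() if m else None

for pattern in ["ABC", "^ABC", "def", "def$", "[A-Z]{3}", "[A-Z]{3}$", "^[A-Za-z]{3}"]:
    print(pattern, span(pattern))
```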
^$ matches empty lines or empty strings.
|
#filter out all empty lines |
---|
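As a Python sketch of the same idea (the sample text is made up), lines matching ^$ can be dropped like this:

```python
import re

# Hypothetical input containing several empty lines.
text = "first\n\nsecond\n\n\nthird\n"

# Keep only the lines that do NOT match ^$ (that is, the non-empty ones).
kept = [line for line in text.splitlines() if not re.search(r"^$", line)]

print(kept)
```

Lines that contain only spaces or tabs are not empty in this sense and would be kept.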
Back References |
---|
Back References | match whatever an earlier part of the regular expression matched |
---|
In POSIX, back references are defined for BREs only, not for EREs (some implementations accept them in EREs as an extension).

Step | Action
---|---
Step 1 | Enclose a subexpression in \( and \). There may be up to nine enclosed subexpressions within a single pattern, and they may be nested.
Step 2 | Use \digit, where digit is a number between 1 and 9, in a later part of the same pattern. Its meaning there is "match whatever was matched by the nth earlier parenthesized subexpression."
Examples
|
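Two back reference examples, sketched in Python. These are illustrations chosen for this sketch, not the examples from the original page. Python writes groups as ( ) rather than the BRE \( \), while the reference itself is still \1.

```python
import re

# BRE \(ab\).*\1 becomes (ab).*\1 in Python: group 1 must reappear later.
twice = re.search(r"(ab).*\1", "xxabyyabzz")

# A quoted word whose closing quote must be the same character as the opening one.
quoted = re.compile(r"""(["'])[A-Za-z]+\1""")

print(twice.group(0))
print(quoted.search('say "hello" now').group(0))
print(quoted.search("say 'hello\" now"))
```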
BRE operator precedence |
---|
BRE operator precedence from highest to lowest
|
Operator | Meaning
---|---
[. .] [= =] [: :] | Bracket symbols for character collation
\metacharacter | Escaped metacharacters
[ ] | Bracket expressions
\( \) \digit | Subexpressions and backreferences
* \{ \} | Repetition of the preceding single-character regular expression
no symbol | Concatenation
^ $ | Anchors
Extended Regular Expression |
---|
EREs, as the name implies, have more capabilities than
do basic regular
expressions. Many of the metacharacters and
capabilities are identical.
However, some of the metacharacters that look similar
to their BRE counterparts have different meanings.
|
Matching single characters
|
EREs are essentially the same as
BREs.
|
Exceptions
|
In awk, \ is special inside bracket
expressions.
Thus, to match a left bracket, dash, right bracket, or backslash, you could use [\[\-\]\\]. |
Backreferences don't exist
|
Parentheses are special in EREs, but
serve a different purpose than they do in BREs
In an ERE, \( and \) match literal left and right
parentheses.
|
Matching multiple regular expressions with one expression
|
EREs have the most notable differences from BREs in
the area of matching
multiple characters.
|
The * does work the same as in BREs.
|
An exception is that the meaning of a * as the
first character of an
ERE is "undefined," whereas in a BRE it means "match a
literal *."
|
Interval expressions are also available in EREs;
|
however, they are written
using plain braces, not braces preceded by
backslashes.
|
"exactly five occurrences of a" and
"between 10 and 42 instances
of q" are written a{5} and q{10,42}, respectively.
|
Use \{ and \} to match literal brace characters.
|
? and +
|
Operator | Meaning
---|---
? | Match zero or one of the preceding regular expression. ? means "optional": for example, ab?c matches both ac and abc, but nothing else.
+ | Match one or more of the preceding regular expression. It is similar to the * metacharacter, except that at least one occurrence of text matching the preceding regular expression must be present. Thus, ab+c matches abc, abbc, abbbc, and so on, but does not match ac. ab+c is the same as abb*c.
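The ERE forms of ?, +, and the interval expression are written the same way in Python, so the claims above can be checked with a small sketch. The `full` helper and the sample strings are made up for this illustration.

```python
import re

def full(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

strings = ["ac", "abc", "abbc", "abbbc"]

optional_b = full(r"ab?c", strings)        # zero or one b
one_or_more = full(r"ab+c", strings)       # at least one b
same_as_plus = full(r"abb*c", strings)     # equivalent spelling of ab+c
two_to_three = full(r"ab{2,3}c", strings)  # interval: two or three b's

print(optional_b, one_or_more, same_as_plus, two_to_three)
```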
Alternation and Grouping
|
Pattern | Meaning
---|---
(why)+ | Matches one or more occurrences of the word why.
[Tt]he (CPU|computer) is | Matches sentences using either CPU or computer in between The (or the) and is.
(read|write)+ | Matches one or more occurrences of either of the words read or write.
((read|write)[[:space:]]*)+ | Same as above, but allows zero or more intervening whitespace characters between words: matches multiple successive occurrences of either read or write, possibly separated by whitespace characters.
((read|write)[[:space:]]+)+ | Same as above, but requires one or more whitespace characters after each word.
^abcd|efgh$ | Matches abcd at the beginning of the string, or efgh at the end of the string.
^(abcd|efgh)$ | Matches a string containing exactly abcd or exactly efgh.
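The difference between the last two rows is easy to get wrong, so here is a Python sketch of it. Python's syntax for alternation and grouping matches the ERE forms shown above. One substitution is an assumption of the sketch: \s stands in for [[:space:]], which Python's `re` does not support.

```python
import re

# ^abcd|efgh$ : alternation binds loosest, so each anchor applies to one branch only.
loose = re.compile(r"^abcd|efgh$")

# ^(abcd|efgh)$ : grouping makes both anchors apply to either alternative.
exact = re.compile(r"^(abcd|efgh)$")

# \s stands in for [[:space:]]; zero or more whitespace characters after each word.
words = re.compile(r"((read|write)\s*)+")

print(bool(loose.search("abcdXYZ")), bool(exact.search("abcdXYZ")))
print(bool(words.fullmatch("read write  readwrite")))
```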
Anchoring text matches
|
The ^ and $ have the same meaning as in BREs
|
In EREs, ^ and $ are always metacharacters.
Thus, regular expressions such as ab^cd and ef$gh are
valid, but cannot match anything.
|
ERE operator precedence
|
Operator precedence applies to EREs as it does to BREs.
|
ERE operator precedence from highest to lowest
|
Operator | Meaning
---|---
[. .] [= =] [: :] | Bracket symbols for character collation
\metacharacter | Escaped metacharacters
[ ] | Bracket expressions
( ) | Grouping
* + ? { } | Repetition of the preceding regular expression
no symbol | Concatenation
^ $ | Anchors
| (vertical bar) | Alternation
GNU Extensions |
---|
Operator | Meaning
---|---
\w | Matches any word-constituent character. Equivalent to [[:alnum:]_].
\W | Matches any nonword-constituent character. Equivalent to [^[:alnum:]_].
\< \> | Matches the beginning and end of a word, as described previously.
\b | Matches the null string found at either the beginning or the end of a word. This is a generalization of the \< and \> operators. Note: Because awk uses \b to represent the backspace character, GNU awk (gawk) uses \y.
\B | Matches the null string between two word-constituent characters.
\` \' | Matches the beginning and end of an emacs buffer, respectively. GNU programs (besides emacs) generally treat these as being equivalent to ^ and $.
Which Programs Use Which Regular Expressions? |
---|
Unix programs and their regular expression type
|
Type | grep | sed | ed | ex/vi | more | egrep | awk | lex
---|---|---|---|---|---|---|---|---
BRE | Y | Y | Y | Y | Y |  |  |
ERE |  |  |  |  |  | Y | Y | Y
\< \> | Y | Y | Y | Y | Y |  |  |
Lessons from Experience |
---|
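Note that the regular expressions for file names described earlier and those for strings described here
share some symbols, but the same symbol means different things in the two contexts.
|
Many people mix the two up.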
|
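The difference is easy to see with the pattern a*, which reads one way as a file name wildcard and another way as a regular expression. The sketch below uses Python's `fnmatch` module for the wildcard reading and `re` for the regexp reading; the list of names is made up.

```python
import fnmatch
import re

# Hypothetical names to test.
names = ["a", "aaa", "abc", "a1", "b1"]

# As a file name wildcard, a* means "a followed by anything".
as_wildcard = [n for n in names if fnmatch.fnmatchcase(n, "a*")]

# As a regular expression, a* means "zero or more a's".
as_regexp = [n for n in names if re.fullmatch(r"a*", n)]

print(as_wildcard)
print(as_regexp)
```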
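Another common source of confusion is this:
|
Different software may use slightly different regular expressions,
for example BRE or ERE, or even the regular expressions that were in use
before the POSIX standard appeared.
|
Although only a few regular expression symbols have been covered above, they are
a great help in practice.
They show up throughout the examples; we do not explain them further here,
but within each example.
|
The reason is that many people read about them without ever putting them to use, and so never come to appreciate their benefits.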
|