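Where this becomes even more useful is in the regular expressions used for searching strings;
the author can hardly get through a day without them: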
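
Regular Expression used in Ed Line Editor

| Pattern | Meaning |
|---|---|
| `.` | Any single character |
| `*` | The preceding character repeated any number of times (including zero) |
| `^` | The beginning of a line |
| `$` | The end of a line |
| `[...]` | Any one of the characters listed inside the brackets |
| `[abcd]` | a, b, c, or d |
| `[a-d]` | a, b, c, or d |
| `[0-9]` | Same as `[0123456789]` |
| `[0-9a-fA-F]` | Same as `[0123456789abcdefABCDEF]` |
| `[-abcd]` | -, a, b, c, or d |
| `[]abcd]` | ], a, b, c, or d |
| `[]abcd-]` | ], -, a, b, c, or d |
| `[^a]` | Any character other than a |
| `[^abcd]` | Any character other than a, b, c, or d |
| `\{n,m\}` | The preceding character repeated at least n and at most m times |
| `\{n\}` | The preceding character repeated exactly n times |
| `\{n,\}` | The preceding character repeated at least n times |
| `\` | Escape: removes the special meaning of the special character that follows. Exceptions, where the backslash gives the sequence a special meaning instead: `\{`, `\}`, `\(`, `\)`, `\<`, `\>`, `\b`, `\B`, `\w`, `\W`, `` \` ``, `\'`, `\+`, and `\?` |
| `\( \)` | Saves the string matched between `\(` and `\)` so it can be reused later in the pattern (back reference) |
| `\+` | The preceding character repeated one or more times |
| `\?` | The preceding character repeated zero times or once |
| `\w` | Matches a character within a word |
| `\W` | Matches a character which is not within a word |
| `\<` | Matches the beginning of a word |
| `\>` | Matches the end of a word |
| `\b` | Matches a word boundary |
| `\B` | Matches a position that is not a word boundary |
| `` \` `` | Matches the beginning of the whole input |
| `\'` | Matches the end of the whole input |
| `[:class:]` | Inside a bracket expression, any one character belonging to the named class. Classes include `[:alpha:]`, `[:upper:]`, `[:lower:]`, `[:alnum:]`, `[:blank:]`, `[:space:]`, `[:digit:]`, `[:xdigit:]`, `[:cntrl:]`, `[:print:]`, `[:graph:]`, `[:punct:]` |
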
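
The bracket expressions, anchors, and intervals in the ed table can be tried out with a short script. This is only a sketch, and it assumes Python's `re` module rather than ed itself: Python writes intervals without backslashes (`{2,4}` instead of `\{2,4\}`) and uses `\b` on both sides of a word where ed offers `\<` and `\>`. The sample strings and the helper `matching` are made up for illustration.

```python
import re

# Assumption: Python's re module stands in for ed. Intervals are written
# without backslashes here, so ed's \{2,4\} becomes {2,4}.
lines = ["0x1F", "cafe", "-dash", "]bracket", "42", "beef!"]


def matching(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [s for s in candidates if re.search(pattern, s)]


# ^ and $ anchor the pattern; * lets the bracket expression repeat.
hex_only = matching(r"^[0-9a-fA-F]*$", lines)

# ']' listed first and '-' listed last are ordinary members of the set.
starts_special = matching(r"^[]abcd-]", lines)

# A leading '^' inside the brackets negates the set.
not_a_to_d = matching(r"^[^abcd]", lines)

# Interval: between two and four digits, and nothing else on the line.
two_to_four_digits = matching(r"^[0-9]{2,4}$", lines)

# Word boundaries: only the standalone occurrences of "cat" are counted.
standalone_cats = len(re.findall(r"\bcat\b", "cat concat cat."))

print(hex_only)            # ['cafe', '42']
print(starts_special)      # ['cafe', '-dash', ']bracket', 'beef!']
print(not_a_to_d)          # ['0x1F', '-dash', ']bracket', '42']
print(two_to_four_digits)  # ['42']
print(standalone_cats)     # 2
```

The POSIX classes such as `[:alpha:]` are left out because Python's `re` does not implement them.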
POSIX Basic and Extended Regular Expression (BRE and ERE)

| Character | BRE / ERE | Meaning in a pattern |
|---|---|---|
| `\` | Both | Usually, turn off the special meaning of the following character. Occasionally, enable a special meaning for the following character, such as for `\(...\)` and `\{...\}`. |
| `.` | Both | Match any single character except NUL. Individual programs may also disallow matching newline. |
| `*` | Both | Match any number (or none) of the single character that immediately precedes it. For EREs, the preceding character can instead be a regular expression. For example, since `.` (dot) means any character, `.*` means "match any number of any character." For BREs, `*` is not special if it's the first character of a regular expression. |
| `^` | Both | Match the following regular expression at the beginning of the line or string. BRE: special only at the beginning of a regular expression. ERE: special everywhere. |
| `$` | Both | Match the preceding regular expression at the end of the line or string. BRE: special only at the end of a regular expression. ERE: special everywhere. |
| `[...]` | Both | Termed a bracket expression, this matches any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. (Caution: ranges are locale-sensitive, and thus not portable.) A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally). Bracket expressions may contain collating symbols, equivalence classes, and character classes. |
| `\{n,m\}` | BRE | Termed an interval expression, this matches a range of occurrences of the single character that immediately precedes it. `\{n\}` matches exactly n occurrences, `\{n,\}` matches at least n occurrences, and `\{n,m\}` matches any number of occurrences between n and m. n and m must be between 0 and RE_DUP_MAX (minimum value: 255), inclusive. "Exactly five occurrences of a" and "between 10 and 42 instances of q" are written `a\{5\}` and `q\{10,42\}`, respectively. |
| `{n,m}` | ERE | Just like the BRE `\{n,m\}` earlier, but without the backslashes in front of the braces: `{n}`, `{n,}`, `{n,m}`; for example `a{5}` and `q{10,42}`. |
| `\( \)` | BRE | Save the pattern enclosed between `\(` and `\)` in a special holding space. Up to nine subpatterns can be saved in a single pattern. The text matched by the subpatterns can be reused later in the same pattern, by the escape sequences `\1` to `\9`. For example, `\(ab\).*\1` matches two occurrences of ab, with any number of characters in between. |
| `( )` | ERE | Apply a match to the enclosed group of regular expressions. |
| `\n` | BRE | Replay the nth subpattern enclosed in `\(` and `\)` into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left. |
| `+` | ERE | Match one or more instances of the preceding regular expression (not supported by #KRE). |
| `?` | ERE | Match zero or one instances of the preceding regular expression. |
| &#124; (vertical bar) | ERE | Match the regular expression specified before or after. |

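
The ERE operators in the table (`+`, `?`, `{n,m}`, grouping, and alternation) can be exercised as follows. This is a hedged sketch that assumes Python's `re` module as a stand-in: it writes these operators the same way an ERE does, but it is not a POSIX engine (for instance, it picks the first alternative that works rather than the longest overall match). The back reference at the end is the table's BRE example `\(ab\).*\1` rewritten with unescaped parentheses. The helper `full` and the sample strings are illustrative only.

```python
import re

# Assumption: Python's re syntax stands in for a POSIX ERE engine. The
# operators below are spelled the same way, but matching rules can differ.


def full(pattern, text):
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# '+' : one or more of the preceding expression.
one_or_more = [full(r"ab+c", s) for s in ("ac", "abc", "abbbc")]

# '?' : zero or one of the preceding expression.
zero_or_one = [full(r"colou?r", s) for s in ("color", "colour", "colouur")]

# Interval expressions: a{5} and q{10,42} from the table.
interval = [
    full(r"a{5}", "aaaaa"),
    full(r"q{10,42}", "q" * 9),
    full(r"q{10,42}", "q" * 42),
]

# Grouping plus alternation: either word, optionally followed by 's'.
alternation = [full(r"(cat|dog)s?", s) for s in ("cat", "dogs", "cow")]

# Back reference: \1 must repeat exactly what the first group matched.
backref = re.search(r"(ab).*\1", "xxabyyabzz")
backref_text = backref.group(0) if backref else None

print(one_or_more)   # [False, True, True]
print(zero_or_one)   # [True, True, False]
print(interval)      # [True, False, True]
print(alternation)   # [True, True, False]
print(backref_text)  # abyyab
```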
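Perl Extended Regular Expression

| Sequence | Meaning |
|---|---|
| `\r` | Carriage return |
| `\t` | Horizontal tab |
| `\f` | Form feed |
| `\n` | Newline |
| `\N` | Any character that is not `\n` |
| `\s` | Whitespace: space, `\t`, `\r`, `\f`, `\n` |
| `\S` | Any character not matched by `\s` |
| `\w` | a-z, A-Z, 0-9, and `_` (underscore) |
| `\W` | Any character not matched by `\w` |
| `\d` | 0-9 |
| `\D` | Any character not matched by `\d` |
| `\b` | Word boundary |
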
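
Several of these Perl escapes (`\s`, `\w`, `\d`, `\D`, `\b`) can be tried out in a few lines. The sketch below assumes Python's `re` module, which borrows these escapes from Perl, rather than Perl itself. The `re.ASCII` flag is used so that `\w` and `\d` stay within the ASCII ranges listed in the table. `\N` is left out because Python's `re` does not give it the "not newline" meaning. The sample text is made up.

```python
import re

text = "id_42:\tok\r\nnext line"

# re.ASCII keeps \w and \d to the ASCII ranges listed in the table;
# without it, Python also matches non-ASCII letters and digits.
words = re.findall(r"\w+", text, flags=re.ASCII)
digits = re.findall(r"\d+", text, flags=re.ASCII)

# \D is the complement of \d: deleting every \D leaves only the digits.
digits_only = re.sub(r"\D", "", text, flags=re.ASCII)

# \s+ treats the tab, the CR/LF pair, and the space as separators.
fields = re.split(r"\s+", text)

# \b requires a word boundary on both sides, so "inline" and "line_up"
# are skipped ('_' counts as a word character).
whole_word = re.findall(r"\bline\b", "line inline line_up line")

print(words)        # ['id_42', 'ok', 'next', 'line']
print(digits)       # ['42']
print(digits_only)  # 42
print(fields)       # ['id_42:', 'ok', 'next', 'line']
print(whole_word)   # ['line', 'line']
```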