
Regular expressions, used when searching for strings, are even more useful; the author relies on them almost every day:

Regular Expressions Used in the ed Line Editor

.

Matches any single character

*

Matches the preceding character repeated any number of times (including zero)

^

Matches the beginning of a line

$

Matches the end of a line
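A minimal sketch of these four metacharacters, using Python's re module as a stand-in for ed (an assumption: for single-line input like this, ., *, ^ and $ behave the same way in both). The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, line: str) -> bool:
    """Return True if pattern matches anywhere in line, like an ed search."""
    return re.search(pattern, line) is not None

# '.' stands for exactly one arbitrary character.
print(matches(r"c.t", "cat"), matches(r"c.t", "ct"))            # True False
# '*' repeats the preceding character zero or more times.
print(matches(r"^ab*c$", "ac"), matches(r"^ab*c$", "abbbc"))    # True True
# '^' anchors at the start of the line, '$' at the end.
print(matches(r"^error", "error: disk full"), matches(r"^disk", "error: disk full"))  # True False
print(matches(r"full$", "error: disk full"))                    # True
```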

[...]

Matches any one of the characters inside the brackets

[abcd] # matches a, b, c, or d

[a-d] # matches a, b, c, or d

[0-9] # same as [0123456789]

[0-9a-fA-F] # same as [0123456789abcdefABCDEF]

[-abcd] # a leading '-' is literal: matches '-', 'a', 'b', 'c', or 'd'

[]abcd] # a leading ']' is literal: matches ']', 'a', 'b', 'c', or 'd'

[]abcd-] # matches ']', '-', 'a', 'b', 'c', or 'd'
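The bracket examples above can be checked with a small sketch in Python's re, which accepts these same forms, including a leading ']' and a leading or trailing '-'. The helper chars_matched is hypothetical; it keeps the characters of a sample string that a bracket expression matches.

```python
import re

def chars_matched(bracket_expr: str, candidates: str) -> str:
    """Return, in order, the characters of candidates that bracket_expr matches."""
    return "".join(ch for ch in candidates if re.fullmatch(bracket_expr, ch))

sample = "abcdez-]9F"
print(chars_matched(r"[abcd]", sample))       # abcd
print(chars_matched(r"[a-d]", sample))        # abcd
print(chars_matched(r"[0-9a-fA-F]", sample))  # abcde9F
print(chars_matched(r"[-abcd]", sample))      # abcd-
print(chars_matched(r"[]abcd-]", sample))     # abcd-]
```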

[^a]

Matches any character other than a

[^a-d]

Matches any character other than a, b, c, or d
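Negated brackets can be sketched the same way. Here Python's re.sub deletes every character the bracket expression matches, so what remains is the complement; this is roughly what a global substitution with an empty replacement does in ed. The helper strip_matches is hypothetical.

```python
import re

def strip_matches(bracket_expr: str, text: str) -> str:
    """Delete every character of text that bracket_expr matches."""
    return re.sub(bracket_expr, "", text)

print(strip_matches(r"[^a]", "banana"))       # aaa
print(strip_matches(r"[^a-d]", "bad idea!"))  # badda
```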

\{n,m\}

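Matches the preceding character repeated at least n and at most m times

\{n\} Matches the preceding character repeated exactly n times

\{n,\} Matches the preceding character repeated at least n times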
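Python's re spells interval expressions without the backslashes (the ERE form {n,m} rather than ed's \{n,m\}). A sketch, where is_valid_pin and its 4-to-6-digit rule are made up for illustration:

```python
import re

def is_valid_pin(s: str) -> bool:
    """Hypothetical rule: 4 to 6 digits and nothing else (ed: [0-9]\{4,6\})."""
    return re.fullmatch(r"[0-9]{4,6}", s) is not None

print(is_valid_pin("123"), is_valid_pin("1234"), is_valid_pin("1234567"))  # False True False
print(re.findall(r"a{2}", "aaaaa"))                  # ['aa', 'aa']: exactly two, non-overlapping
print(re.fullmatch(r"x{3,}", "xxxxx") is not None)   # True: at least three
```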

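\

Escape: removes the special meaning of the special character that follows. Exceptions, where the backslash instead gives the sequence a special meaning: '\{', '\}', '\(', '\)', '\<', '\>', '\b', '\B', '\w', '\W', '\`', '\'', '\+', and '\?'.

\(...\)

Saves the string enclosed between \( and \) for later reuse (back reference)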
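A sketch of escaping and back references in Python's re. Python uses plain parentheses for grouping, so ed's \(ab\).*\1 becomes (ab).*\1, while the \1 syntax for referring back stays the same. The doubled-word pattern is an illustrative example, not from the source.

```python
import re

# ed/BRE spelling: \(ab\).*\1    Python spelling: (ab).*\1
pattern = re.compile(r"(ab).*\1")
print(bool(pattern.search("ab--ab")))   # True: 'ab' appears twice
print(bool(pattern.search("ab--ba")))   # False

# Doubled word: group 1 captures a word, \1 demands the same word again.
doubled = re.compile(r"\b(\w+) \1\b")
print(doubled.search("see the the cat").group(1))  # the

# Escaping: '\.' matches a literal dot instead of "any character".
print(re.findall(r"[0-9]\.[0-9]", "1.5 and 2x5"))  # ['1.5']
```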

\+

Matches the preceding character one or more times

\?

Matches the preceding character zero or one time
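GNU ed's \+ and \? correspond to plain + and ? in Python's re (and in POSIX EREs). A minimal sketch:

```python
import re

# GNU ed/BRE: colou\?r and [0-9]\+    Python: colou?r and [0-9]+
spelling = re.compile(r"colou?r")
print([w for w in ["color", "colour", "colouur"] if spelling.fullmatch(w)])  # ['color', 'colour']
print(re.findall(r"[0-9]+", "a1b22c333"))  # ['1', '22', '333']
```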

\w

Matches a character within a word

\W

Matches a character that is not within a word

\<

Matches the beginning of a word

\>

Matches the end of a word

\b

Matches a word boundary

\B

Matches a position that is not a word boundary

\`

Matches the beginning of the whole input

\'

Matches the end of the whole input
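Python's re covers most of these, though not all: \w, \W, \b and \B exist as-is, \A and \Z play the roles of \` and \', and \< and \> are missing. The lookaround emulation of \< and \> below is an assumption of this sketch, not GNU syntax.

```python
import re

text = "cat concat cats"

# \b...\b: 'cat' as a whole word only (GNU: \<cat\>).
print(re.findall(r"\bcat\b", text))        # ['cat']
# \B: 'cat' that does NOT start at a word boundary, i.e. inside 'concat'.
print(len(re.findall(r"\Bcat", text)))     # 1

# Emulating \< and \> with lookarounds (an assumption, not GNU's own syntax).
word_start = r"\b(?=\w)"   # like \<
word_end = r"\b(?<=\w)"    # like \>
print(re.findall(word_start + r"\w+" + word_end, "hi, there!"))  # ['hi', 'there']

# \A (like \`) anchors at the start of the whole input, even with re.MULTILINE,
# where ^ matches at the start of every line.
buf = "first line\nsecond line"
print(len(re.findall(r"^\w+", buf, re.MULTILINE)))   # 2
print(len(re.findall(r"\A\w+", buf, re.MULTILINE)))  # 1
```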

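[:class:]

Used inside a bracket expression (e.g., [[:digit:]]), matches any one character belonging to the named class.
The classes are [:alpha:], [:upper:], [:lower:], [:alnum:], [:blank:], [:space:], [:digit:], [:xdigit:], [:cntrl:], [:print:], [:graph:], and [:punct:].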

Class	Matching characters
[:digit:]	Numeric characters (0-9)
[:xdigit:]	Hexadecimal digits (0-9, a-f, A-F)
[:alnum:]	Alphanumeric characters
[:alpha:]	Alphabetic characters
[:lower:]	Lowercase letters
[:upper:]	Uppercase letters
[:cntrl:]	Control characters
[:print:]	Printable characters (including space)
[:punct:]	Punctuation characters
[:space:]	Whitespace characters: space, \t (tab), \r (carriage return), \f (form feed), \n (newline), and \v (vertical tab)
[:blank:]	Space and tab characters
[:graph:]	Visible (nonspace) characters: all characters except those in [:space:] and [:cntrl:]
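Python's re does not understand POSIX class names such as [:digit:]. One workaround, sketched below under the assumption of the ASCII-only "C" locale, is to expand each class into an explicit character list before compiling. POSIX_CLASSES and expand_posix_classes are hypothetical names; an unknown class name raises KeyError.

```python
import re
import string

# ASCII-only ("C" locale) members of each class; real POSIX tools consult the locale.
_PRINT = "".join(chr(c) for c in range(0x20, 0x7F))   # space through '~'
POSIX_CLASSES = {
    "digit": string.digits,
    "xdigit": string.hexdigits,
    "alpha": string.ascii_letters,
    "lower": string.ascii_lowercase,
    "upper": string.ascii_uppercase,
    "alnum": string.ascii_letters + string.digits,
    "space": " \t\n\r\f\v",
    "blank": " \t",
    "punct": string.punctuation,
    "cntrl": "".join(chr(c) for c in range(0x20)) + "\x7f",
    "print": _PRINT,
    "graph": _PRINT[1:],                              # printable minus space
}

def expand_posix_classes(pattern: str) -> str:
    """Rewrite each [:name:] (inside a bracket expression) as an escaped
    list of its member characters, which Python's re can compile."""
    return re.sub(r"\[:(\w+):\]",
                  lambda m: re.escape(POSIX_CLASSES[m.group(1)]),
                  pattern)

pat = expand_posix_classes(r"[[:upper:]][[:digit:]]+")
print(bool(re.fullmatch(pat, "A42")), bool(re.fullmatch(pat, "a42")))  # True False
```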

POSIX Basic and Extended Regular Expressions (BRE and ERE)

Character	BRE/ERE	Meaning in a pattern
\	Both	Usually, turn off the special meaning of the following character. Occasionally, enable a special meaning for the following character, such as for \(...\) and \{...\}.
.	Both	Match any single character except NUL. Individual programs may also disallow matching newline.
*	Both	Match any number (or none) of the single character that immediately precedes it. For EREs, the preceding character can instead be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character." For BREs, * is not special if it's the first character of a regular expression.
^	Both	Match the following regular expression at the beginning of the line or string. BRE: special only at the beginning of a regular expression. ERE: special everywhere.
$	Both	Match the preceding regular expression at the end of the line or string. BRE: special only at the end of a regular expression. ERE: special everywhere.
[...]	Both	Termed a bracket expression, this matches any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. (Caution: ranges are locale-sensitive, and thus not portable.) A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally). Bracket expressions may contain collating symbols, equivalence classes, and character classes.
\{n,m\}	BRE	Termed an interval expression, this matches a range of occurrences of the single character that immediately precedes it. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m. n and m must be between 0 and RE_DUP_MAX (minimum value: 255), inclusive. "exactly five occurrences of a" and "between 10 and 42 instances of q" are written a\{5\} and q\{10,42\}, respectively.
{n,m}	ERE	Just like the BRE \{n,m\} earlier, but without the backslashes in front of the braces: {n}, {n,}, and {n,m}, e.g., a{5} and q{10,42}.
\( \)	BRE	Save the pattern enclosed between \( and \) in a special holding space. Up to nine subpatterns can be saved in a single pattern. The text matched by the subpatterns can be reused later in the same pattern, by the escape sequences \1 to \9. For example, \(ab\).*\1 matches two occurrences of ab, with any number of characters in between.
( )	ERE	Apply a match to the enclosed group of regular expressions.
\n	BRE	Replay the nth subpattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left.
+	ERE	Match one or more instances of the preceding regular expression (not supported by #KRE).
?	ERE	Match zero or one instances of the preceding regular expression.
|	ERE	Match the regular expression specified before or after.
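The main practical difference between BREs and EREs is which of ( ) { } + ? | need a backslash. The hypothetical bre_to_python below sketches a translation from BRE (with GNU's \+, \? and \| extensions) to Python's ERE-like syntax. It deliberately ignores bracket-expression subtleties, a leading *, and context-dependent ^ and $, so treat it as an illustration rather than a complete converter.

```python
import re

# Characters whose escaped/unescaped meanings are swapped between a BRE
# (assuming GNU extensions \+ \? \|) and Python's re.
_BRE_SWAPPED = set("(){}+?|")

def bre_to_python(bre: str) -> str:
    r"""Simplified BRE -> Python translation (hypothetical helper).

    Handles \( \) \{ \} \+ \? \| and the literal meaning of ( ) { } + ? |.
    Not handled: bracket-expression subtleties, a leading '*',
    and ^ or $ in the middle of a BRE.
    """
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            # In a BRE, \( \) \{ \} \+ \? \| are operators: drop the backslash.
            out.append(nxt if nxt in _BRE_SWAPPED else ch + nxt)
            i += 2
        else:
            # An unescaped ( ) { } + ? | is an ordinary character in a BRE.
            out.append("\\" + ch if ch in _BRE_SWAPPED else ch)
            i += 1
    return "".join(out)

print(bre_to_python(r"\(ab\).*\1"))   # (ab).*\1
print(bre_to_python(r"a\{5\}"))       # a{5}
print(bre_to_python(r"f(x)+1"))       # f\(x\)\+1
```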

Perl Extended Regular Expressions

\r	Carriage return
\t	Horizontal tab
\f	Form feed
\n	Newline
\N	Not \n
\s	Whitespace: space, \t, \r, \f, \n
\S	Not \s
\w	a-z, A-Z, 0-9, and '_' (underscore)
\W	Not \w
\d	0-9
\D	Not \d
\b	Word boundary
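Python's re supports the same Perl-style escapes, with two caveats assumed in this sketch: it has no bare \N (the class [^\n] does the same job), and its \w, \d and \s are Unicode-aware unless re.ASCII is given, which restricts them to the ASCII sets listed above.

```python
import re

log = "user_42 logged in at 09:15\tfrom host-7"

# \d digits, \w word characters; re.ASCII limits them to the table's ASCII sets.
print(re.findall(r"\d+", log, re.ASCII))   # ['42', '09', '15', '7']
print(re.findall(r"\w+", log, re.ASCII))   # ['user_42', 'logged', 'in', 'at', '09', '15', 'from', 'host', '7']
# \s splits on both the spaces and the \t.
print(re.split(r"\s+", log))               # ['user_42', 'logged', 'in', 'at', '09:15', 'from', 'host-7']
# \b keeps 'in' from matching inside 'login' or 'main'.
print(re.findall(r"\bin\b", "in login main in"))  # ['in', 'in']

# No bare \N in Python's re; [^\n] plays the same role.
print(re.findall(r"[^\n]+", "a\nbc"))      # ['a', 'bc']
```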

Web Page Copyright: 亞洲大學資訊電機學院 連耀南 yaonanlien@asia.edu.tw string.htm, Sun Sep 8 10:52:47 CST 2024
