- POSIX style, end of word: [[:>:]]
- POSIX style, start of word: [[:<:]]
- POSIX style, word boundary:
- SVR4/GNU, end of word: \>
- SVR4/GNU, start of word: \<
- Perl/GNU, word boundary: \b
- Tcl, end of word: \M
- Tcl, start of word: \m
- Tcl, word boundary: \y
- Portable ERE, start of word:
- Portable ERE, end of word:
- POSIX chapter on regular expressions
- Perl regular expression documentation
- Tcl re_syntax manual page
- GNU grep backslash expressions
- BSD re_format
- More reading
Find patterns at the beginning or end of a word
Examine the following strings:
- the regular expression bar will match all four strings,
- \bbar\b will only match the 2nd,
- bar\b will match the 2nd and 3rd strings, and
- \bbar will match the 2nd and 4th strings.
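The four cases above can be checked with a short Python sketch. The sample strings are not preserved in this text, so the ones below are hypothetical stand-ins chosen only to fit the description (the 1st has bar in the middle, the 2nd is bar alone, the 3rd ends with bar, the 4th starts with it).

```python
import re

# Hypothetical sample strings, chosen to fit the description above.
strings = ["foobarfoo", "bar", "foobar", "barfoo"]

patterns = [r"bar", r"\bbar\b", r"bar\b", r"\bbar"]


def matching_positions(pattern, candidates):
    """Return the 1-based positions of the candidates that the pattern matches."""
    return [i for i, s in enumerate(candidates, start=1) if re.search(pattern, s)]


results = {p: matching_positions(p, strings) for p in patterns}
for p, hits in results.items():
    print(f"{p!r:12} matches strings {hits}")
```

re.search is used rather than re.match so that the pattern may match anywhere in the string, which is what the list above assumes.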
Make text shorter but don't break last word
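To limit long text to at most N characters while leaving the last word intact, combine a bounded quantifier with a word boundary, for example ^.{0,N}\b (with N replaced by the actual limit).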
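One way to sketch this in Python is a bounded quantifier followed by \b, so that the engine backtracks until the cut falls on a word boundary. The helper name shorten and its parameters are hypothetical, and stripping trailing whitespace is an extra choice made here.

```python
import re


def shorten(text, limit):
    """Cut text to at most `limit` characters without splitting a word."""
    # .{0,limit} grabs up to `limit` characters greedily; \b then forces the
    # engine to backtrack until the cut lands on a word boundary.
    pattern = r".{0,%d}\b" % limit
    match = re.match(pattern, text, flags=re.DOTALL)
    # Fall back to an empty string if no boundary exists (e.g. only punctuation).
    return match.group(0).rstrip() if match else ""


print(shorten("hello world foo", 8))
print(shorten("hello world foo", 11))
```

Note that a single word longer than the limit yields an empty result with this approach, because the only boundary the engine can backtrack to is the start of the string.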
Match complete word
A word wrapped in \b on both sides (for example, \bfoo\b) will match only the complete word, with no alphanumeric character or _ immediately preceding or following it.
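As a small illustration in Python, the sketch below filters a list of strings down to those that contain a complete word. The word foo and the sample strings are placeholders, not taken from the original.

```python
import re

# "foo" is a placeholder word; \b on each side rejects adjacent word characters.
whole_word = re.compile(r"\bfoo\b")

samples = ["foo", "a foo b", "foobar", "foo_bar", "foo-bar"]
matched = [s for s in samples if whole_word.search(s)]
print(matched)
```

foo_bar is rejected because the underscore counts as a word character, while foo-bar is accepted because the hyphen does not.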
Taken from regularexpression.info:
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
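The three kinds of position listed above can be made visible by asking Python for every place where \b matches. The helper name boundary_positions is hypothetical, and the sketch relies on re.finditer reporting zero-width matches.

```python
import re


def boundary_positions(text):
    """Return the string offsets at which \\b (a word boundary) matches."""
    return [m.start() for m in re.finditer(r"\b", text)]


print(boundary_positions("ab cd"))
print(boundary_positions(" ab "))
```

For "ab cd", the offsets cover the start of the string, both sides of the space, and the end of the string. For " ab ", the start and end are not boundaries because the first and last characters are not word characters.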
The term word character here means any of the following: a letter, a digit, or the underscore.
In short, word character = \w, which in ASCII terms is [a-zA-Z0-9_].
To make it easier to find whole words, we can use the metacharacter \b. It marks the beginning and the end of an alphanumeric sequence*. Also, since it only serves to mark these locations, it matches no character on its own.
*: It is common to call an alphanumeric sequence a word, since we can catch its characters with \w (the word character class). This can be misleading, though, since \w also includes numbers and, in most flavors, the underscore.
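The footnote's point, that a "word" in this sense includes digits and underscores, can be seen in a short Python sketch. The input text is a made-up example.

```python
import re

# Made-up input: an identifier with an underscore, a number, and a hyphenated pair.
text = "foo_bar 42 x-y"

# \w+ collects runs of word characters; \b pins each run to its boundaries.
words = re.findall(r"\b\w+\b", text)
print(words)
```

foo_bar stays in one piece and 42 counts as a word, while x-y splits in two because the hyphen is not a word character.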
|No, since there's no occurrence of the whole word |
|Yes, since there's nothing before nor after |
|Yes: there's nothing before |
|Yes, since there's nothing before |
|Yes, since there's nothing after |
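The regex and input columns of the table above are not preserved in this text, so the Python sketch below uses hypothetical patterns and inputs that produce the same sequence of outcomes (one No followed by four Yes).

```python
import re

# Hypothetical (pattern, input) pairs; the original table cells are not available.
cases = [
    (r"\bstack\b", "stackoverflow"),   # "stack" is followed by a word character
    (r"\bstack\b", "foo stack bar"),   # spaces on both sides
    (r"\bstack\b", "stack!overflow"),  # "!" is not a word character
    (r"\bstack", "stackoverflow"),     # only the left side is constrained
    (r"overflow\b", "stackoverflow"),  # only the right side is constrained
]

outcomes = [bool(re.search(pattern, text)) for pattern, text in cases]
for (pattern, text), found in zip(cases, outcomes):
    print(f"{pattern:12} {text:16} {'Yes' if found else 'No'}")
```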
\B is the opposite of \b: it matches at every position that is not a word boundary. Like \b, it matches a position rather than a character, so it matches no character on its own. It is useful for finding text that is not a whole word.
|Yes, since |
|Yes, it matches the second comma because |
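A minimal Python sketch of \B follows. The patterns and inputs are hypothetical, since the table cells above are incomplete. They show a character in the middle of a word, a character at the edge of a word, and a run of commas.

```python
import re

# "b" sits between two word characters, so neither side is a boundary.
inner = re.search(r"\Bb\B", "abc")

# "a" is at the start of the string, which is a boundary, so \B fails there.
edge = re.search(r"\Ba\B", "abc")

# Between two commas there is no boundary, because both are non-word characters.
comma = re.search(r"\B,\B", "a,,,b")

print(inner is not None, edge is not None, comma.start())
```

The comma match starts at offset 2, the second comma: the first comma is rejected because the position between a and the comma is a word boundary.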