Word Boundary

Syntax

  • POSIX style, end of word: [[:>:]]
  • POSIX style, start of word: [[:<:]]
  • POSIX style, word boundary: [[:<:][:>:]]
  • SVR4/GNU, end of word: \>
  • SVR4/GNU, start of word: \<
  • Perl/GNU, word boundary: \b
  • Tcl, end of word: \M
  • Tcl, start of word: \m
  • Tcl, word boundary: \y
  • Portable ERE, start of word: (^|[^[:alnum:]_])
  • Portable ERE, end of word: ([^[:alnum:]_]|$)
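
The portable ERE forms differ from the true boundary assertions in one practical way: they consume the neighbouring character instead of just asserting a position. The following is a minimal sketch of that difference in Python. It assumes the ASCII range [A-Za-z0-9_] as a stand-in for [[:alnum:]_], because Python's re module has no POSIX bracket classes; the word bar and the sample strings are only illustrative.

```python
import re

# Assumed translation of the portable ERE idiom: [[:alnum:]_] is spelled out
# as an ASCII range because Python's re module lacks POSIX bracket classes.
PORTABLE = re.compile(r"(^|[^A-Za-z0-9_])(bar)([^A-Za-z0-9_]|$)")
BOUNDARY = re.compile(r"\bbar\b")

text = "foo bar baz"
portable_match = PORTABLE.search(text)
boundary_match = BOUNDARY.search(text)

# The portable form swallows the neighbouring spaces; the word is in group 2.
print(repr(portable_match.group(0)))
print(repr(portable_match.group(2)))
# \b is zero-width, so the match is the word alone.
print(repr(boundary_match.group(0)))

# Because the separator is consumed, adjacent words can be missed.
portable_count = len(list(PORTABLE.finditer("bar bar")))
boundary_count = len(list(BOUNDARY.finditer("bar bar")))
print(portable_count, boundary_count)
```

When the portable form is used, the word itself has to be read from a capture group, and repeated matching has to account for the separator that was consumed.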

Find patterns at the beginning or end of a word

Examine the following strings:

foobarfoo
bar
foobar
barfoo
  • the regular expression bar will match all four strings,
  • \bbar\b will only match the 2nd,
  • bar\b will be able to match the 2nd and 3rd strings, and
  • \bbar will match the 2nd and 4th strings.
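
The four claims above can be checked with a short script. This is a sketch using Python's re module, where \b has the Perl-style meaning; the helper name matching_indices is made up for this illustration.

```python
import re

strings = ["foobarfoo", "bar", "foobar", "barfoo"]
patterns = [r"bar", r"\bbar\b", r"bar\b", r"\bbar"]

def matching_indices(pattern, candidates):
    """Return the 1-based positions of the candidates that contain a match."""
    return [i for i, s in enumerate(candidates, start=1) if re.search(pattern, s)]

# Map each pattern to the list of strings (by position) it matches.
results = {p: matching_indices(p, strings) for p in patterns}

for pattern, hits in results.items():
    print(f"{pattern!r:12} matches strings {hits}")
```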

Make text shorter but don't break last word

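To truncate long text to at most N characters without cutting a word in half, capture .{0,N} followed by \b and replace the whole match with the captured group (substitute the actual limit for N):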

^(.{0,N})\b.*
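
One way to apply this pattern is sketched below in Python. The function name truncate_at_word and the fallback for text without any word boundary are assumptions of this sketch, not part of the pattern itself. re.match already anchors at the start, so the leading ^ and the trailing .* are not needed when only the captured group is read.

```python
import re

def truncate_at_word(text, n):
    """Return at most n leading characters of text, cut at a word boundary."""
    # Greedy .{0,n} grabs as much as allowed, then backtracks until \b holds.
    match = re.match(r"(.{0,%d})\b" % n, text)
    # Text with no word characters has no boundary at all; falling back to a
    # hard cut is an assumption of this sketch.
    return match.group(1) if match else text[:n]

print(repr(truncate_at_word("The quick brown fox", 12)))
print(repr(truncate_at_word("The quick brown fox", 9)))
```

Two things are worth noting. The cut can land right after a space, so the result may carry trailing whitespace that the caller might want to strip. And if the first word alone is longer than N, the only boundary available is the start of the string, so the result is empty.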

Match complete word

\bfoo\b

matches foo only as a complete word, that is, only where no alphanumeric character or underscore immediately precedes or follows it.
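
As a small illustration, the sketch below replaces only whole-word occurrences of foo using Python's re.sub. The sample sentence and the replacement X are made up for the example.

```python
import re

sample = "foo food foo_bar foo-bar (foo)"

# Only occurrences with no word character on either side are replaced.
replaced = re.sub(r"\bfoo\b", "X", sample)
print(replaced)
```

Here food and foo_bar are left alone because d and _ are word characters, while foo-bar and (foo) are affected because - and the parentheses are not.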

Taken from regular-expressions.info:

There are three different positions that qualify as word boundaries:

  1. Before the first character in the string, if the first character is a word character.
  2. After the last character in the string, if the last character is a word character.
  3. Between two characters in the string, where one is a word character and the other is not a word character.
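
These three cases can be made visible by listing every position at which \b succeeds. The following is a sketch in Python; boundary_positions is a hypothetical helper name, and positions are counted as 0-based offsets between characters.

```python
import re

def boundary_positions(text):
    """Return the offsets at which the zero-width assertion \\b succeeds."""
    return [m.start() for m in re.finditer(r"\b", text)]

print(boundary_positions("foo bar"))
print(boundary_positions("a,,,b"))
print(boundary_positions(", "))
```

For "foo bar" the offsets 0 and 7 illustrate cases 1 and 2, and the offsets 3 and 4 illustrate case 3. A string without any word character has no boundaries at all.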

The term word character here means any of the following:

  1. Letters ([a-zA-Z])
  2. Digits ([0-9])
  3. The underscore (_)

In short, word character = \w = [a-zA-Z0-9_]
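
Whether \w is exactly [a-zA-Z0-9_] depends on the flavor and its flags. As one example, Python's re module treats \w as Unicode-aware for str patterns by default and restricts it to the ASCII set only when re.ASCII is given. The sketch below shows the effect on both \w and \b; the sample text is arbitrary and uses escapes so the accented letters are unambiguous.

```python
import re

sample = "na\u00efve caf\u00e9_2"  # "naïve café_2"

# Default for str patterns in Python 3: accented letters are word characters.
unicode_words = re.findall(r"\w+", sample)
# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)
explicit_words = re.findall(r"[a-zA-Z0-9_]+", sample)

print(unicode_words)
print(ascii_words)

# \b follows the same definition, so boundaries move with the flag.
ascii_hit = re.search(r"\bcaf\b", "caf\u00e9", flags=re.ASCII)
unicode_hit = re.search(r"\bcaf\b", "caf\u00e9")
print(ascii_hit is not None, unicode_hit is not None)
```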

Word boundaries

The \b metacharacter

To make it easier to find whole words, we can use the metacharacter \b. It marks the beginning and the end of an alphanumeric sequence*. Since it only marks these locations, it matches no character on its own.

*: It is common to call an alphanumeric sequence a word, since we can match its characters with \w (the word character class). This can be misleading, though, since \w also includes digits and, in most flavors, the underscore.

Examples:

Regex      | Input           | Matches?
\bstack\b  | stackoverflow   | No, since there's no occurrence of the whole word stack
\bstack\b  | foo stack bar   | Yes, since stack has only spaces (non-word characters) before and after it
\bstack\b  | stack!overflow  | Yes: there's nothing before stack and ! is not a word character
\bstack    | stackoverflow   | Yes, since there's nothing before stack
overflow\b | stackoverflow   | Yes, since there's nothing after overflow
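
The rows of this table can be verified mechanically. A minimal sketch with Python's re module follows; the cases list simply restates the table, and found is a helper name chosen for this example.

```python
import re

# (pattern, input, expected result) restated from the table.
cases = [
    (r"\bstack\b", "stackoverflow", False),
    (r"\bstack\b", "foo stack bar", True),
    (r"\bstack\b", "stack!overflow", True),
    (r"\bstack", "stackoverflow", True),
    (r"overflow\b", "stackoverflow", True),
]

def found(pattern, text):
    """Return True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

observed = [found(pattern, text) for pattern, text, _ in cases]

for (pattern, text, expected), got in zip(cases, observed):
    print(f"{pattern:12} {text:16} expected={expected} got={got}")
```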

The \B metacharacter

This is the opposite of \b: it matches at every position that is not a word boundary. Like \b, it matches a location rather than a character, so it consumes nothing on its own. It is useful for finding text that is not a whole word.

Examples:

Regex  | Input  | Matches?
\Bb\B  | abc    | Yes, since b is not surrounded by word boundaries.
\Ba\B  | abc    | No, a has a word boundary on its left side.
a\B    | abc    | Yes, a does not have a word boundary on its right side.
\B,\B  | a,,,b  | Yes, it matches the second comma, because \B also matches the position between two non-word characters. (There is a word boundary to the left of the first comma and to the right of the third, so neither of those matches.)
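
The \B rows can be checked the same way. The sketch below uses Python's re module and also reports where the comma match lands; the variable names are only illustrative.

```python
import re

# (pattern, input, expected result) restated from the table.
cases = [
    (r"\Bb\B", "abc", True),
    (r"\Ba\B", "abc", False),
    (r"a\B", "abc", True),
    (r"\B,\B", "a,,,b", True),
]

observed = [re.search(pattern, text) is not None for pattern, text, _ in cases]
print(observed)

# Only the middle comma has a non-boundary on both sides.
comma_spans = [m.span() for m in re.finditer(r"\B,\B", "a,,,b")]
print(comma_spans)
```

The single span covers offset 2, which is the second comma; the first and third commas each touch a word boundary next to a or b.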