nano 2.0.6 Regex Notes

Being the creature of habit that I am, I like to use nano when editing on the command line. It is hard to break decades-long habits...

The nano version installed by default on macOS is GNU nano 2.0.6, which offers syntax highlighting using a wonky subset of POSIX regular expressions. It recognizes the following standard rules:

Basic Regex Rules

^ - Beginning of a line.
$ - End of a line.
. - Any single character.
? - Zero or one repetition of the preceding character or group.
* - Zero or more repetitions of the preceding character or group.
+ - One or more repetitions of the preceding character or group.
\ - Escapes the next character, turning it into a literal instead of a regex metacharacter.
() - Form a group.
| - Alternation, aka "or".

Examples:
^abc$ - Matches a line that consists of exactly "abc". (Will not match " abc", "abc ", etc.)
a?b*c+. - Would match "bbbccx" or "accccy", but not "aabbbx" (there is no "c").
(ab)?c+ - Matches "abcccc" or "cc", but not "ab" on its own; in "ababc" only the trailing "abc" matches.
(a|b)c - Matches "ac" or "bc"; in "abc" only the "bc" part matches.
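
The examples above can be checked mechanically. Here is a small sketch that uses Python's re module as a stand-in. That is an assumption on my part: nano relies on the system's POSIX regex library, not Python, but the basic metacharacters listed here behave the same way in both. The helper name matches_whole is made up for illustration; it tests whether the whole string matches, not just a part of it.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# (pattern, strings that should match in full, strings that should not)
CASES = [
    (r"^abc$", ["abc"], [" abc", "abc "]),
    (r"a?b*c+.", ["bbbccx", "accccy"], ["aabbbx"]),
    (r"(ab)?c+", ["abcccc", "cc"], ["ababc", "ab"]),
    (r"(a|b)c", ["ac", "bc"], ["abc"]),
]

for pattern, good, bad in CASES:
    assert all(matches_whole(pattern, s) for s in good), pattern
    assert not any(matches_whole(pattern, s) for s in bad), pattern

# An unanchored pattern can still match part of a longer string:
print(re.search(r"(ab)?c+", "ababc").group())  # prints "abc"
```

The last line is the reason the anchors ^ and $ matter for highlighting: without them, a rule colors any matching fragment inside a longer word.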

The square brackets [ and ] are used to define character classes:
[abc]bc - Matches "abc", "bbc", or "cbc".

Negation in character classes is done with the caret ^:
[^abc]bc - Matches any character that is not "a", "b", or "c", followed by "bc".

Ranges can be specified with a - between two characters:
[a-c]bc - Matches "abc", "bbc", or "cbc".
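
A quick way to see the three bracket forms side by side, again sketched with Python's re module as a stand-in (an assumption; for these simple bracket expressions it agrees with POSIX). The helper accepted and the candidate list are made up for illustration.

```python
import re

def accepted(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that match the pattern in full (hypothetical helper)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["abc", "bbc", "cbc", "dbc", "xbc", "bc"]

print(accepted(r"[abc]bc", candidates))   # explicit list: ['abc', 'bbc', 'cbc']
print(accepted(r"[a-c]bc", candidates))   # same set as a range: ['abc', 'bbc', 'cbc']
print(accepted(r"[^abc]bc", candidates))  # negated: ['dbc', 'xbc']
```

Note that "bc" on its own is rejected by all three, since a bracket expression always consumes exactly one character.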

Character class subtraction does not appear to be recognized.

I say that it must be some POSIX engine because it recognizes [[:digit:]] but not \d, [:d:], or \p{Digit}. The following character classes are recognized:
[[:alnum:]] - Equivalent to [a-zA-Z0-9], matches an alphanumeric character.
[[:alpha:]] - Equivalent to [a-zA-Z], matches an alphabetical character.
[[:lower:]] - Equivalent to [a-z], matches a lower case alphabetical character.
[[:upper:]] - Equivalent to [A-Z], matches an upper case alphabetical character.
[[:digit:]] - Equivalent to [0-9], matches a digit.
[[:punct:]] - Matches punctuation marks; have not tested exactly which ones.
[[:blank:]] and [[:space:]] - Seem to be equivalent; match spaces and tabs (:space: is supposed to match all whitespace breaks, but I cannot get it to do so...)
[[:print:]] - Matches any printable character.
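
Python's re module does not understand [[:digit:]] and friends, so to try a nano pattern outside of nano, one option is to translate the classes first. The following is only a naive sketch: the helper posix_to_python and its table are hypothetical, the ranges are the ASCII equivalents from the list above (so a C/POSIX locale is assumed), and [[:punct:]] and [[:print:]] are left out on purpose.

```python
import re

# ASCII equivalents, taken from the list above. Assumption: C/POSIX locale.
POSIX_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "lower": "a-z",
    "upper": "A-Z",
    "digit": "0-9",
    "blank": r" \t",
    "space": r" \t\n\r\f\v",
}

def posix_to_python(pattern: str) -> str:
    """Naively replace each [:name:] with an ASCII range (hypothetical helper)."""
    def expand(match):
        name = match.group(1)
        if name not in POSIX_CLASSES:
            raise ValueError(f"unsupported class [:{name}:]")
        return POSIX_CLASSES[name]
    return re.sub(r"\[:([a-z]+):\]", expand, pattern)

identifier = posix_to_python("[[:alpha:]_][[:alnum:]_]*")
print(identifier)                               # prints "[a-zA-Z_][a-zA-Z0-9_]*"
print(bool(re.fullmatch(identifier, "foo_1")))  # prints "True"
print(bool(re.fullmatch(identifier, "1foo")))   # prints "False"
```

The substitution does not check that a class name actually sits inside a bracket expression, so it is good enough for experimenting but not a real converter.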

What is missing?

There seems to be no support for lookahead/lookbehind or for backreferencing capture groups by number. You also do not appear to be able to use escape sequences inside character classes, like \t for a tab.

The lack of lookaround makes it extremely hard in some cases to write a good regex that excludes something.
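
To give an idea of how awkward exclusion gets, here is a sketch that compares a lookahead with a hand-expanded equivalent built only from groups, alternation, $, and negated character classes, all of which are on the supported list above. Python's re module is used as a stand-in for testing (an assumption; nano itself never runs Python), and the task "foo not followed by bar" as well as all names below are made up for illustration.

```python
import re

# With lookahead (not available in nano 2.0.6): "foo" not followed by "bar".
WITH_LOOKAHEAD = r"foo(?!bar)"

# Hand-expanded version: after "foo" comes the end of the line, or a non-"b",
# or "b" then end/non-"a", or "ba" then end/non-"r".
WITHOUT_LOOKAHEAD = r"foo($|[^b]|b($|[^a])|ba($|[^r]))"

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

SAMPLES = ["foo", "food", "foob", "fooba", "foobaz",
           "foobar", "foobarbar", "xfoobarfoo"]

for text in SAMPLES:
    # Both patterns agree on whether each sample contains a match.
    assert found(WITH_LOOKAHEAD, text) == found(WITHOUT_LOOKAHEAD, text), text
    print(text, found(WITHOUT_LOOKAHEAD, text))
```

The two patterns agree on whether a line matches, but not on what is matched: the expanded version also consumes up to three characters after "foo", so in a highlighting rule those characters would presumably get colored as well.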

2021-10-01