Regular Expressions (Regexp)
Crazy looking, but crazy useful.
I recently came across regular expressions, also referred to as regexps. Here is an example that searches for email addresses: [a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+. That seems like a hot mess and incredibly intimidating. However, if you enjoy coding, especially the puzzle-solving aspect of it, you might enjoy the crazy mess above. Regexps help you search for particular data, such as emails, phone numbers in a particular format, number patterns, and much more. Regexps are supported by many languages and editors, so the examples shared here should apply to most of them, though small syntax details can vary between tools. I’ll be working in VS Code, but any code editor should be fine.
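To see that the pattern is less scary than it looks, here is a minimal Python sketch that runs it with the standard re module. The sample sentence and the variable names are made up for illustration, and the pattern is only a rough email matcher, not a full validator.

```python
import re

# The email pattern from above, typed with plain ASCII hyphens.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

# Hypothetical sample text, invented for this example.
sample = "Contact jane.doe@example.com or bob_smith+news@mail.example.org today."

# findall returns every non-overlapping match, left to right.
emails = EMAIL_PATTERN.findall(sample)
print(emails)  # ['jane.doe@example.com', 'bob_smith+news@mail.example.org']
```

One thing to watch for: because the last bracket group allows periods, an address sitting right before a sentence-ending period would pick up that period too.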
Here is a YouTube video that has helped me understand some of the basics. I encourage you to go through it yourself. This blog will be a short reflection on it to practice my learning and understanding, which I hope will help you too.
There are three aspects of regexps: characters, metacharacters, and quantifiers.
Characters are anything from ABCDEFGHIJKLMNOPQRSTUVWXYZ, abcdefghijklmnopqrstuvwxyz, and 1234567890.
Metacharacters include ( ) [ ] { } . ? + and *. A simplified explanation of metacharacters is that they have two possible jobs: 1) they can act as regular characters, or 2) they can do special regexp work, such as acting as quantifiers. We’ll get to that shortly. By default, ( ) [ ] { } . ? + and * do their special job. To make one of them act as a regular character, it needs to be “escaped” by putting a backslash \ in front of it, such as \+ or \?. I like to think of it as the metacharacters escaping their regexp jobs to be regular characters.
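Here is a small Python sketch of escaping, using the standard re module. The sample strings are hypothetical; the point is only to compare \+ and \? with an unescaped +.

```python
import re

text = "Is 2+2=4? Yes."  # hypothetical sample text

# Escaped: \+ and \? match the literal characters + and ?.
plus_signs = re.findall(r"\+", text)
question_marks = re.findall(r"\?", text)

# Unescaped: + keeps its regexp job, so "2+" means "one or more 2s in a row".
runs_of_twos = re.findall(r"2+", "2 22 2+2")

print(plus_signs)       # ['+']
print(question_marks)   # ['?']
print(runs_of_twos)     # ['2', '22', '2', '2']
```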
Quantifiers are the metacharacters ? + * and { }, where the curly braces hold numbers to search for an exact amount of something or a range of something, which looks like {5} or {10,13} (with no space after the comma). A quantifier sets how many times the item right before it should appear.
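As a quick illustration of the curly braces, this Python sketch checks a few made-up strings against \d{5} and \d{10,13}. I use re.fullmatch here, which only succeeds when the whole string fits the pattern; that choice is mine and is not something the VS Code search bar does.

```python
import re

# {5} asks for exactly five of the previous item (here, a digit).
exactly_five = re.fullmatch(r"\d{5}", "12345") is not None
too_short = re.fullmatch(r"\d{5}", "1234") is not None

# {10,13} asks for at least 10 and at most 13 of the previous item.
twelve_digits = re.fullmatch(r"\d{10,13}", "123456789012") is not None
fourteen_digits = re.fullmatch(r"\d{10,13}", "12345678901234") is not None

print(exactly_five, too_short)          # True False
print(twelve_digits, fourteen_digits)   # True False
```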
Some Practice
To practice using regexps, press CMD + F (Ctrl + F on Windows and Linux) to open the search bar highlighted in the image below. I find that regexps work best in the VS Code search bar when Aa (Match Case) and .* (Use Regular Expression) are selected. VS Code has a whole section of documentation on regexps which will explain much more.
Characters
From this point, you can do an exact match search. In the search bar, chickens has been typed in, which will only bring up exact matches, as seen in the image below.
The search is also case sensitive, so searching for a capital M will only highlight capital M, as in the image below.
Metacharacters
Let’s compare the escaped \. with the unescaped . metacharacter. Remember, to search for an actual period the . needs to be escaped, so we need \. for that. Otherwise the . keeps its special job as a wildcard, which serves a different purpose.
In Image A, the character “.” is highlighted using \. and the result shows that we are looking for the period character specifically.
Image B shows the unescaped . was used, which will match every character (except line breaks).
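The same comparison can be reproduced in Python. This is a minimal sketch with a made-up sample string, showing that \. finds only periods while an unescaped . matches every character on the line.

```python
import re

text = "Hi. Ok."  # hypothetical sample text

# Escaped: only the literal period characters are matched.
literal_periods = re.findall(r"\.", text)

# Unescaped: every character (letters, space, periods) is matched one by one.
any_characters = re.findall(r".", text)

print(literal_periods)      # ['.', '.']
print(len(any_characters))  # 7
```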
Here is a cheat sheet of regexp metacharacters, shortcuts, and quantifiers to help with the upcoming examples and for general reference:
. - Will find any character that is not a new line.
\d - Will find any digit 0-9.
\D - Will find any character that is not a digit 0-9.
\w - Will find word characters (letters, digits, and the underscore).
\W - Will find non-word characters.
\s - Will find white space.
\S - Will find non-white space.
\b - Will find a word boundary.
\B - Will find a position that is not a word boundary.
^ - Will find the beginning of a string.
$ - Will find the end of a string.
[] - Will match any one of the characters in the brackets.
[^] - Will match any one character that is not in the brackets.
| - Will match either the pattern on its left or the pattern on its right.
() - Will group a match.
Quantifiers:
* - Will search for 0 or more of the item sought.
+ - Will search for 1 or more of the item sought.
? - Will search for 0 or 1 of the item sought.
{4} - Will search for exactly 4 of the item sought.
{3,4} - Will search for 3 to 4 of the item sought.
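To try a few of the cheat sheet entries outside the editor, here is a short Python sketch using the standard re module. All the sample strings and variable names are invented for illustration, and other regexp flavors may differ in small details.

```python
import re

text = "Order 66 shipped to cat_lady on day 3"  # hypothetical sample text

digits = re.findall(r"\d+", text)                          # runs of digits
starts_with_order = re.search(r"^Order", text) is not None  # ^ anchors the start
ends_with_digit = re.search(r"\d$", text) is not None       # $ anchors the end

# \b keeps "cat" from matching inside "concat" or "cat_lady".
whole_word_cat = re.findall(r"\bcat\b", "cat concat cat_lady")

cat_or_dog = re.findall(r"cat|dog", "cat dog bird")    # | means "or"
consonants = re.findall(r"[^aeiou ]", "a big cat")     # [^...] negates the set
optional_u = re.findall(r"colou?r", "color colour")    # ? means 0 or 1

print(digits)          # ['66', '3']
print(whole_word_cat)  # ['cat']
print(cat_or_dog)      # ['cat', 'dog']
print(consonants)      # ['b', 'g', 'c', 't']
print(optional_u)      # ['color', 'colour']
```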
Example 1 — Searching for French phone number
French phone numbers are written as five pairs of digits, such as xx-xx-xx-xx-xx or xx.xx.xx.xx.xx, where periods or dashes are commonly used to separate the pairs.
The regexp \d{2}[.-]? works to find the xx-xx-xx-xx-xx and xx.xx.xx.xx.xx patterns. Each match covers one pair of digits plus the separator after it, so a full number shows up as five matches in a row. The interesting thing to consider is finding phone numbers that use either a dash or a period. \d will match any digit. The {2} that follows asks for exactly 2 digits in a row. The [] holds the characters that are sought next. In this case the . or - are searched for (inside brackets the . is a plain period, so it does not need escaping). The ? is looking for 0 or 1 of the previous item (i.e. . or -).
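Here is how that plays out in Python, as a rough sketch with made-up phone numbers. The first search uses the regexp from this example as-is. The second wraps it in a group repeated five times, (?:\d{2}[.-]?){5}, which is my own extension (not from the video) to capture a whole number as one match.

```python
import re

# The regexp from the example: one pair of digits plus an optional separator.
pairs = re.findall(r"\d{2}[.-]?", "01-23-45-67-89")
print(pairs)  # ['01-', '23-', '45-', '67-', '89']

# Assumption: repeating the same piece five times to grab a full number at once.
FRENCH_NUMBER = re.compile(r"(?:\d{2}[.-]?){5}")

# Hypothetical sample text with invented numbers.
text = "Call 01-23-45-67-89 or 06.12.34.56.78 before noon."
numbers = FRENCH_NUMBER.findall(text)
print(numbers)  # ['01-23-45-67-89', '06.12.34.56.78']
```

Since the separator is optional, this sketch would also accept ten digits with no separators at all, and it does not check that a number uses the same separator throughout.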
Example 2 — Searching for American phone number
American phone numbers follow the pattern of xxx-xxx-xxxx or xxx.xxx.xxxx. There are others, such as (xxx)xxx-xxxx, but for this example only periods and dashes will be considered. I challenge you to find American phone numbers with a regexp that will include parentheses. The regexp \d{3,4}[.-]? works to find an American phone number that uses periods or dashes. Once again, we’re searching for digits using \d. Since American numbers come in groups of 3 or 4 digits, we need to search for that. We show that as \d{3,4}, where 3 is the minimum number of digits searched for and 4 is the maximum. Next, the separators . or - need to be considered. That’s done using \d{3,4}[.-]. The ? is added because we are looking for 0 or 1 of the previous item (i.e. . or -), since the last group has no separator after it.
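The same idea can be checked in Python. This is a minimal sketch with invented numbers: the first search uses the regexp from this example, and the second spells out the three groups one after another, which is my own stricter variant for matching a whole number in one go.

```python
import re

# The regexp from the example: 3 to 4 digits plus an optional separator.
pieces = re.findall(r"\d{3,4}[.-]?", "555-867-5309")
print(pieces)  # ['555-', '867-', '5309']

# Assumption: a stricter pattern that lists the groups in order (3, 3, then 4 digits).
US_NUMBER = re.compile(r"\d{3}[.-]?\d{3}[.-]?\d{4}")

# Hypothetical sample text with invented numbers.
text = "Call 555-867-5309 or 555.123.4567 now"
numbers = US_NUMBER.findall(text)
print(numbers)  # ['555-867-5309', '555.123.4567']
```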
Those are some solutions to the examples presented. I would encourage you to practice creating your own regexps with some other data you would like to search for. I also recommend watching the YouTube video mentioned in the beginning. Regexps can be lots of fun once you get past the intimidation. As a new programmer, I feel like I can impress my friends as a hacker :)