Regular Expressions Made Simple
A simple guide to understanding and using regular expressions
#programming, #regex, #friday2

Most programming languages support regular expressions (regex). Because regex enables powerful pattern matching and text manipulation, it's a valuable skill to learn. This guide covers the basic syntax and then walks through a worked example.

The following basic syntax elements simply need to be memorized (there are only 21 of them). Once you understand their meanings, reading and writing regular expressions becomes much easier. Review these for a few minutes each day for a week.

. a period matches any single character except for a newline

^ a caret matches the start of a line, unless it is the first character inside square brackets, where it negates the set instead. (Remember: they dangle a carrot in front of the rabbit)

$ a dollar sign matches the end of a line (Remember the saying "the buck stops here")

\ a backslash escapes special characters so they are matched literally (for example, \. matches an actual period)

[ ] characters in square brackets match any one of the characters inside. There is one exception: if the first character inside the square brackets is a caret ^, the brackets match any one character that is NOT listed inside.

\d matches any single digit. Same as [0-9]

\D matches any single character that is NOT a digit. Same as [^0-9]

\w matches any word character (letters, digits, underscore). Same as [a-zA-Z0-9_]

\W matches any single character that is NOT a word character. Same as [^a-zA-Z0-9_]

\s matches any whitespace character (space, tab, newline)

\S matches any character that is NOT whitespace

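* a star matches 0 or more occurrences of whatever comes just before it

+ a plus matches 1 or more occurrences of whatever comes just before it

? a question mark matches 0 or 1 occurrence of whatever comes just before it

{n} a number inside curly brackets matches exactly n occurrences of whatever comes just before it

{n,m} two numbers inside curly brackets match between n and m occurrences of whatever comes just before it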

( ) parentheses group part of a pattern and capture the matched text for later use

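i (flag) case-insensitive matching

g (flag) global matching (find all matches instead of stopping at the first). Python and PHP have no g flag; they use functions such as re.findall and preg_match_all instead

m (flag) multiline mode: ^ and $ match at the start and end of every line, not just the whole input

s (flag) single-line mode: the period also matches newlines, so the whole input is treated as a single line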
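To make the list a little more concrete, here is a small Python sketch that exercises most of the elements above. The sample strings and variable names are made up for illustration, and the flags are written the Python way (re.IGNORECASE, re.MULTILINE, re.DOTALL); other languages spell them differently.

```python
import re

# . matches any single character except a newline
dot = re.findall(r"c.t", "cat cot c\nt")            # ['cat', 'cot']

# ^ and $ anchor the match to the start and end
starts = bool(re.search(r"^abc", "abcdef"))          # True
ends = bool(re.search(r"abc$", "abcdef"))            # False

# \ escapes a special character so it is matched literally
periods = re.findall(r"\.", "a.b.c")                 # ['.', '.']

# [ ] matches one character from the set, [^ ] one character not in it
vowels = re.findall(r"[aeiou]", "regex")             # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")        # ['r', 'g', 'x']

# \d, \w and \S combined with + (one or more)
text = "Order 66 shipped to Ann_B on 2024-05-17"
digits = re.findall(r"\d+", text)   # ['66', '2024', '05', '17']
words = re.findall(r"\w+", text)    # 'Ann_B' stays whole, the date is split at '-'
chunks = re.findall(r"\S+", text)   # split on whitespace only

# * (0 or more), + (1 or more), ? (0 or 1), {n,m} (between n and m)
star = re.findall(r"ab*", "a ab abbb")                     # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                     # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour colouur")  # ['color', 'colour']
counted = re.findall(r"\d{2,3}", "1 22 333 4444")          # ['22', '333', '444']

# ( ) captures pieces of the match for later use
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year, month, day = date.groups()                     # '2024', '05', '17'

# Flags: i, m and s as Python spells them
any_case = re.findall(r"cat", "Cat cat CAT", re.IGNORECASE)         # all three
first_words = re.findall(r"^\w+", "one\ntwo\nthree", re.MULTILINE)  # one per line
across_lines = bool(re.search(r"a.b", "a\nb", re.DOTALL))           # True

print(digits, (year, month, day), first_words)
```

Python has no g flag, so re.findall plays that role here by returning every match.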

Okay, now that we've (hopefully) memorized the list above 😉, let's walk through an example to see how it all comes together. Consider the following Python function. What do you think the whatami function is checking for? Note: the regex is the string passed to re.search.

import re

def whatami(value):
    # Only strings can be tested against the pattern
    if isinstance(value, str):
        return bool(re.search(r"^[\w\.\+\-]+\@[\w]+\.[a-z]{2,10}$", value))
    else:
        return False

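The regex in the function above is ^[\w\.\+\-]+\@[\w]+\.[a-z]{2,10}$ Let's break it down and see if we can understand what it is checking for:

^ start of the line

[\w\.\+\-]+ one or more word characters, periods, plus signs, or hyphens

\@ a literal @ sign

[\w]+ one or more word characters

\. a literal period

[a-z]{2,10} between 2 and 10 lowercase letters

$ end of the line

Put together: some text, an @ sign, a name, a period, and a short lowercase ending. It is checking for an email address, so a clearer name for the function is isEmail: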

import re

def isEmail(value):
    # Only strings can be tested against the pattern
    if isinstance(value, str):
        return bool(re.search(r"^[\w\.\+\-]+\@[\w]+\.[a-z]{2,10}$", value))
    else:
        return False

if isEmail('youremail@gmail.com'):
    print('valid email address')
else:
    print('invalid email address')
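A good way to get a feel for a pattern is to throw a handful of inputs at it. The sketch below restates the check as a hypothetical helper called is_email (same regex, just precompiled) and runs it over some made-up sample values. It also shows that this pattern is a simplification: it turns away some addresses that are perfectly valid in real life.

```python
import re

# Same pattern as above, compiled once so it can be reused
EMAIL_PATTERN = re.compile(r"^[\w\.\+\-]+\@[\w]+\.[a-z]{2,10}$")

def is_email(value):
    """Illustrative restatement of isEmail: True only for matching strings."""
    return isinstance(value, str) and bool(EMAIL_PATTERN.search(value))

# Made-up sample inputs
samples = [
    "youremail@gmail.com",         # plain address
    "first.last+tag@example.org",  # periods and a plus before the @
    "no-at-sign.com",              # no @ at all
    "user@mail.example.com",       # subdomain: [\w]+ cannot cross the period
    "user@my-domain.com",          # hyphen is not a word character
    "user@example.COM",            # [a-z] is lowercase only
    12345,                         # not a string
]

results = {sample: is_email(sample) for sample in samples}

for sample, ok in results.items():
    print(f"{sample!r:32} {'valid' if ok else 'invalid'}")
```

Only the first two samples come back valid. If you need to accept subdomains, hyphenated domains, or uppercase letters, the pattern has to be loosened; this one is best treated as a learning example rather than a complete email validator.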

Here is the same function in PHP

<?php
if(isEmail('youremail@gmail.com')){
    echo 'valid email address';
}
else{
    echo 'invalid email address';
}

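// Returns true when $str looks like an email address
function isEmail($str=''){
    if(strlen($str)==0){return false;}
    // The trailing $ anchors the match to the end of the string
    if(preg_match('/^[\w\.\+\-]+\@[\w]+\.[a-z]{2,10}$/',$str)){return true;}
    return false;
}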
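The trailing $ in the pattern matters more than it looks. Without it, the match only has to begin at the start of the input, so anything can follow the address. Here is a small Python sketch of the difference; the constant names and the sample input are made up for illustration.

```python
import re

WITH_END_ANCHOR = r"^[\w\.\+\-]+\@[\w]+\.[a-z]{2,10}$"
WITHOUT_END_ANCHOR = r"^[\w\.\+\-]+\@[\w]+\.[a-z]{2,10}"

# An address followed by extra text
candidate = "user@example.com<script>"

anchored = bool(re.search(WITH_END_ANCHOR, candidate))       # False
unanchored = bool(re.search(WITHOUT_END_ANCHOR, candidate))  # True

# One more subtlety: in Python, $ also matches just before a trailing newline
newline_ok = bool(re.search(WITH_END_ANCHOR, "user@example.com\n"))      # True
# re.fullmatch insists that the entire string matches, newline included
newline_full = bool(re.fullmatch(WITH_END_ANCHOR, "user@example.com\n")) # False

print(anchored, unanchored, newline_ok, newline_full)
```

If a stray trailing newline should count as invalid, re.fullmatch is one option in Python.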

Understanding regular expressions is a valuable asset for any programmer. Matching and manipulating text comes up constantly, from data validation and parsing to search and replace operations. Once you know the syntax elements above, you can write more concise code for these tasks and read other people's patterns with confidence.