EMS logo

Products Navigation

choose your database tool

Our Partnership Status

Microsoft Certified Partner
Oracle Certified Partner
Embarcadero Technology Partner

SQL Articles

All SQL Articles

Leo Hsu and Regina Obe
Regular Expressions in PostgreSQL

Every programmer should embrace and use regular expressions (INCLUDING Database programmers). There are many places where regular expressions can be used to reduce a 20 line piece of code into a 1 liner. Why write 20 lines of code when you can write 1.

Regular expressions are a domain language just like SQL. Just like SQL they are embedded in many places. You have them in your program editor. You see it in sed, grep, perl, PHP, Python, VB.NET, C#, in ASP.NET validators and javascript for checking correctness of input. You have them in PostgreSQL as well where you can use them in SQL statements, domain definitions and check constraints. You can mix regular expressions with SQL. When you mix the two domain languages, you can do enchanting things with a flip of a wrist that would amaze your less informed friends. Embrace the power of domain languages and mix it up. PostgreSQL makes that much easier than any other DBMS we can think of.

For more details on using regular expressions in PostgreSQL, check out the manual pages Pattern Matching in PostgreSQL

The problem with regular expressions is that they are slightly different depending on what language environment you are running them in. Different enough to be frustrating. We'll just focus on their use in PostgreSQL, though these lessons are applicable to other environments.

Common usages

We are going to go backwards a bit. We will start with demonstrations of PostgreSQL SQL statements that find and replace things and list things out with regular expressions. For these exercises we will be using a contrived table called notes, which you can create with the following code.

CREATE TABLE notes(note_id serial primary key, description text); INSERT INTO notes(description) VALUES('John''s email address is johnny@johnnydoessql.com. Priscilla manages the http://www.johnnydoessql.com site. She also manages the site http://jilldoessql.com and can be reached at 345.678.9999 She can be reached at (123) 456-7890 and her email address is prissy@johnnydoessql.com or prissy@jilldoessql.com.'); INSERT INTO notes(description) VALUES('I like ` # marks and other stuff that annoys militantdba@johnnydoessql.com. Militant if you have issues, give someone who gives a damn a call at (999) 666-6666.');

Regular Expressions in PostgreSQL

PostgreSQL has a rich set of functions and operators for working with regular expressions. The ones we commonly use are ~, regexp_replace, and regexp_matches.

We use the PostgreSQL g flag in our use more often than not. The g flag is the greedy flag that returns, replaces all occurrences of the pattern. If you leave the flag out, only the first occurrence is replaced or returned in the regexp_replace, regexp_matches constructs. The ~ operator is like the LIKE statement, but for regular expressions.

Destroying information

The power of databases is not only do they allow you to store/retrieve information quickly, but they allow you to destroy information just as quickly. Every database programmer should be versed in the art of information destruction.

 -- remove email addresses if description has email address 
 UPDATE notes SET description = regexp_replace(description, 
        '---', 'g') 
        WHERE description 
        ~ E'[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]{2,4}'; 
 -- remove website urls if description has website urls 
 -- matches things like http://www.something.com or http://something.com

 UPDATE notes SET description = regexp_replace(description, 
        E'--', 'g') 
        WHERE description 
        ~ E'http://[[[:alnum:]]+.]*[[:alnum:]]+[.][[:alnum:]]+'; 

-- remove phone numbers if description 
-- has phone numbers e.g. (123) 456-7890 or 456-7890 or 123.456.7890
UPDATE notes SET description = regexp_replace(description, 
        '---', 'g') 
        WHERE description 
        ~ E'[\(]{0,1}[0-9]{3}[\).-]{0,1}[[:space:]]*[0-9]{3}[.-]{0,1}[0-9]{4}'; 

-- set anything to single space that is not not a \ ( ) & * /,;. > < space or alpha numeric        
UPDATE notes set description = regexp_replace(description, E'[^\(\)\&\/,;\*\:.\>\<[:space:]a-zA-Z0-9-]', ' ') 
  WHERE description ~ E'[^\(\)\&\/,;\*\:.\<\>[:space:]a-zA-Z0-9-]';
-- replace high byte characters with space 
-- this is useful if you have your database in utf8 and you often need to use latin1 encoding
-- and you have a table that shouldn't have high byte characters 
-- such as junk you get from scraping websites
-- high byte characters don't convert down to latin1
UPDATE notes SET description = regexp_replace(description,E'[^\x01-\x7E]', ' ', 'g')
WHERE description ~ E'[^\x01-\x7E]';

Getting list of matches

These examples use similar to our destroy but show us in a table, a list of stuff that match. Here we use our favorite PostgreSQL regexp_matches function.

Source: postgresonline