Examples, Tools and Resources for Regular Expressions

Regular expressions are not easy to use. It often feels like learning yet another language on top of the ones we already know. But the power and flexibility that RegEx provides make it worth learning. Take a look at some useful patterns, tools, and resources!

Matching all the words

Matching words in a given context can be complicated. Since many languages use accented or other special characters, we often can’t rely on a built-in function for that. Fortunately, RegEx lets us build our own character class. Let’s see an example that works for Hungarian:

/[a-zá-ű\d]\S*/i
The i flag makes the match case-insensitive.
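
To see how this pattern behaves on real text, here is a minimal sketch using Python's re module. Python is our assumption here, since the article only shows the JavaScript-style literal; the helper name find_words and the sample sentences are made up for illustration. Note that \S* keeps going until the next whitespace, so punctuation attached to a word is captured too.

```python
import re

# Same pattern as above; the /i flag becomes re.IGNORECASE in Python.
WORD_PATTERN = re.compile(r"[a-zá-ű\d]\S*", re.IGNORECASE)


def find_words(text):
    """Return every run that starts with a-z, á-ű or a digit and ends at the next whitespace."""
    return WORD_PATTERN.findall(text)


print(find_words("Árvíztűrő tükörfúrógép 42"))
# ['Árvíztűrő', 'tükörfúrógép', '42']

print(find_words("Szia, világ!"))
# ['Szia,', 'világ!']  (trailing punctuation stays attached to the word)
```

If the punctuation is unwanted, one option is to strip it after matching instead of making the pattern more complex.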

Matching all the content between the specified tags

Often we need to capture the content between HTML tags. For example, when we are parsing some text, whether it is user input or an HTML source from somewhere, we need the flexibility to extract what we need easily. Here is an example with <p> tags:

/<p>(.*)<\/p>/i
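
One thing to watch with this pattern is that .* is greedy and that the dot does not match line breaks by default. The sketch below, written with Python's re module (our assumption; the sample strings are invented), compares the pattern as written with a lazy .*? variant and with the "dot matches newline" flag (re.DOTALL in Python).

```python
import re

html = "<p>First</p><p>Second</p>"

# Greedy: .* runs to the last </p>, so both paragraphs end up in one match.
greedy = re.findall(r"<p>(.*)</p>", html, re.IGNORECASE)

# Lazy: .*? stops at the first </p> it can reach, giving one match per paragraph.
lazy = re.findall(r"<p>(.*?)</p>", html, re.IGNORECASE)

print(greedy)  # ['First</p><p>Second']
print(lazy)    # ['First', 'Second']

multiline = "<p>line one\nline two</p>"

# By default the dot does not match a line break, so nothing is found.
single_line = re.findall(r"<p>(.*?)</p>", multiline, re.IGNORECASE)

# With re.DOTALL the dot also matches "\n".
dot_all = re.findall(r"<p>(.*?)</p>", multiline, re.IGNORECASE | re.DOTALL)

print(single_line)  # []
print(dot_all)      # ['line one\nline two']
```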

Also, we can make it dynamic. For example, say we want to get all the headers from <h2> to <h4>. Of course, we need to make sure we get the content between a matching opening and closing tag, and not between an <h2> and an <h3> tag.

/<h([2-4])>(.+?)<\/h\1>/i

So, the first capturing group matches a single digit from 2 to 4. That means only <h2>, <h3> and <h4> are relevant. Next, .+? requires at least one character of content between the tags. Finally, there is the \1 backreference. It refers to whatever the first capturing group matched, so the closing tag must have the same level as the opening tag. This way our pattern can be dynamic.
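
Here is a small sketch of the backreference in action, again with Python's re module as an assumption. The name HEADER_PATTERN and the sample markup are hypothetical.

```python
import re

# Group 1 captures the header level; \1 demands the same digit in the closing tag.
HEADER_PATTERN = re.compile(r"<h([2-4])>(.+?)</h\1>", re.IGNORECASE)

html = "<h1>Title</h1><h2>Intro</h2><h3>Details</h3><h5>Note</h5>"

# With two capturing groups, findall returns (level, content) tuples.
headers = HEADER_PATTERN.findall(html)
print(headers)  # [('2', 'Intro'), ('3', 'Details')]

# Mismatched tags are rejected because </h3> does not repeat the captured "2".
print(HEADER_PATTERN.search("<h2>Broken</h3>"))  # None
```

The <h1> and <h5> elements are skipped because their level falls outside the [2-4] class.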

Tools and Resources

Of course, these are only basic patterns. We just wanted to show how powerful a single expression can be.

If you are interested in learning more about RegEx, we suggest the following resources:

Also, you can test your expressions easily with this tool: https://regexr.com/. A big advantage is that you can check the results for both the JavaScript and PCRE flavors.

Special thanks for the following resource(s): Icon made by Vitaly Gorbachev from www.flaticon.com

3 thoughts on “Examples, Tools and Resources for Regular Expressions”

  1. Unfortunately, the example on html paragraph matching comes with a pitfall.

    Consider the case of several <p> elements in a row. Regular expression engines match greedily by default, i.e. quantifiers (e.g. ‘*’) consume as much content as possible while still allowing a match. Thus the proposed regular expression will match everything between the first opening <p> and the last closing </p>. Non-greedy matching must be requested in the regular expression by appending a ‘?’ to the quantifier, as shown in the <h…> example.

    Compare https://regex101.com/r/7vQOin/2 (greedy, won’t work as expected) vs. https://regex101.com/r/4z0pKb/1 (non-greedy, matches single <p> element only)

    Moreover, without the proper flag (/s, which lets ‘.’ match line breaks; /m only changes how ^ and $ behave), matching will be limited to single lines. Paragraph contents are often formatted with line breaks, and HTML tags are often placed on lines of their own (this is a minor issue, as the HTML fragment can be preprocessed to eliminate all line breaks).

    Regular expressions are a powerful tool. However, always be aware that they are a tool for _textual pattern matching_, even when you apply them to representations of _hierarchically structured data_ such as HTML, especially when that data comes with a rich syntax.

    To dwell on the <p> example, some paragraph elements may come with attributes defined (‘<p class=”whatever”>’), so simple tag matching might fail here. Though this particular example can easily be rectified (‘<p(?: |>)(.*?)</p>’), regular expressions tend to become convoluted quickly. In many cases, using a full-fledged parser is more expedient.

    1. Hey! Thank you for your long explanation!

      Yes, I found some issues myself as well. But I could not break the problem down into such fine detail as you did.

      I really appreciate your comment. I hope I will have time soon to update the article based on your comment, if you don’t mind!

      Thanks again!
