Regular expression (Regex) guide for Usermaven

Regular expressions (regex) are a powerful tool within Usermaven analytics that enable you to filter and extract valuable insights from URLs in various reports like conversion goals, funnels, journeys, and much more. This guide will walk you through the basics of regex and provide practical examples tailored for Usermaven's RE2 syntax, ensuring you can efficiently leverage regex for URL filtering.

Let's embark on this journey to harness the full potential of regex in enhancing your analytics capabilities.

What is a URL?

Before diving into regex patterns, let's first briefly understand what a URL is and what it entails. A URL (Uniform Resource Locator) is an address that specifies the location of a resource on the internet. It typically consists of several parts:

  • Protocol: The communication method used (e.g., http or https).

  • Domain: The website name (e.g., www.example.com).

  • Path: The specific location of the resource within the website (e.g., /products/product-123).

  • Query: Additional information passed to the server (e.g., ?color=blue).
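
To make the four parts concrete, here is a minimal sketch that splits a URL with Python's standard library. The URL is simply assembled from the example values in the list above and is illustrative only.

```python
from urllib.parse import urlsplit

# Example URL built from the parts listed above (illustrative only).
url = "https://www.example.com/products/product-123?color=blue"

parts = urlsplit(url)
print(parts.scheme)  # protocol
print(parts.netloc)  # domain
print(parts.path)    # path
print(parts.query)   # query, without the leading "?"
```

Knowing which part of the URL a filter should target makes it easier to decide where a pattern needs anchors and which characters need escaping.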

Understanding regular expressions (Regex)

A Regular Expression (Regex) is a sequence of characters that defines a search pattern. It can be used to match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regex is extremely useful in extracting information from any text by searching for one or more matches of a specific search pattern. In the context of Usermaven, you can use regex to filter page URLs based on specific patterns. This capability allows you to create more targeted reports and analytics.

Basic components of regex

Regex uses a combination of symbols and characters to define patterns for searching text. Here are some common elements:

  • Literals: These are the exact text that should match. For example, Usermaven.

  • Metacharacters: Special characters with specific meanings, such as:
    ~ . : Matches any single character (except newline).
    ~ * : Matches zero or more repetitions of the preceding character.
    ~ + : Matches one or more repetitions of the preceding character.
    ~ ^ : Matches the beginning of the string.
    ~ $ : Matches the end of the string.

  • Character classes: Represent a set of characters, such as:
    ~ [a-z] : Matches any lowercase letter.
    ~ [0-9] : Matches any digit.
    ~ \w : Matches any word character (letters, numbers, and underscore).

  • Escape character: Certain characters have special meanings in regex, so you need to escape them with a backslash (\) to treat them literally. For example, \. matches a literal period, not "any character."
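
The elements above can be tried out in a short sketch. It uses Python's built-in re module as a stand-in, which is an assumption for illustration: Python's engine is not RE2, but the constructs shown here behave the same way in both. The helper name found and the sample strings are made up for this example.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# Each row: (pattern, text, expected result)
examples = [
    (r"Usermaven", "Welcome to Usermaven", True),  # literal text
    (r"a.c", "abc", True),                    # "." matches any single character
    (r"a\.c", "abc", False),                  # "\." matches only a literal period
    (r"a\.c", "a.c", True),
    (r"^ab*$", "a", True),                    # "*" allows zero repetitions
    (r"^ab+$", "a", False),                   # "+" requires at least one
    (r"^/blog", "/en/blog/post-1", False),    # "^" anchors to the start
    (r"\.html$", "/docs/index.html", True),   # "$" anchors to the end
    (r"^[a-z]+$", "pricing", True),           # lowercase letters only
    (r"^[0-9]+$", "2024", True),              # digits only
    (r"^\w+$", "user_42", True),              # letters, digits, underscore
]

for pattern, text, expected in examples:
    # The last column shows whether the result agrees with the expectation.
    print(pattern, repr(text), found(pattern, text) == expected)
```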

RE2 syntax: Key features

Usermaven employs the RE2 syntax for regex, which prioritizes safety and speed, ensuring that your filtering operations are efficient and secure. Here are some key aspects:

  • No backreferences: RE2 does not support backreferences (such as \1), so a pattern cannot refer back to text it captured earlier in the same match. This restriction is part of what keeps matching time predictable.

  • No lookaround assertions: RE2 focuses on fast and predictable execution times, which means lookaheads such as (?=...) and (?!...) and lookbehinds such as (?<=...) and (?<!...) are not supported.
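
Because patterns copied from other tools often rely on these features, it can help to scan a pattern before pasting it into a filter. The sketch below is a rough textual heuristic, not an RE2 parser and not part of Usermaven; the function name is hypothetical, and the check can misfire on unusual input such as an escaped backslash followed by a digit.

```python
import re

# Constructs that RE2 rejects: lookaheads, lookbehinds, and backreferences
# (numbered like \1, or named like (?P=name)). Textual check only.
UNSUPPORTED = re.compile(r"\(\?=|\(\?!|\(\?<=|\(\?<!|\\[1-9]|\(\?P=")

def uses_unsupported_re2_syntax(pattern: str) -> bool:
    """Return True if the pattern appears to use lookaround or backreferences."""
    return UNSUPPORTED.search(pattern) is not None

print(uses_unsupported_re2_syntax(r"^(?!.*/login/).*"))         # negative lookahead
print(uses_unsupported_re2_syntax(r"(\w+)-\1"))                 # backreference
print(uses_unsupported_re2_syntax(r"^/products/product-\d+$"))  # plain pattern
```

A pattern that passes this check is not guaranteed to be valid RE2; testing it in the report itself remains the final check.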

Filtering URLs with regex in Usermaven

Building a regex for URL filtering involves two steps:

  • Define your filtering criteria
    ~ What specific URLs do you want to include or exclude?
    ~ Are you looking for URLs containing specific words, following a specific format, or belonging to a particular domain?
  • Construct your regex pattern
    ~ Use the basic building blocks mentioned above to create a pattern that matches your criteria. For instance, use "^" and "$" anchors to match the entire URL and escape special characters with "\" to treat them as literals.
    ~ Online regex testers like https://regex101.com/ can help you visualize and test your patterns.
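
The effect of anchors and escaping is easiest to see side by side. The following sketch again uses Python's re module as a stand-in for RE2 (an assumption for illustration); the URLs, including the deliberately odd lookalike string, are made up.

```python
import re

unescaped = r"^https://www.example.com/pricing$"
escaped = r"^https://www\.example\.com/pricing$"

real = "https://www.example.com/pricing"
lookalike = "https://wwwXexampleYcom/pricing"  # made-up string, not a real host

# Unescaped dots match any character, so the lookalike slips through.
print(bool(re.search(unescaped, lookalike)))
# Escaped dots match only literal periods.
print(bool(re.search(escaped, lookalike)))
print(bool(re.search(escaped, real)))

# Without "$", the pattern also matches longer URLs that merely start the same way.
unanchored = r"^https://www\.example\.com/pricing"
print(bool(re.search(unanchored, real + "/enterprise")))
print(bool(re.search(escaped, real + "/enterprise")))

# re.escape can add the backslashes to a literal piece of the pattern.
built = "^https://" + re.escape("www.example.com") + "/pricing$"
print(bool(re.search(built, real)))
```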

Examples and use cases

Below are practical examples and use cases for using Regex within Usermaven for URL filtering:

The examples cover three kinds of task: matching specific URLs, matching URLs with specific characteristics, and advanced matching and excluding.

  • Matching URLs containing a word:
    Regex: .*product.*
    Explanation: Matches any URL containing the word "product" anywhere in the URL.
    Use Case: Analyzing user behavior on product-related pages.

  • Matching URLs with a specific domain:
    Regex: ^https?://(www\.)?example\.com/.*$
    Explanation: Matches any URL that starts with "http://" or "https://" (the "s" is optional), followed by an optional "www.", then "example.com/" and any path.
    Use Case: Filtering data specifically for your website ("example.com") and excluding external links.

  • Matching URLs with a specific format:
    Regex: ^https?://(www\.)?example\.com/products/product-\d+$
    Explanation: Matches any URL that starts with "http://" or "https://" (the "s" is optional), followed by an optional "www.", then "example.com", "/products/", "product-", and one or more digits representing the product ID.
    Use Case: Analyzing user behavior specifically on product detail pages.

  • Excluding specific URLs:
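    Regex: ^(?!.*/login/).*
    Explanation: Intended to match any URL that does not contain "/login/". Note that "(?!...)" is a negative lookahead, which RE2 does not support, so an RE2-based filter is likely to reject this pattern. An alternative that stays within RE2 syntax is to match login pages with a pattern such as "/login/" and exclude the matching URLs, if the report provides an exclusion option.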
    Use Case: Excluding login pages from user behavior analysis.
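
The four use cases can be checked against a handful of sample URLs. This is a minimal sketch under several assumptions: Python's re module stands in for RE2, the patterns are written with the dots escaped and the ".*" wildcards spelled out (a reading of the intended patterns), the sample URLs are invented, and the login exclusion is done by matching "/login/" and dropping the matches rather than with a lookahead.

```python
import re

CONTAINS_PRODUCT = r".*product.*"
ON_DOMAIN = r"^https?://(www\.)?example\.com/.*$"
PRODUCT_DETAIL = r"^https?://(www\.)?example\.com/products/product-\d+$"
LOGIN = r"/login/"  # used for exclusion by negating the match

urls = [
    "https://www.example.com/products/product-123",
    "https://www.example.com/products/product-abc",
    "http://example.com/pricing",
    "https://www.example.com/login/",
    "https://other.test/product",
]

def matching(pattern: str) -> list[str]:
    """Return the sample URLs that the pattern matches."""
    return [u for u in urls if re.search(pattern, u)]

def excluding(pattern: str) -> list[str]:
    """Return the sample URLs that the pattern does not match."""
    return [u for u in urls if not re.search(pattern, u)]

print(matching(CONTAINS_PRODUCT))
print(matching(ON_DOMAIN))
print(matching(PRODUCT_DETAIL))
print(excluding(LOGIN))
```

Matching the unwanted pages and negating the result keeps the pattern itself simple and avoids syntax that RE2 rejects.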

Tips for using regex in Usermaven

When working on Regex patterns, remember to start with simple patterns and gradually add complexity as needed. Complex patterns can be harder to debug and understand. Moreover, before applying your regex patterns to filter URLs in reports, test them to ensure they match the expected URLs.
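
One way to follow this advice is to keep two short lists, URLs that should match and URLs that should not, and rerun them each time the pattern grows. The sketch below assumes Python's re module as a stand-in for RE2; the helper name check and the blog paths are hypothetical.

```python
import re

def check(pattern: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return the URLs for which the pattern gives the wrong answer."""
    compiled = re.compile(pattern)
    wrong = [u for u in should_match if not compiled.search(u)]
    wrong += [u for u in should_not_match if compiled.search(u)]
    return wrong

should_match = ["/blog/2024/launch", "/blog/2023/recap"]
should_not_match = ["/blog/", "/blog/drafts/launch", "/en/blog/2024/launch"]

# Step 1: a simple pattern is too broad and flags all three unwanted paths.
print(check(r"/blog/", should_match, should_not_match))

# Step 2: add a start anchor and a four-digit year segment.
print(check(r"^/blog/[0-9]{4}/", should_match, should_not_match))
```

An empty list from the second call means the refined pattern agrees with every sample; any URL still listed points to the next adjustment.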

Conclusion

Regex is a versatile tool for URL filtering in Usermaven, empowering you to extract valuable insights from your analytics data. By mastering the basics of regex and crafting tailored patterns, you can efficiently filter URLs to meet your specific requirements.