JS REGEX 10: Extracting Specific Data with RegExp and exec() in JavaScript

Regular expressions, or RegExps, are patterns used to match, locate, and extract parts of text. They are especially useful when working with strings that follow a particular structure, such as emails, URLs, or HTML tags.
1. What is a Regular Expression?
A Regular Expression (RegExp) is a symbolic pattern that defines a set of strings. It helps identify specific sequences within text. Typical uses include:
- Validating input (e.g., email addresses or phone numbers).
- Extracting structured data from unstructured text.
- Searching and replacing text efficiently.
- Splitting or cleaning large text inputs.
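The four uses listed above can each be sketched in one line. This is a minimal illustration only: the sample string and the simplified phone pattern are assumptions made for the example, not a robust phone-number validator.

```javascript
// Hypothetical sample text; the pattern \d{3}-\d{4} is a deliberately simple stand-in.
const sample = "Call 555-1234   or 555-9876";

// Validating: does the text contain a phone-like sequence?
const hasPhone = /\d{3}-\d{4}/.test(sample);

// Extracting: pull out the first phone-like substring.
const firstPhone = /\d{3}-\d{4}/.exec(sample)[0];

// Searching and replacing: mask every phone-like substring (note the g flag).
const masked = sample.replace(/\d{3}-\d{4}/g, "XXX-XXXX");

// Splitting: break the text on runs of whitespace.
const words = sample.split(/\s+/);

console.log(hasPhone);   // true
console.log(firstPhone); // 555-1234
console.log(masked);     // Call XXX-XXXX   or XXX-XXXX
console.log(words);      // [ 'Call', '555-1234', 'or', '555-9876' ]
```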
2. Capturing Groups
Parentheses ( ) in regex not only group expressions but also capture portions of the match.
This means the data matched inside ( ) is stored separately, allowing easy extraction.
Example:
(John) (Doe)
Applied to "John Doe", the results are:
- match[0]: "John Doe" (full match)
- match[1]: "John" (first captured group)
- match[2]: "Doe" (second captured group)
Capturing groups make regex useful for data extraction, not just text matching.
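To see the (John) (Doe) example run, here is a short sketch. The second pattern, with \w+ in each group, is an assumed generalization added for illustration; it is not part of the original example.

```javascript
// The literal pattern from the example above.
const nameRegex = /(John) (Doe)/;
const nameMatch = nameRegex.exec("John Doe");

console.log(nameMatch[0]); // John Doe  (full match)
console.log(nameMatch[1]); // John      (first captured group)
console.log(nameMatch[2]); // Doe       (second captured group)

// Assumed variant: capture any two words separated by a single space.
const anyNameRegex = /(\w+) (\w+)/;
// Skip index 0 (the full match) and unpack the two groups.
const [, firstName, lastName] = anyNameRegex.exec("Jane Smith");
console.log(firstName, lastName); // Jane Smith
```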
3. The exec() Method
In JavaScript, exec() is a method of a RegExp object. Calling regex.exec(string) searches the string for a match and returns detailed results as an array.
The returned array includes:
- match[0]: the full match.
- match[1], match[2], …: captured groups.
- Properties like index (the position of the match) and input (the original string).
If no match is found, exec() returns null.
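The sketch below inspects the array that exec() returns, including its index and input properties, and shows the null result for a pattern that does not match. The sample sentence and the second pattern are assumptions chosen for the demonstration.

```javascript
// Hypothetical input sentence.
const sentence = "My name is John Doe.";
const found = /(John) (Doe)/.exec(sentence);

console.log(Array.isArray(found)); // true
console.log(found.length);         // 3 (full match plus two groups)
console.log(found.index);          // 11 (where "John Doe" starts)
console.log(found.input);          // My name is John Doe.

// A pattern that does not occur in the sentence.
const missing = /(Jane) (Roe)/.exec(sentence);
console.log(missing);              // null
```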
4. Extracting Data Example
Suppose you want to extract both the URL and text from an HTML anchor tag:
<a href="http://goalkicker.com">goalkicker</a>
The JavaScript code:
const html = '<a href="http://goalkicker.com">goalkicker</a>';
const regex = /<a href="(https?:\/\/[^"]+)">([^<]+)<\/a>/;
const result = regex.exec(html);
console.log(result[0]); // <a href="http://goalkicker.com">goalkicker</a>
console.log(result[1]); // http://goalkicker.com
console.log(result[2]); // goalkicker
Explanation:
- (https?:\/\/[^"]+) captures the URL.
- ([^<]+) captures the visible link text.
- The parentheses ensure these values are stored separately in the match array.
5. Extracting Multiple Matches
Without the global flag g, exec() only returns the first match.
Adding g allows multiple matches to be processed sequentially in a loop.
const html = `
<a href="http://goalkicker.com">goalkicker</a>
<a href="http://example.com">Example</a>
`;
const regex = /<a href="(https?:\/\/[^"]+)">([^<]+)<\/a>/g;
let match;
while ((match = regex.exec(html)) !== null) {
  console.log("URL:", match[1]);
  console.log("Text:", match[2]);
}
Each loop iteration retrieves the next match because a regex with the g flag stores the position to resume from in its lastIndex property, and exec() starts searching there. When no further match is found, exec() returns null and resets lastIndex to 0.
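To make the role of lastIndex visible, the following sketch records it after every call. The input string and the letter-digit pattern are hypothetical and kept short so the positions are easy to check by hand.

```javascript
// Hypothetical input: three letter-digit pairs.
const text = "a1 b2 c3";
const pairRegex = /([a-z])(\d)/g;

const seen = [];
let m;
while ((m = pairRegex.exec(text)) !== null) {
  // After a successful match, lastIndex points just past the matched text.
  seen.push({ pair: m[0], index: m.index, lastIndex: pairRegex.lastIndex });
}

console.log(seen);
// [ { pair: 'a1', index: 0, lastIndex: 2 },
//   { pair: 'b2', index: 3, lastIndex: 5 },
//   { pair: 'c3', index: 6, lastIndex: 8 } ]

// Once exec() has returned null, lastIndex is back at 0.
console.log(pairRegex.lastIndex); // 0
```

Because the position lives on the regex object, the regex should be stored in a variable and reused across iterations rather than written as a fresh literal inside the loop condition.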
6. Common Mistakes and Fixes
- Missing quotes in patterns:
  Incorrect: /href=(https?:\/\/[^"]+)/
  Correct: /href="(https?:\/\/[^"]+)"/
- Omitting the g flag: Without g, only the first match is found. In a while loop like the one in section 5, exec() then returns that same first match on every call, so the loop never ends.
- Using greedy quantifiers: .+ matches as much as possible, often too much. Use the lazy version .+? to stop at the nearest match.
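The greedy versus lazy difference is easiest to see on a string with two links. In this sketch the input reuses the two anchor tags from section 5 on a single line; the patterns are simplified for the comparison and are not meant as a general HTML parser.

```javascript
const twoLinks =
  '<a href="http://goalkicker.com">goalkicker</a> <a href="http://example.com">Example</a>';

// Greedy: .+ runs to the LAST double quote in the string.
const greedy = /href="(.+)"/.exec(twoLinks);

// Lazy: .+? stops at the FIRST double quote after the URL starts.
const lazy = /href="(.+?)"/.exec(twoLinks);

console.log(greedy[1]);
// http://goalkicker.com">goalkicker</a> <a href="http://example.com
console.log(lazy[1]);
// http://goalkicker.com
```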
7. When exec() Returns null
If the target text does not contain any substring that matches the given regex, the return value is null.
Example:
/<img src="(.+?)">/.exec("<a href='link'>link</a>")
// Returns null
This happens because there is no <img> tag in the input string.
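Since reading result[1] from a null result throws a TypeError, it is usually worth checking the return value before indexing into it. A minimal sketch follows; extractImgSrc is a hypothetical helper name introduced only for this example.

```javascript
// Hypothetical helper: returns the src value, or null when nothing matches.
function extractImgSrc(markup) {
  const result = /<img src="(.+?)">/.exec(markup);
  // Guard against null before touching the captured group.
  return result === null ? null : result[1];
}

console.log(extractImgSrc("<a href='link'>link</a>")); // null
console.log(extractImgSrc('<img src="logo.png">'));    // logo.png
```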
8. Practice Tasks
- Extract the href and link text from <a href="https://site.com">Site</a>. What is stored in match[1]?
- Write a regex to capture the src and alt values from <img src="logo.png" alt="Company Logo">.
- Correct the following regex: const regex = /<a href=(https?:\/\/[^"]+)>([^<]*)<\/a>/;
- Use exec() in a loop to extract all email addresses from: "Contact us at info@company.com or support@help.org"
- Explain why exec() sometimes returns null.
9. Review — Fill-Gap Questions
- Regular expressions are used to __ patterns within text.
- Parentheses ( ) in regex are known as __ groups.
- The method used to execute a regex in JavaScript is __.
- The full matched text is always found in match[____].
- Captured groups begin from index number __.
- Adding the g flag allows you to find __ matches.
- The regex property that tracks progress between matches is __.
- The result returned by exec() is an __ containing match details.
- When no match is found, exec() returns __.
- Using (.+?) instead of (.+) makes the match __ (non-greedy).