JS REGEX 10: Extracting Specific Data with RegExp and exec() in JavaScript
Have you ever needed to pull just the important bits out of a wall of text or code? Imagine you’re scanning through lines of HTML like this:
<a href="http://goalkicker.com">goalkicker</a>
Now, you don’t want the whole tag — just the useful parts:
- The link address → http://goalkicker.com
- The visible text → goalkicker
That’s where Regular Expressions (RegExp) and the exec() method become your best tools.
They let you write patterns that hunt down specific text formats, and then capture the exact pieces you care about.
Let’s explore how it works.
🧩 1. What is a Regular Expression?
A Regular Expression (RegExp) is a pattern that describes a set of possible strings. You use it to search, extract, or validate text according to certain rules.
Think of it as writing a small “search formula” — one that says:
“Find text that looks like this.”
You can use RegExp for tasks like:
- Checking if an email address is valid.
- Extracting URLs or phone numbers.
- Splitting logs into data fields.
- Cleaning text input before saving it.
🪄 2. Capturing Groups — Your Data Nets 🎣
Parentheses ( ) in regex aren’t just for grouping; they capture parts of what you matched.
That means:
Whatever falls inside those parentheses gets saved separately.
It’s like casting a net in a river — you might be searching for fish, but you only keep the ones that fall inside your net.
Example:
(John) (Doe)
If you apply that pattern to the string "John Doe", you get:
- match[0] → "John Doe" (the entire match)
- match[1] → "John"
- match[2] → "Doe"
So parentheses turn your regex into a data extractor, not just a matcher.
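To see those indexes for yourself, here is a minimal sketch that runs the pattern above as a regex literal. The variable names and the second sample string ("Ada Lovelace") are illustrative additions, and the \w+ variant is just one way to generalize the literal words.

```javascript
// The pattern from the example: two capturing groups, (John) and (Doe)
const nameRegex = /(John) (Doe)/;
const nameMatch = nameRegex.exec("John Doe");

console.log(nameMatch[0]); // John Doe  (the entire match)
console.log(nameMatch[1]); // John      (first group)
console.log(nameMatch[2]); // Doe       (second group)

// Swapping the literal words for \w+ (one or more word characters)
// lets the same two groups pull out any "first last" pair
const anyName = /(\w+) (\w+)/.exec("Ada Lovelace");
console.log(anyName[1]); // Ada
console.log(anyName[2]); // Lovelace
```

The group numbers follow the order of the opening parentheses from left to right, which is why the first name always lands in index 1.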
🧠 3. Introducing exec(): The Smart Match Finder
JavaScript gives you a special method for working with a regex (it is defined on RegExp.prototype, so every regex object has it):
regex.exec(string)
It’s like a powerful detective. When you call exec(), it searches your text for a single match and returns a detailed match report.
That report (an array) contains:
- match[0]: the full match
- match[1], match[2], ...: the captured groups (from parentheses)
- some extra properties, such as index (where the match starts) and input (the string that was searched)
If no match is found, it simply returns null.
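A quick way to inspect the match report is to log its parts. The pattern and the sample string "Order 42-7 shipped" below are made up for illustration; index and input are standard properties of the array that exec() returns.

```javascript
const report = /(\d+)-(\d+)/.exec("Order 42-7 shipped");

console.log(Array.isArray(report)); // true (the report is a real array)
console.log(report[0]);    // 42-7  (full match)
console.log(report[1]);    // 42    (first capture)
console.log(report[2]);    // 7     (second capture)
console.log(report.index); // 6     (position where the match starts)
console.log(report.input); // Order 42-7 shipped  (the string that was searched)

// No match: exec() gives back null instead of an array
const nothing = /(\d+)-(\d+)/.exec("no numbers here");
console.log(nothing); // null
```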
🧪 4. Example — Extracting an Anchor Tag
Let’s look at a real-world scenario:
You have this HTML code:
<a href="http://goalkicker.com">goalkicker</a>
You want:
- The URL inside the href attribute
- The text between the <a> and </a> tags
Here’s how you do it:
const html = '<a href="http://goalkicker.com">goalkicker</a>';
const regex = /<a href="(https?:\/\/[^"]+)">([^<]+)<\/a>/;
const m = regex.exec(html);
console.log(m[0]); // Full match: <a href="http://goalkicker.com">goalkicker</a>
console.log(m[1]); // First capture: http://goalkicker.com
console.log(m[2]); // Second capture: goalkicker
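🧩 Explanation:
- https? → matches http or https (the ? makes the s optional)
- :\/\/ → matches the literal :// (each / is escaped as \/ because an unescaped / would end the regex literal)
- [^"]+ → matches one or more characters that are not a double quote ("), so the URL stops at the closing quote
- ([^<]+) → captures the visible text up to the next < (the start of </a>)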
The parentheses ( ) around both patterns make them capturing groups, so exec() neatly returns the URL and the text as separate results.
🔁 5. Finding All Matches with g (Global Flag)
By default, exec() finds just the first match.
But what if your HTML has multiple links?
You can add the global flag g to your regex. Then, each time you call exec(), it picks up where it left off and gives you the next match.
const html = `
<a href="http://goalkicker.com">goalkicker</a>
<a href="http://example.com">Example</a>
`;
const regex = /<a href="(https?:\/\/[^"]+)">([^<]+)<\/a>/g;
let match;
while ((match = regex.exec(html)) !== null) {
  console.log("URL:", match[1]);
  console.log("Text:", match[2]);
}
This loop keeps calling exec() until there are no more matches (null).
Result:
URL: http://goalkicker.com
Text: goalkicker
URL: http://example.com
Text: Example
🧠 Why does this work?
Because when the regex has the g flag, exec() records where the last match ended in the regex object’s lastIndex property.
Each new exec() call starts scanning from that position.
When no further match is found, exec() returns null and resets lastIndex to 0.
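Here is a minimal sketch that traces lastIndex through a global search. The digit pattern and the sample string "a1 b22 c333" are illustrative choices, not taken from the HTML example; the trace array exists only so each step can be inspected afterwards.

```javascript
const digits = /\d+/g; // the g flag makes exec() read and update lastIndex
const text = "a1 b22 c333";
const trace = [];

console.log(digits.lastIndex); // 0 (the search starts at the beginning)

let step;
while ((step = digits.exec(text)) !== null) {
  // After each successful match, lastIndex points just past the matched text
  trace.push([step[0], step.index, digits.lastIndex]);
  console.log(`"${step[0]}" at ${step.index}, next search starts at ${digits.lastIndex}`);
}

// The last call returned null, and lastIndex was reset to 0
console.log(digits.lastIndex); // 0
```

Because lastIndex lives on the regex object, reusing the same global regex on a different string continues from wherever the previous search stopped unless it had already run to null.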
⚡ 6. Common Mistakes (and Fixes)
- ❌ Forgetting the quotes in your regex
/<a href=(https?:\/\/[^"]+)>([^<]*)<\/a>/
This pattern expects the h of http right after =, but the HTML has a double quote there, so exec() returns null.
🔧 Fix: Always include the quotes:
/<a href="(https?:\/\/[^"]+)">([^<]*)<\/a>/
- ❌ Forgetting the g flag
Without g, exec() always returns the first match, no matter how many exist. A while loop like the one in section 5 never ends, because lastIndex never moves forward.
- ❌ Using greedy .+ without limits
Prefer the lazy quantifier +? when the content you match could repeat, e.g. (.+?) instead of (.+), so the match stops at the nearest closing delimiter instead of the farthest.
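The three mistakes are easy to reproduce side by side. This is a small sketch: the goalkicker link comes from the article, while the "1 2 3" and <b>…</b> strings are made-up inputs chosen to make each failure visible.

```javascript
const html = '<a href="http://goalkicker.com">goalkicker</a>';

// Mistake 1: no quotes. After "href=" the pattern needs "h",
// but the HTML has a double quote there, so nothing matches.
const noQuotes = /<a href=(https?:\/\/[^"]+)>([^<]*)<\/a>/;
console.log(noQuotes.exec(html)); // null

// Fix: the quotes sit in the pattern, outside the capturing group
const withQuotes = /<a href="(https?:\/\/[^"]+)">([^<]*)<\/a>/;
console.log(withQuotes.exec(html)[1]); // http://goalkicker.com

// Mistake 2: without g, exec() restarts at index 0 on every call
const noGlobal = /\d+/;
const numbers = "1 2 3";
const first = noGlobal.exec(numbers)[0];
const second = noGlobal.exec(numbers)[0];
console.log(first, second); // 1 1

// Mistake 3: greedy .+ runs to the LAST </b>; lazy .+? stops at the first
const bold = "<b>one</b> and <b>two</b>";
const greedy = /<b>(.+)<\/b>/.exec(bold)[1];
const lazy = /<b>(.+?)<\/b>/.exec(bold)[1];
console.log(greedy); // one</b> and <b>two
console.log(lazy);   // one
```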
🧩 7. When exec() Returns null
When no text in your string matches the pattern, exec() returns null.
That’s JavaScript’s way of saying: “I found nothing that fits your rule.”
For example:
/<img src="(.+?)">/.exec("<a href='link'>link</a>")
// → null (because there’s no <img> tag)
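Because exec() can return null, reading m[1] directly throws a TypeError when nothing matched. Below is a minimal sketch of two ways to guard against that; extractImageSrc is a hypothetical helper written for this example, and the optional chaining form (?.) assumes an environment that supports ES2020.

```javascript
// Hypothetical helper: returns the src value, or null when there is no <img> tag
function extractImageSrc(markup) {
  const m = /<img src="(.+?)">/.exec(markup);
  if (m === null) {
    return null; // nothing matched, so there are no captures to read
  }
  return m[1];
}

console.log(extractImageSrc('<img src="logo.png">'));    // logo.png
console.log(extractImageSrc("<a href='link'>link</a>")); // null

// Optional chaining is a shorter guard: it yields undefined instead of throwing
const missing = /<img src="(.+?)">/.exec("<a href='link'>link</a>")?.[1];
console.log(missing); // undefined
```

The explicit if check is easier to extend (for example, to log why nothing matched), while ?. keeps one-off lookups short.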
🧠 8. Summary Table
Concept | Explanation |
---|---|
regex.exec() | Searches the string for a match and returns detailed info |
Capturing groups ( ) | Parts of the match saved separately |
match[0] | Full matched text |
match[1], match[2] | Values captured by parentheses |
g flag | Lets exec() find successive matches in a loop |
null | Returned when no match is found |
🧪 Practice Tasks
Let’s solidify what you’ve learned. Try solving these:
- Extract the href and link text from: <a href="https://site.com">Site</a>
Using exec(), what does m[1] contain?
- Write a regex to capture the src and alt values from: <img src="logo.png" alt="Company Logo">
- Fix this broken regex so it captures both parts correctly:
const regex = /<a href=(https?:\/\/[^"]+)>([^<]*)<\/a>/;
- Use a loop with exec() (and the g flag) to extract all email addresses from: "Contact us at info@company.com or support@help.org"
- Explain why exec() sometimes returns null.
🧩 Review — Fill-Gap Questions
- Regular expressions are used to __ patterns within strings.
- Parentheses ( ) in regex are known as __ groups.
- The method used to execute a regex and return match details is __.
- The full matched text is always stored in match[____].
- The captured groups start from index number __.
- The global flag g allows you to find __ matches, not just one.
- The regex property lastIndex helps track the search __.
- The result of exec() is an __ containing match info.
- If no match is found, exec() returns the value __.
- Using (.+?) instead of (.+) makes the match __ (non-greedy).