Sample pattern to find email addresses with Match regex

Hi, I need a working sample pattern to extract email addresses from a text.

Thanks, Armin

https://lmgtfy.com/?q=email+regex

Already tried that?

See also https://www.regular-expressions.info/email.html

Have a nice day,
Paul

But most regex examples from the web don't work in 4D. They have to be formatted a bit differently, but how?

Hello,

Doesn't RegexLab normally generate that for you already?

regexlab

If the example is a PCRE regex and you escape every backslash for 4D (\ becomes \\), it should work right away. With other flavors, e.g. JavaScript regexes, this only works in simpler cases.
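A minimal Python sketch of that escaping rule, for illustration only (Python, not 4D; the helper name `to_4d_literal` is hypothetical). It assumes that inside a 4D string literal the backslash starts an escape sequence, so every regex backslash is doubled and every double quote gets a backslash in front:

```python
def to_4d_literal(pattern: str) -> str:
    """Turn a regex copied from the web into a 4D string literal.

    Hypothetical helper. Assumption: in a 4D string literal a backslash
    starts an escape sequence, so each backslash is doubled and each
    double quote is preceded by a backslash.
    """
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


if __name__ == "__main__":
    # A pattern as found on the web (raw string: single backslashes) ...
    web_pattern = r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b"
    # ... and the literal you would paste into a 4D method.
    print(to_4d_literal(web_pattern))
```

For this web pattern it prints "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b", which is exactly the 4D literal used for $pattern_t further down in this thread.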
But you have to test it anyway.

And by the way, there is no regex that reliably covers every valid email address. The safest way to check an email address is to actually use it, but in certain circumstances that is not possible either. A first step could be to check whether there is a matching DNS entry (MX record) for the domain part of the address.

Try this:

A simple database for testing.

But you still need to learn how regular expressions work. I recommend the grep chapter in the BBEdit manual if you have BBEdit :slight_smile:

Paul


Nice! :slight_smile:

For learning, this is helpful too:

https://regexr.com/

It contains a reference, a cheat sheet, pattern explanations and a testing environment.


AJ_Tools_Regex offers different Regex functions (Match, Matches, Substitute, Split and Extract) that can help you.

The 4D v18 project is available in our GitHub repository.


Wow, Maurice, thank you!

Armin

Hi,

I use this:

C_TEXTE($vt_email)  // address to check; assumed to be filled in beforehand (e.g. from $1)
C_TEXTE($vt_emailLeftSide;$vt_smtpServer)
C_BOOLEEN($vb_emailOk)
$vt_emailLeftSide:=""
$vt_smtpServer:=""
$vb_emailOk:=Faux

Si (Longueur($vt_email)>0)
	
	C_ENTIER LONG($vl_start)
	TABLEAU ENTIER LONG($tl_pos;0)
	TABLEAU ENTIER LONG($tl_length;0)
	C_TEXTE($vt_regex)
	
	// Bruno LEGAY (BLE) (21/07/2016)
	// - "(?i)" : case-insensitive matching
	// - "^...$" : from start to end
	// - "([_a-z0-9%+-]+(?:\.[_a-z0-9%+-]+)*)" : local part (capturing group #1); "A-Z", "a-z", "0-9", "_%+-" and "." are allowed
	// - "([a-z0-9-]+(?:\.[a-z0-9-]+)*(?:\.[a-z]{2,}))" : domain name (capturing group #2); "A-Z", "a-z", "0-9", "-" and "." are allowed
	// note : "(?:...)" is a non-capturing group
	// in a 4D string literal every regex backslash is doubled: "\." is written "\\."
	$vt_regex:="(?i)^([_a-z0-9%+-]+(?:\\.[_a-z0-9%+-]+)*)@([a-z0-9-]+(?:\\.[a-z0-9-]+)*(?:\\.[a-z]{2,}))$"
	//$vt_regex:="^([_A-Za-z0-9-]+(?:\\.[_A-Za-z0-9-]+)*)@([A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*(?:\\.[A-Za-z]{2,}))"
	
	$vl_start:=1
	// Trouver regex = Match regex (French command names); the * means the match must begin at $vl_start
	$vb_emailOk:=Trouver regex($vt_regex;$vt_email;$vl_start;$tl_pos;$tl_length;*)
	Si ($vb_emailOk)
		// element 1 = group #1 (local part), element 2 = group #2 (domain)
		$vt_emailLeftSide:=Sous chaîne($vt_email;$tl_pos{1};$tl_length{1})
		$vt_smtpServer:=Sous chaîne($vt_email;$tl_pos{2};$tl_length{2})
	Fin de si
	
	TABLEAU ENTIER LONG($tl_pos;0)
	TABLEAU ENTIER LONG($tl_length;0)
	
Fin de si

It splits the address, and then I use “dig” to check whether the domain name is valid (by looking up its MX record in DNS). This way I can catch typos such as “me@gmale.com”, etc.
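To show what the two capturing groups return, here is a small Python sketch of the same split (Python's `re` module stands in for 4D's own regex engine, so edge cases may differ; `split_email` is a hypothetical name). The pattern is Bruno's, written as a raw string with single backslashes:

```python
import re

# Bruno Legay's pattern as a Python raw string (single backslashes).
# Group 1 = local part, group 2 = domain. Illustration only: 4D's
# Match regex uses its own engine, not Python's re module.
EMAIL_RE = re.compile(
    r"(?i)^([_a-z0-9%+-]+(?:\.[_a-z0-9%+-]+)*)"
    r"@([a-z0-9-]+(?:\.[a-z0-9-]+)*(?:\.[a-z]{2,}))$"
)


def split_email(address: str):
    """Return (local_part, domain), or None if the address does not match."""
    # fullmatch: the whole string must be the address (no trailing newline)
    m = EMAIL_RE.fullmatch(address)
    if m is None:
        return None
    return m.group(1), m.group(2)


if __name__ == "__main__":
    print(split_email("me@gmale.com"))
```

The domain part (group 2) is what you would then pass to a DNS/MX lookup such as `dig`.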

HTH

Hi,

This pattern works quite well for checking email addresses:

$vtPattern:="(^[\\w.-]+)@([\\w.-]+\\.[a-zA-Z]{2,6}$)"
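A quick Python check of this pattern, as a sketch only (`is_mail` is a hypothetical name, and Python's `re` is not 4D's engine). `re.ASCII` restricts `\w` to `[A-Za-z0-9_]`; without it Python's `\w` also matches accented letters, and whether 4D's `\w` behaves the same way is not assumed here. Note that `{2,6}` limits the top-level domain to 6 letters:

```python
import re

# The pattern from the post above, as a Python raw string.
MAIL_RE = re.compile(r"(^[\w.-]+)@([\w.-]+\.[a-zA-Z]{2,6}$)", re.ASCII)


def is_mail(address: str) -> bool:
    """True if the whole string looks like an email address."""
    return MAIL_RE.search(address) is not None


if __name__ == "__main__":
    for a in ("john.doe@example.com", "john@localhost", "x@y.brussels"):
        print(a, is_mail(a))
```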

This online regex tester is good too: https://regex101.com/

I use this one:

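C_BOOLEAN($0)
C_TEXT($1;$address_t;$pattern_t)
C_LONGINT($pos_l;$len_l)

$address_t:=$1
$pattern_t:="\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b"
// no * parameter: the address may appear anywhere in $address_t from position 1 on
$0:=Match regex($pattern_t;$address_t;1;$pos_l;$len_l)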
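To make the position and length outputs concrete, here is a rough Python analogue of that call (`match_regex` is a hypothetical helper, not a 4D or Python API). 4D positions are 1-based, so the sketch converts; the values returned when nothing is found are this sketch's own choice:

```python
import re

PATTERN = r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b"


def match_regex(pattern: str, text: str, start: int = 1):
    """Rough analogue of Match regex(pattern; text; start; pos; len) without *.

    Searches from the 1-based `start` and returns (found, pos, length)
    with a 1-based pos. (False, 0, 0) on failure is this sketch's choice.
    """
    m = re.compile(pattern).search(text, start - 1)
    if m is None:
        return False, 0, 0
    return True, m.start() + 1, m.end() - m.start()


if __name__ == "__main__":
    print(match_regex(PATTERN, "Mail me at info@ajar.ch today"))
```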

♂ = gmale.com
♀ = gfemale.com
they really think about everything…


Hi Maurice,

Email parsing works fine inside your RegexLab, but not in my 4D app.

What is wrong? I copied the pattern and the text from your RegexLab into a new method.

Thanks Armin

You passed an asterisk (*) as the last parameter of the command “Match regex”.
https://doc.4d.com/4Dv18/4D/18/Match-regex.301-4505683.en.html

With the asterisk, the match must begin exactly at the start position. For example, with $vl_start=4, your pattern only returns True for a text like “xx my@xy.tv xxxxxxx”, because the email address found must start at position 4. That is one purpose of the start position.

The other purpose is looping: call Match regex repeatedly to find one email address, then the next, and so on. For that, leave the asterisk out. Without *, a normal search is done and the match can begin at any position after the start position; in a loop, the next start position is the position of the last match plus its length.
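Both uses of the start position can be sketched in Python (illustration only: `match_at` and `find_all` are hypothetical names, and Python's `re` is not 4D's engine). `Pattern.match(text, pos)` is anchored at `pos`, similar to passing `*`; `Pattern.search(text, pos)` may find the match anywhere after `pos`, similar to leaving `*` out:

```python
import re
from typing import List

EMAIL = re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b")


def match_at(text: str, start: int) -> bool:
    """Like Match regex with *: the match must begin exactly at `start` (1-based)."""
    return EMAIL.match(text, start - 1) is not None


def find_all(text: str) -> List[str]:
    """Like looping Match regex without *: after each hit, restart at pos + length."""
    found = []
    start = 0  # 0-based in Python; the first 4D start position would be 1
    while True:
        m = EMAIL.search(text, start)
        if m is None:
            return found
        found.append(m.group())
        start = m.end()  # in 4D terms: position found + length found


if __name__ == "__main__":
    print(match_at("xx my@xy.tv xxxxxxx", 4))
    print(find_all("write to info@ajar.ch or maurice.inzirillo@ajar.ch"))
```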

Thanks, Lutz :grinning:
Your hint solved the problem.

Hi Armin,

AJ_Tools_RegexLab is a test lab that uses our regex.4dbase component.

The Match, Matches, Substitute, Split and Extract member functions are kept separate from the AJ_Tools_RegexLab test interface to keep the component lightweight. To use these functions, simply include the regex.4dbase component in your 4D app.

AJ_Tools_RegexLab includes a code generator to make your life easier :wink: Next to the import button there is a clipboard icon. When you click it, AJ_Tools_RegexLab copies the generated code to your clipboard, and you can paste it directly into your own method. For the example in question, here is the resulting code:

C_OBJECT($regex)
$regex:=New regex 
C_OBJECT($str;$result)
$str:=New object
$str.pattern:="\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b"
$str.string:="In this sentence there is an email address info@ajar.ch and another one maurice.inzirillo@ajar.ch which is mine.\\r\\rThis is another e-mail which is more sophisticated info.tech.global@4d.com.uk"
$str.group:=New collection
$str.group:=Split string("";",";sk ignore empty strings)
$result:=$regex.extract($str)


HTH

I think this pattern is out of date.
Does it validate the address arnaud@demontard.brussels or arnaud@demontard.paris?

What for? I live in Lyon.

At the end of the pattern you can adjust how many letters the top-level domain is allowed to have:
“…{2,4}\b”
“…{min,max}\b”
Special case:
{2,} means at least two letters, with no upper limit.
Check the documentation on how long a top-level domain can be and what the standards allow.

If you want to allow “brussels” :wink:,
then use this code, which allows up to 8 letters:

$pattern_t:="\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,8}\\b"
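A small Python sketch comparing the quantifier choices on the two addresses from the question (illustration only; `email_pattern` and `matches` are hypothetical helpers, and Python's `re` stands in for 4D's engine). With `{2,4}` neither “paris” (5 letters) nor “brussels” (8 letters) passes, because `\b` cannot sit in the middle of the word:

```python
import re
from typing import Optional


def email_pattern(max_tld: Optional[int] = 4) -> str:
    """Build the thread's pattern with TLD quantifier {2,max_tld}.

    max_tld=None gives {2,}: at least two letters, no upper limit.
    """
    quant = "{2,}" if max_tld is None else "{2,%d}" % max_tld
    return r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]" + quant + r"\b"


def matches(address: str, max_tld: Optional[int]) -> bool:
    return re.search(email_pattern(max_tld), address) is not None


if __name__ == "__main__":
    for addr in ("arnaud@demontard.paris", "arnaud@demontard.brussels"):
        print(addr, matches(addr, 4), matches(addr, 8), matches(addr, None))
```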