Sometimes we need to find email addresses in a large file, easily, from a terminal, without having to use multiple tools.
Linux offers grep, a powerful tool that makes it easy to find text matching a regular expression.
Say, for example, that we need to find the email addresses in a text file named “emails.txt”. The command is as follows:
grep -P '^[\w\._]{5,30}(\+[\w]{0,10})?@[\w\.\-]{3,}?\.\w{2,5}$' emails.txt
Explanation:
^ : start of a line.
[\w\._]{5,30} : matches the part before the “@”, 5 to 30 characters long (allows alphanumeric characters, underscores, and dots).
(\+[\w]{0,10})? : optional part; it matches a plus sign (+) followed by 0 to 10 word characters (the parentheses and the question mark make it optional).
@ : the literal “@” character found in email addresses.
[\w\.\-]{3,}? : matches the domain name in the email (allows word characters, dots, and hyphens; the domain name must be at least 3 characters long). The question mark makes it a non-greedy match, meaning the engine tries the shortest possible domain name first; because the pattern is anchored with $, this does not change which lines match.
\.\w{2,5} : matches the top-level domain (TLD) such as “.com”, “.org”, or “.edu” (a literal dot followed by 2 to 5 word characters).
$ : end of a line.
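To see the pieces above working together, here is a minimal sketch that runs the pattern against a few sample lines. The addresses and the temporary file are made up for illustration, and the script assumes GNU grep built with PCRE support (the -P option).

```shell
#!/bin/sh
# Sketch only: the sample addresses below are invented for illustration.
# Assumes GNU grep with PCRE support (-P).

pattern='^[\w\._]{5,30}(\+[\w]{0,10})?@[\w\.\-]{3,}?\.\w{2,5}$'

# Write a few test lines to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
john.doe@example.com
jane_smith+news@mail.example.org
bob@example.com
not an email
alice.w@example
EOF

# Print only the lines that consist entirely of an address the pattern accepts.
matches=$(grep -P "$pattern" "$sample")
printf '%s\n' "$matches"

# Count the matching lines instead of printing them (-c).
count=$(grep -cP "$pattern" "$sample")
printf 'matching lines: %s\n' "$count"

rm -f "$sample"
```

Only the first two lines are printed. “bob@example.com” is rejected because the part before the “@” is shorter than 5 characters, and “alice.w@example” is rejected because it has no top-level domain. Note that the ^ and $ anchors mean a line must contain the address and nothing else.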
Cheers!