Sometimes we need to find email addresses in a large text file, quickly, from a terminal, without chaining several tools together.

Linux offers grep, a powerful tool that makes it easy to find lines matching a regular expression.

Say, for example, that we need to find the emails in a text file named “emails.txt”. The command is as follows:

grep -P '^[\w\._]{5,30}(\+[\w]{0,10})?@[\w\.\-]{3,}?\.\w{2,5}$' emails.txt

Explanation:

  1. ^: start of a line.
  2. [\w\._]{5,30}: matches the part before the “@”, 5 to 30 characters long (allows letters, digits, underscores, and dots; the explicit underscore is redundant because \w already includes it).
  3. (\+[\w]{0,10})?: optional part; it matches a plus sign (+) followed by 0 to 10 word characters (the parentheses group it and the question mark makes the group optional).
  4. @: the literal “@” character that separates the two halves of an email address.
  5. [\w\.\-]{3,}?: matches the domain name in the email (allows word characters, dots, and hyphens; at least 3 characters). The trailing question mark makes the quantifier non-greedy, so it tries the shortest possible domain name first. Because the whole pattern is anchored with ^ and $, this does not change which lines match.
  6. \.\w{2,5}: matches a literal dot followed by the top-level domain (TLD), such as “.com”, “.org”, or “.edu” (allows 2 to 5 word characters).
  7. $: end of a line.
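
To see the pieces above working together, here is a minimal sketch that runs the same pattern against a few made-up lines. The sample addresses and the temporary file are assumptions for illustration only, and the -P option needs a GNU grep built with PCRE support.

```shell
# Create a temporary sample file (hypothetical addresses, not real data).
sample=$(mktemp)
printf '%s\n' \
  'john.doe@example.com' \
  'alice_smith+news@mail.example.org' \
  'bob@example.com' \
  'not an email' \
  'carol.jones@example' > "$sample"

# Same pattern as in the post, kept in a variable for readability.
pattern='^[\w\._]{5,30}(\+[\w]{0,10})?@[\w\.\-]{3,}?\.\w{2,5}$'

# Keep only the lines that match the whole pattern.
matches=$(grep -P "$pattern" "$sample")
printf '%s\n' "$matches"

# Clean up the temporary file.
rm -f "$sample"
```

With this sample, only the first two lines should be printed: “bob@example.com” is rejected because the part before the “@” is shorter than 5 characters, and “carol.jones@example” is rejected because it has no dot followed by a TLD.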

Cheers!

By davs
