perl -F',' -lane '@F = map { s/^.*<(.*?)>.*/$1/r } @F;
print join(", ", @F)' input.txt
[email protected], [email protected], [email protected]
[email protected]
fifth.example.com, sixth.example.com
This auto-splits each input line on commas and stores the fields in the array @F. It then uses map() to run a regex substitution over each element of @F, removing everything outside the angle brackets < and >.
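For readers less used to Perl's command-line switches, here is a rough sketch of what the one-liner does, written out as a script. The helper name extract_fields and the sample lines are made up for illustration. Note that an element with no angle brackets passes through unchanged, because s///r returns the original string when the pattern doesn't match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the one-liner's logic to a single line.
sub extract_fields {
    my ($line) = @_;
    my @F = split /,/, $line;               # what -a with -F',' does
    @F = map { s/^.*<(.*?)>.*/$1/r } @F;    # keep only the text inside < >
    return join(", ", @F);
}

# Made-up sample input, not taken from the question.
print extract_fields($_), "\n" for (
    'Alice <alice@example.com>, Bob <bob@example.org>',
    'Carol <carol.example.net>',
    'no brackets here',
);

# Prints:
#   alice@example.com, bob@example.org
#   carol.example.net
#   no brackets here
```

The /r modifier needs Perl 5.14 or newer. The -n and -l switches (the read loop, the chomp and the trailing newline on print) are replaced here by the explicit loop and the "\n".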
The one-liner simply assumes that everything inside angle brackets is a valid email address. Much of the time this is a reasonable assumption, because that's how email addresses are supposed to be delimited.
That assumption doesn't always hold for real-world data, though, so a better version would use Regexp::Common::Email::Address to validate each address before accepting and printing it.
Here's a better version, written as a standalone script rather than a one-liner because it's easier to read and understand that way:
#!/usr/bin/perl
use strict;
use feature 'say';
use Regexp::Common qw(Email::Address);

while (<<>>) {
    next if m/^\s*$/;   # ignore empty lines
    chomp;              # strip trailing \n
    my @addr = ();      # reset @addr array for each input line

    foreach (split /\s*,\s*/) {    # split on commas with optional whitespace
        next unless m/\@/;         # ignore elements without an @
        s/^.*<(.*?\@.*?)>.*/$1/;   # strip everything not inside < >
        # Add to @addr array if it's a valid address
        push @addr, $_ if m/$RE{Email}{Address}/;
    }

    # print the addresses found on the current line, if any.
    say join(", ", @addr) if @addr;
}
Save that as, e.g., extract-addresses.pl and make it executable with chmod +x extract-addresses.pl. Then run it as ./extract-addresses.pl filename, or pipe your data into it: it reads the files named on the command line, or standard input if none are given.
For example, with an input file (input.txt) that contains this:
First Person <[email protected]>, Second Person <[email protected]>, Third <[email protected]>
Fourth Person Long Name <[email protected]>
Fifth <fifth.example.com>, Sixth Person <sixth.example.com>
bad<user> <[email protected]>, another<<<bad<<<user>>> xyz
<"<ohdear>first"@example.com>, Last, First <[email protected]>
It produces this output:
$ ./extract-addresses.pl input.txt
[email protected], [email protected], [email protected]
[email protected]
[email protected]
[email protected]
Technically, another<<<bad<<<user>>> xyz is a valid email address even though it has neither an @ nor an FQDN (Fully Qualified Domain Name, i.e. a hostname or domain name). It could be fine as an alias or in a virtual domain. It is probably not a valid user account name, although even that is possible, depending on the kind of server. Anyway, this script is written to ignore addresses without an @ symbol in them.
And, as @StephenKitt points out, <"<ohdear>first"@example.com> is also a valid email address. This script does not handle pathological cases like that - as a judgemental programmer, I have arbitrarily decided that whoever owns that address does not deserve to receive email :-) and more importantly, as the great copout says, handling addresses like that is "left as an exercise for the reader". It probably wouldn't be too hard, but I haven't even finished my first coffee of the day yet.
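For anyone who does want to attempt that exercise, here is one possible starting point. It is only a sketch, not a full address parser. Instead of splitting on commas, it scans each line for <local@domain> where the local part may be a double-quoted string. The helper name bracketed_addresses and the sample addresses are made up, and it does no validation beyond requiring an @.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A double-quoted string, with backslash escapes allowed inside it.
my $quoted = qr/"(?:[^"\\]|\\.)*"/;

# <local@domain>: local is either a quoted string or a run of characters
# that excludes whitespace, double quotes, @ and angle brackets.
my $bracketed = qr/<((?:$quoted|[^<>"\s\@]+)\@[^<>\s\@]+)>/;

# Hypothetical helper: return every bracketed address found on a line.
sub bracketed_addresses {
    my ($line) = @_;
    my @found;
    push @found, $1 while $line =~ /$bracketed/g;
    return @found;
}

# Made-up sample lines modelled on the awkward cases in the input.
for my $line (
    '<"<ohdear>first"@example.com>, Last, First <last@example.com>',
    'bad<user> <user@example.com>, another<<<bad<<<user>>> xyz',
) {
    print join(", ", bracketed_addresses($line)), "\n";
}

# Prints:
#   "<ohdear>first"@example.com, last@example.com
#   user@example.com
```

Because it no longer splits on commas, a display name like Last, First isn't cut in half. On the other hand, it would also pick up a bracketed address that appears inside a quoted display name, so the results still want checking against a real validator.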
fifth.example.com and sixth.example.com are not email addresses, they're domain names. And it would be wrong to just change the first . to an @ because that would break on things like <last.first.example.com> ([email protected]). Changing the second-last . to @ would also be wrong because it would break on domains like fifth.example.com.au ([email protected]). These would be valid addresses, but probably not addresses belonging to the users in the input. Anyway, the point is that any such assumption you make is likely to be wrong in at least some cases.
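To make that concrete, here is a small sketch of both guesses applied to the same names. The two function names are made up for illustration. Neither rule is something the script above does; the point is only to show that each guess gives a plausible result for some names and a doubtful one for others.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess 1 (hypothetical): turn the first dot into an @.
sub first_dot_to_at {
    my ($name) = @_;
    return $name =~ s/\./\@/r;
}

# Guess 2 (hypothetical): turn the second-last dot into an @, i.e. the
# dot that has exactly one more dot somewhere after it.
sub second_last_dot_to_at {
    my ($name) = @_;
    return $name =~ s/\.(?=[^.]*\.[^.]*\z)/\@/r;
}

for my $name (qw(fifth.example.com last.first.example.com fifth.example.com.au)) {
    print "$name\n";
    print "  first dot:       ", first_dot_to_at($name), "\n";
    print "  second-last dot: ", second_last_dot_to_at($name), "\n";
}

# first_dot_to_at('last.first.example.com')      gives last@first.example.com
# second_last_dot_to_at('fifth.example.com.au')  gives fifth.example@com.au
```

Both guesses agree on fifth.example.com, but they disagree on the other two names, and nothing in the input says which result (if either) is the intended address.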