Re: grep question using multiple lines
by borisz (Canon) on Dec 28, 2008 at 00:34 UTC
|
Use a proper email parser.
My example parses all email addresses out of the text and then prints only those that start with constant=
use Email::Address;
my @add = Email::Address->parse(<<'__TXT__');
f834bkg94halUF9deju hHFDUO()NFRS432 DSFadsfg94hHFDUO()N
hfedls74d8oHFx constant=barney@gmail.com alUF9dejuH()NF
UO()NFRS432 DSFadsf4halUF9deju fedls74d8oH sfg94hHFDUOf
f834bkg94halUF9deju hHFDUO()NFRS432 DSFadsfg94hHFDUO()N
hfedls74d8oHFx constant=wilma@aol.com alUF9dejuH()NFui0
UO()NFRS432 DSFadsf4halUF9deju fedls74d8oH sfg94hHFDUOf
__TXT__
for my $add (@add) {
    local $_ = $add->address;
    next unless s/^constant=//;
    print $_, $/;
}
output:
barney@gmail.com
wilma@aol.com
|
Thanks, but like I so carefully pointed out, I'm not just looking for email addresses. I'm aware of all the parsing modules out there. This is just an exercise in grepping and looking for an academic answer.
—Brad "The important work of moving the world forward does not wait to be done by perfect men." George Eliot
|
$Email::Address::addr_spec
This regular expression defines what an email address is allowed to look like.
(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>
+[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism
+:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))
+*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x0
+0-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(
+?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:
+\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\
+s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xis
+m:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\(
+(?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)
+)*\s*\)\s*)))*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xi
+sm:[^\x0A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()
+\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-
+xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*
+\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?
+-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\
+s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s
+*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(
+?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*
+(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism
+:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0
+D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:
+\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-x
+ism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A
+\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\
+\])|(?-xism:\\(?-xism:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s
+*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xis
+m:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x
+0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)))
Re: grep question using multiple lines
by backstab (Novice) on Dec 28, 2008 at 02:01 UTC
|
my $txt = <<'EOF';
f834bkg94halUF9deju hHFDUO()NFRS432 DSFadsfg94hHFDUO()N
hfedls74d8oHFx constant=barney@gmail.com alUF9dejuH()NF
UO()NFRS432 DSFadsf4halUF9deju fedls74d8oH sfg94hHFDUOf
f834bkg94halUF9deju hHFDUO()NFRS432 DSFadsfg94hHFDUO()N
hfedls74d8oHFx constant=wilma@aol.com alUF9dejuH()NFui0
UO()NFRS432 DSFadsf4halUF9deju fedls74d8oH sfg94hHFDUOf
EOF
while ($txt =~ /constant=(\w+@\w+\.\w+)/g) {
    print "$1\n";
}
This prints what you want for me. To understand what it
does, note the /g flag: used in the condition of a while
loop, it makes each match resume where the previous one
ended instead of restarting from the beginning of the string.
Re: grep question using multiple lines
by eye (Chaplain) on Dec 28, 2008 at 03:05 UTC
|
This seems to work, though it assumes that there is no more than one address per line:
^.*constant=(\w+@\w+\.\w+).*|^.*\r
replaced by "\1". You can use "Replace All" and get the result you want. If you are willing to accept a multi-step solution, you can make this more robust and easily eliminate the assumption of no more than one address per line.
In my usage, I'd be inclined to use a regex in "Process Lines Containing..." to eliminate lines without an email address. I'd then extract the email addresses from the remaining lines with a regex. I believe all of this could be automated in a BBEdit Text Factory.
|
My solution with the while loop also works with several emails
on the same line. Matching this way treats the text
as a whole and ignores newlines entirely.
With the /g flag in a while condition, each match
starts where the previous one stopped, and the loop
ends when no further match succeeds.
The special array @- holds match start offsets:
$-[0] is where the whole match starts and $-[1] is where
the first capture group starts (the end offsets are in @+).
It might help to see what the loop does:
while ($txt =~ /constant=(\w+@\w+\.\w+)/g) {
    print "==> match starts at $-[0]!!!\n";
    print "$1\n";
}
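To see the offsets side by side, here is a small sketch that records @-, @+ and pos() for every match. The sample string and the @found array are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample text: two matches on the first line, one on the second.
my $txt = "x constant=a\@b.com y constant=c\@d.org\nz constant=e\@f.net\n";

my @found;
while ($txt =~ /constant=(\w+\@\w+\.\w+)/g) {
    # $-[0] is the start of the whole match,
    # $-[1] and $+[1] are the start and end of capture group 1,
    # pos($txt) is where the next /g match will begin searching.
    push @found, [ $1, $-[0], $-[1], $+[1], pos($txt) ];
    printf "%-8s match at %d, capture %d..%d, next search from %d\n",
        $1, $-[0], $-[1], $+[1], pos($txt);
}
```

Since the pattern ends with the capture group, pos($txt) and $+[1] coincide here; they would differ if the pattern consumed more text after the group.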
|
bradcathey (the OP) wrote:
I'm trying to isolate some code in BBEdit using the grep functionality offered (Perl friendly).
By my reading, the OP wants to know how to use the PCRE capability of BBEdit (or TextWrangler) to accomplish this task. While BBEdit has a mechanism for invoking scripts, I do not think that was what the OP was asking about. There are many merits to your answer, but it is not something that can be implemented directly in BBEdit.
Re: grep question using multiple lines
by n3toy (Hermit) on Dec 28, 2008 at 03:30 UTC
|
You might be able to do it in one line.
I am not sure exactly what the criteria for finding the data are. The example shows an email address, but you say the search text could be anything. Assuming, per the example, that you are looking for the text following "constant=" up to the first space, and that there is only one instance per line, this worked for me:
perl -nle 'while(m/constant=(.*)\s/g){print "$1"}' /home/jamie/example.txt
I tend to oversimplify things, so it may not be what you are looking for. But it is one line and it returns the data you were looking for.
Jamie
|
perl -nle 'print $1 while /constant=(.*)\s/g' /home/jamie/txt
But note that the combination of -l and
\s, as opposed to a more explicit regexp,
does not behave well when there are several matches on the same
line!
Try it, for example, with a text file as follows:
xxxxxxxxx constant=foo@bar.com xxxxxxxxxxx
xxxxxx constant=baz@huux.org xxxxxxx contant=hello@world.bye xxxxxxxxx
xxxxxxxxxxxxxxxxxxx
It will print:
foo@bar.com
baz@huux.org xxxxxxx contant=hello@world.bye
I think the problem comes from (.*),
which is greedy and matches even spaces, as long as
at least one whitespace character remains to satisfy
\s. But when I try (.*?),
the /g flag does not seem to work?
|
$ perl -nle 'print $1 while /constant=([^ ]+)\s/g' < test.txt
barney@gmail.com
wilma@aol.com
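As a quick check of the negated-class idea against a line that holds two values, here is a minimal sketch. It swaps in \S+ for the character class and drops the trailing \s so that a value at the very end of a line is also caught; both choices, the sample lines, and the @values array are my assumptions, not part of the thread.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the second one carries two values.
my @lines = (
    'xxxxxxxxx constant=foo@bar.com xxxxxxxxxxx',
    'xxxxxx constant=baz@huux.org xxxxxxx constant=hello@world.bye',
    'xxxxxxxxxxxxxxxxxxx',
);

my @values;
for my $line (@lines) {
    # \S+ cannot cross whitespace, so each match stops at the end of
    # its own value and /g can pick up the next one on the same line.
    push @values, $1 while $line =~ /constant=(\S+)/g;
}
print "$_\n" for @values;
```

Because the capture can never swallow a space, greediness is harmless here and no backtracking past the value is possible.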