PerlMonks  

Gimmie the Stuff Between Two Keywords

by mumbles (Friar)
on Oct 15, 2001 at 09:21 UTC [id://118814]

mumbles has asked for the wisdom of the Perl Monks concerning the following question:

I have been at this snippet for a while and now I think I am more confused than before I began.

Given a Win32 system variable, pull out all the junk between "XYZ:" and "@mail.cp" and stuff it into another Win32 system variable.

"How do I extract all text between two keywords like start and end?" gives a good read on the problem, but I just can't get it to do what I want. Here is the most legible piece I could come up with:
$text = "xcystartsdhjsXYZ:joeblow@mail.cpEwrendrwerwep";
if ($text =~ /\bXYZ:\b(.*?)\b@mail.cp\b/) {
    $result = $1;
    # do something with results
    print $result, $1;
}
This prints nothing.
Suggestions?
Thanks in advance
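A minimal sketch of the whole task, assuming that "Win32 system variable" means an environment variable read and written through Perl's %ENV hash. The variable names MAIL_ADDR and MAIL_USER are hypothetical placeholders, and the sample string from the question is used as a fallback (in single quotes, so @mail is not interpolated).

```perl
use strict;
use warnings;

# Hypothetical variable names: substitute the real ones.
my $src_name = 'MAIL_ADDR';
my $dst_name = 'MAIL_USER';

# Fall back to the sample string from the question. Single quotes
# keep Perl from interpolating @mail as an array.
my $text = defined $ENV{$src_name}
    ? $ENV{$src_name}
    : 'xcystartsdhjsXYZ:joeblow@mail.cpEwrendrwerwep';

# No \b anchors; escape the @ and the literal dot.
if ($text =~ /XYZ:(.*?)\@mail\.cp/) {
    # Setting %ENV affects this process and programs it launches,
    # not the system-wide variable seen by other processes.
    $ENV{$dst_name} = $1;
    print "$dst_name=$ENV{$dst_name}\n";
}
```

Making the new value permanent for the whole system is outside what %ENV does; the sketch only covers the extraction and the in-process assignment.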

Re: Gimmie the Stuff Between Two Keywords
by merlyn (Sage) on Oct 15, 2001 at 09:30 UTC
    $text = "xcystartsdhjsXYZ:joeblow@mail.cpEwrendrwerwep";
    if ($text =~ /\bXYZ:\b(.*?)\b@mail.cp\b/) {
        $result = $1;
        # do something with results
        print $result, $1;
    }
    That prints nothing because there's no /\bXYZ:\b/ in that string! Give it a start/end point that actually occurs, and you'll see you're on the right track.
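merlyn's point can be checked directly. This sketch uses the sample string in single quotes (so @mail survives, which differs from the question's double-quoted version) and compares the original pattern with one that drops the two \b's that can never match; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'xcystartsdhjsXYZ:joeblow@mail.cpEwrendrwerwep';

# Original pattern: the \b before X and the \b after "cp" can never
# match here, because s/X and p/E are both word-character pairs.
my $with_outer_b = $text =~ /\bXYZ:\b(.*?)\b\@mail\.cp\b/ ? 1 : 0;

# Drop the two impossible \b's; the inner ones still hold for "joeblow".
my ($user) = $text =~ /XYZ:\b(.*?)\b\@mail\.cp/;

print "with outer \\b: ", ($with_outer_b ? "match" : "no match"), "\n";
print "without: $user\n";
```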

    -- Randal L. Schwartz, Perl hacker

Re: Gimmie the Stuff Between Two Keywords
by snowcrash (Friar) on Oct 15, 2001 at 09:41 UTC
    Hi!

    I've really no idea what a Win32 system variable looks like, so I'm missing the context and can't really help you unless you post a formal spec of how it can look.
    Anyway, one of the problems in your regex: the word boundary \b will never match between the 's' and 'X' at the start of your regex. Neither will the one at the end of the regex. The two other \b's will match as long as (.*?) starts and ends with a word character.
    A \b matches between a \w and a \W, i.e. between a word character (a-z, A-Z, 0-9, _ and maybe other characters considered alphanumeric by your locale, if you use one) and a non-word character (all characters that are not in \w).
    Since I don't know what kind of characters should be extracted with (.*?), getting rid of the wrong \b's is all I can tell you right now.
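To see where \b actually matches, this sketch collects every offset at which the zero-width \b succeeds in a fragment of the sample string. Offset 5, between 's' and 'X', is absent, which is exactly the problem described above.

```perl
use strict;
use warnings;

my $s = 'sdhjsXYZ:joeblow';

# \b is zero-width; with //g in a while loop, pos() reports
# the offset of each boundary in turn.
my @bounds;
while ($s =~ /\b/g) {
    push @bounds, pos($s);
}

print "@bounds\n";    # 0 8 9 16
```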

    cheers
    snowcrash
Re: Gimmie the Stuff Between Two Keywords
by pike (Monk) on Oct 15, 2001 at 14:23 UTC
    I think you want:

    $text =~ /XYZ:(.*?)\@mail\.cp/

    with a backslash before the @ so Perl doesn't interpolate @mail as an array, and one before the dot so it matches a literal "." rather than any character.
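The same pattern generalizes to any pair of keywords. This sketch quotes both keywords with \Q...\E (quotemeta), so characters such as "." and "@" in them are matched literally; the helper name between is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between two literal keywords,
# or undef if they don't occur in that order.
sub between {
    my ($text, $start, $end) = @_;
    # \Q...\E escapes regex metacharacters in the keywords;
    # /s lets (.*?) cross newlines as well.
    my ($found) = $text =~ /\Q$start\E(.*?)\Q$end\E/s;
    return $found;
}

my $text = 'xcystartsdhjsXYZ:joeblow@mail.cpEwrendrwerwep';
my $user = between($text, 'XYZ:', '@mail.cp');
print defined $user ? "$user\n" : "not found\n";
```

Because the keywords arrive as variable values, the @ in '@mail.cp' is never interpolated; \Q only adds backslashes.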

    pike

Re: Gimmie the Stuff Between Two Keywords
by Mosley (Novice) on Oct 15, 2001 at 11:50 UTC
    I use Win32 also and screwed with this for almost an hour before just trying to print the $text variable out. The output was everything minus "@mail": inside a double-quoted string Perl interpolates @mail as an (empty) array, so the regex is never going to find @mail.cp. I don't know how you are putting the information into the text variable; I assume you are using LWP::Simple to fetch email addresses from the web. I uploaded a page with some text similar to yours and used this script to retrieve it. Try it out; I will leave the page up for a while. Hope this helps you.
    Mosley
    use strict;
    use warnings;
    use LWP::Simple;

    print "Content-type: text/html\n\n";

    my $URL  = "http://www.weedlinks.f2s.com/joeblow.html";
    my $page = get($URL);
    die "Couldn't fetch $URL\n" unless defined $page;

    # Collapse whitespace, then split the page into words.
    $page =~ s/\s+/ /g;
    my @text = split(/\s/, $page);

    foreach my $line (@text) {
        if ($line =~ /\@/) {
            # Print the part before the @.
            my @line = split(/\@/, $line);
            print "$line[0]<br>";
            # If you just want to collect whole email addresses,
            # un-comment the next line.
            #print "$line<br>";
            # Or save them to a file. Un-comment the next 3 lines.
            #open(my $db, '>>', 'joeblow.txt') or die "Can't open joeblow.txt: $!";
            #print $db "$line\n";
            #close($db);
        }
    }
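Mosley's observation comes from double-quote interpolation. This sketch contrasts three ways of writing the sample address; the our @mail line exists only so the demo compiles under strict (without strict, @mail in the question is simply an empty package array).

```perl
use strict;
use warnings;

# Declared (and left empty) only so this demo compiles under strict.
our @mail;

my $interpolated = "XYZ:joeblow@mail.cp";     # @mail expands to ""
my $escaped      = "XYZ:joeblow\@mail.cp";    # backslash keeps the @
my $single       = 'XYZ:joeblow@mail.cp';     # no interpolation at all

print "$interpolated\n";   # XYZ:joeblow.cp
print "$escaped\n";        # XYZ:joeblow@mail.cp
```

Either the backslash or single quotes gives the regex the "@mail" it is looking for.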

Node Type: perlquestion [id://118814]
Approved by root