PerlMonks  

Parsing with RegEx into Array

by mr_p (Scribe)
on Jun 25, 2010 at 17:58 UTC (#846571=perlquestion)
mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Hello Everyone,

I would like to parse all occurrences into an array. Is it possible to do this? Below is the code I have.

#!/usr/bin/perl
my $str = "<item><i>headline 1</i></item> < item><i>headline2</i></item>";
my @x =~ /<(.+?)>.+<\/\1>/ for $str;

Is there a way to get every 'item' into an array?

Thanks.

Re: Parsing with RegEx into Array
by ikegami (Pope) on Jun 25, 2010 at 18:03 UTC
    =~ says what to match against:
    $str =~ m{<item><i>(.*?)</i></item>}

    You want to match multiple times.

    $str =~ m{<item><i>(.*?)</i></item>}g

    If that is in list context, it will return a list of what was captured.

    my @x = $str =~ m{<item><i>(.*?)</i></item>}g;

    Same thing, but more flexible when you have multiple captures:

    my @x;
    while ($str =~ m{<item><i>(.*?)</i></item>}g) {
        push @x, $1;
    }
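To illustrate why the loop form is more flexible with multiple captures, here is a minimal sketch. The `id` attribute and the sample string are hypothetical and not part of the original question; they only serve to give each match two captures.

```perl
use strict;
use warnings;

# Hypothetical sample input with two things to capture per item.
my $str = '<item id="1"><i>headline 1</i></item>'
        . '<item id="2"><i>headline2</i></item>';

# List context with /g: every capture of every match lands in one flat list.
my @flat = $str =~ m{<item id="(\d+)"><i>(.*?)</i></item>}g;

# The while loop keeps the captures of each match together.
my @pairs;
while ($str =~ m{<item id="(\d+)"><i>(.*?)</i></item>}g) {
    push @pairs, { id => $1, headline => $2 };
}

print "$_->{id}: $_->{headline}\n" for @pairs;
```

With the flat list you would have to regroup the values yourself, whereas the loop gives you `$1` and `$2` per match.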
      That worked perfectly. But what if <item> is < item> (with a space in it)? Is there a way to handle it?

        Are you really asking how to optionally match a space with a regex?

        $str =~ m{<\s*item\s*><i>(.*?)</i></\s*item\s*>}
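As a quick check, the pattern above can be combined with the /g modifier and list context from earlier in the thread, using the string from the original question (which already contains "< item>" with a space). This is only a sketch of how the pieces might fit together.

```perl
use strict;
use warnings;

# The string from the original question; note the space in "< item>".
my $str = "<item><i>headline 1</i></item> < item><i>headline2</i></item>";

# \s* allows optional whitespace inside the opening and closing tags.
my @x = $str =~ m{<\s*item\s*><i>(.*?)</i></\s*item\s*>}g;

print "$_\n" for @x;
```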

        They say that time changes things, but you actually have to change them yourself.

        —Andy Warhol

      This is the function that I have. It is not working for some reason. Do you know why?
      sub getItemsFromFile {
          local $/=undef;
          open IN_FILE, "< /tmp/.rss_download_file";
          my $file_in = <IN_FILE>;
          close (IN_FILE);
          #$file_in="<item>headline1</item><item>headline2</item>";
          my @allItems=();
          #while ($file_in =~ m{<\s*item\s*>(.*?)</\s*item\s*>}g)
          while ($file_in =~ m{<item>(.*?)</item>}g)
          {
              push (@allItems, $1);
              print "$1\n";;
          }
          return @allItems;
      }

      $file_in prints perfectly, so the content is there. This is a UTF-8 encoded file and it contains foreign characters. If I uncomment the '#$file_in' line, it works. Do you know why? The file is 33k bytes. Is it because I have '\n' in the file?

        Maybe you're expecting "." to match a newline. Add the "s" flag if so.
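A small sketch of the difference, assuming an item whose content spans two lines (the sample text is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical input: the first item contains a newline.
my $text = "<item>headline\n1</item><item>headline2</item>";

# Without /s, "." does not match "\n", so the first item is skipped.
my @without_s = $text =~ m{<item>(.*?)</item>}g;

# With /s, "." matches any character, including "\n".
my @with_s = $text =~ m{<item>(.*?)</item>}gs;

printf "without /s: %d item(s)\n", scalar @without_s;
printf "with /s:    %d item(s)\n", scalar @with_s;
```

This would explain why the one-line test string works while the real file, which presumably has newlines inside its items, does not.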

        This is UTF-8 character based file and it has foreign characters.

        Then you should tell Perl that (meaning you should decode the input) if you want to treat the strings as text (which you seem to).
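One way to decode the input is an :encoding layer on open. The sketch below is not the poster's code: it writes a small temporary file with made-up content to stand in for the real download, and the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small UTF-8 file standing in for the real download (hypothetical data).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;    # write raw bytes
print {$tmp_fh} "<item>caf\xC3\xA9</item>\n<item>headline2</item>\n";
close $tmp_fh or die "Cannot close $path: $!";

# Decode while reading: the layer turns UTF-8 bytes into characters.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $file_in = do { local $/; <$in> };    # slurp the whole file
close $in;

my @all_items = $file_in =~ m{<\s*item\s*>(.*?)</\s*item\s*>}gs;

# Encode again on output so the terminal receives UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @all_items;
```

The lexical filehandle and three-argument open also make it easy to check whether the file was opened at all, which the bareword version in the question does not do.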

Re: Parsing with RegEx into Array
by toolic (Chancellor) on Jun 25, 2010 at 18:16 UTC
    I understand you asked for a regex, but this kinda looks like XML, so here's an XML::Twig parser solution:
    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    my $xfile = <<EOF;
    <foo>
    <item><i>headline 1</i></item>
    <item><i>headline2</i></item>
    </foo>
    EOF

    my $t = new XML::Twig();
    $t->parse($xfile);
    my @x;
    for my $item ($t->root()->children('item')) {
        push @x, $item->first_child('i')->text();
    }
    print Dumper(\@x);

    __END__

    $VAR1 = [
              'headline 1',
              'headline2'
            ];
      XML::Twig gave me problems when loading it. Here is the error:
      Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/vendor_perl/5.8.8/XML/Twig.pm line 731

Node Type: perlquestion [id://846571]
Approved by ikegami