
Parsing with RegEx into Array

by mr_p (Scribe)
on Jun 25, 2010 at 17:58 UTC (#846571=perlquestion)
mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Hello Everyone,

I would like to parse all occurrences into an array. Is it possible to do this? Below is the code I have.

#!/usr/bin/perl
my $str = "<item><i>headline 1</i></item> < item><i>headline2</i></item>";
my @x =~ /<(.+?)>.+<\/\1>/ for $str;

Is there a way to get every 'item' into an array?


Replies are listed 'Best First'.
Re: Parsing with RegEx into Array
by ikegami (Pope) on Jun 25, 2010 at 18:03 UTC
    =~ says what to match against:
    $str =~ m{<item><i>(.*?)</i></item>}

    You want to match multiple times.

    $str =~ m{<item><i>(.*?)</i></item>}g

    If that is in list context, it will return a list of what was captured.

    my @x = $str =~ m{<item><i>(.*?)</i></item>}g;

    Same thing, but more flexible when you have multiple captures:

    my @x;
    while ($str =~ m{<item><i>(.*?)</i></item>}g) {
        push @x, $1;
    }
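To illustrate why the loop form is more flexible with multiple captures, here is a minimal sketch. The sample string with `id` attributes is a made-up example, not data from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data (not from the original post): two captures per match.
my $str = '<item id="1"><i>headline 1</i></item><item id="2"><i>headline2</i></item>';

# In list context with /g, all captures from all matches come back as one flat list.
my @flat = $str =~ m{<item id="(\d+)"><i>(.*?)</i></item>}g;

# The while loop handles one match at a time, so related captures stay together.
my @pairs;
while ($str =~ m{<item id="(\d+)"><i>(.*?)</i></item>}g) {
    push @pairs, [ $1, $2 ];
}

print scalar(@flat), " flat values, ", scalar(@pairs), " pairs\n";
```

With the flat list you would have to regroup the values yourself; the loop keeps each id next to its headline.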
      This is the function that I have. It is not working for some reason. Do you know why?
      sub getItemsFromFile {
          local $/=undef;
          open IN_FILE, "< /tmp/.rss_download_file";
          my $file_in = <IN_FILE>;
          close (IN_FILE);
          #$file_in="<item>headline1</item><item>headline2</item>";
          my @allItems=();
          #while ($file_in =~ m{<\s*item\s*>(.*?)</\s*item\s*>}g)
          while ($file_in =~ m{<item>(.*?)</item>}g) {
              push (@allItems, $1);
              print "$1\n";;
          }
          return @allItems;
      }

      $file_in prints perfectly, so the data is there. This is a UTF-8 character-based file and it has foreign characters. If I uncomment the '#$file_in' line, it works. Do you know why? The file is 33k bytes. Is it because I have '\n' in the file?

        Maybe you're expecting "." to match a newline. Add the "s" flag if so.
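A small sketch of what the "s" flag changes, assuming an item whose content spans a line break. The input string here is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: the first item's content contains a newline.
my $file_in = "<item>headline\n1</item><item>headline2</item>";

# Without /s, "." does not match "\n", so the first item cannot match.
my @without_s = $file_in =~ m{<item>(.*?)</item>}g;

# With /s, "." matches any character, including "\n".
my @with_s = $file_in =~ m{<item>(.*?)</item>}gs;

print scalar(@without_s), " vs ", scalar(@with_s), "\n";   # prints "1 vs 2"
```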

        This is a UTF-8 character-based file and it has foreign characters.

        Then you should tell Perl that (meaning you should decode the input) if you want to treat the strings as text (which you seem to).
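One common way to decode the input is an `:encoding(UTF-8)` layer on `open`. The sketch below assumes that approach; the temporary file and its contents are stand-ins for the real download file, which is not shown in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 sample file (hypothetical stand-in for the RSS download).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} "<item>caf\x{e9}</item>\n<item>na\x{ef}ve</item>\n";
close $out or die "close failed: $!";

# Decode on input so the strings hold characters rather than raw bytes.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $file_in = do { local $/; <$in> };   # slurp the whole file
close $in;

my @all_items = $file_in =~ m{<item>(.*?)</item>}gs;

# length() counts characters once the data is decoded: 4 for the first item.
my $first_len = length $all_items[0];
```

If the items are printed afterwards, the output handle needs an encoding layer as well, for example `binmode STDOUT, ':encoding(UTF-8)';`.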

      That worked perfectly. But what if <item> is written as < item> (with a space in it)? Is there a way to handle that?

        Are you really asking how to optionally match a space with a regex?

        $str =~ m{<\s*item\s*><i>(.*?)</i></\s*item\s*>}
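Combined with the "g" flag and list context from the first reply, that pattern collects both headlines from the string in the original question. A minimal sketch:

```perl
use strict;
use warnings;

# The string from the original question, including the "< item>" variant.
my $str = "<item><i>headline 1</i></item> < item><i>headline2</i></item>";

# \s* allows optional whitespace inside the tag; /g collects every match.
my @x = $str =~ m{<\s*item\s*><i>(.*?)</i></\s*item\s*>}g;

print join(", ", @x), "\n";   # prints "headline 1, headline2"
```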

        They say that time changes things, but you actually have to change them yourself.

        —Andy Warhol

Re: Parsing with RegEx into Array
by toolic (Bishop) on Jun 25, 2010 at 18:16 UTC
    I understand you asked for a regex, but this kinda looks like XML, so here's an XML::Twig parser solution:
    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    my $xfile = <<EOF;
    <foo>
      <item><i>headline 1</i></item>
      <item><i>headline2</i></item>
    </foo>
    EOF

    my $t = XML::Twig->new();
    $t->parse($xfile);
    my @x;
    for my $item ($t->root()->children('item')) {
        push @x, $item->first_child('i')->text();
    }
    print Dumper(\@x);

    __END__

    $VAR1 = [
              'headline 1',
              'headline2'
            ];
      XML::Twig gave me problems when loading it. Here is the error:
      Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/vendor_perl/5.8.8/XML/ line 731

Node Type: perlquestion [id://846571]
Approved by ikegami