Re^2: Parsing with RegEx into Array

by mr_p (Scribe)
on Jun 25, 2010 at 20:47 UTC (#846598)

in reply to Re: Parsing with RegEx into Array
in thread Parsing with RegEx into Array

This is the function that I have. It is not working for some reason. Do you know why?
    sub getItemsFromFile {
        local $/ = undef;
        open IN_FILE, "< /tmp/.rss_download_file";
        my $file_in = <IN_FILE>;
        close(IN_FILE);
        #$file_in = "<item>headline1</item><item>headline2</item>";
        my @allItems = ();
        #while ($file_in =~ m{<\s*item\s*>(.*?)</\s*item\s*>}g)
        while ($file_in =~ m{<item>(.*?)</item>}g) {
            push(@allItems, $1);
            print "$1\n";
        }
        return @allItems;
    }

$file_in prints perfectly, so the data is there. This is a UTF-8 character based file and it has foreign characters. If I uncomment the '#$file_in' line, it works. Do you know why? The file is 33k bytes. Is it because I have '\n' in the file?

Replies are listed 'Best First'.
Re^3: Parsing with RegEx into Array
by ikegami (Pope) on Jun 25, 2010 at 21:35 UTC

    Maybe you're expecting "." to match a newline. Add the "s" flag if so.

    "This is a UTF-8 character based file and it has foreign characters."

    Then you should tell Perl that (meaning you should decode the input) if you want to treat the strings as text (which you seem to).
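To illustrate what decoding changes, here is a minimal sketch. The sample string is hypothetical (it is not from the poster's file), and it uses the core Encode module rather than anything shown in the thread. Until the input is decoded, Perl sees one element per byte, so a two-byte UTF-8 character counts as two.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 bytes for "<item>cafe</item>" with an
# accented final "e", as they would arrive from a filehandle that has
# no encoding layer.
my $bytes = "<item>caf\xC3\xA9</item>";

# Decode the bytes into a string of characters.
my $text = decode('UTF-8', $bytes);

my ($raw_item)  = $bytes =~ m{<item>(.*?)</item>};
my ($text_item) = $text  =~ m{<item>(.*?)</item>};

printf "undecoded item length: %d\n", length $raw_item;   # counts bytes
printf "decoded item length:   %d\n", length $text_item;  # counts characters
```

The regex matches in both cases; what differs is whether the captured string is bytes or text, which matters once you print it through an encoding layer or measure it.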

      What does the 's' mean? Can you point me to any documentation on this, if you know of any?

      I am also unable to print the UTF-8 text. Below is my code.

          my $curLink1 = utf8::decode($curLink); # Use UNICODE semantics
          my $item1 = utf8::decode($item);
          my $fileName = "/tmp/out_file.html";
          use open OUT => ':utf8';
          open OUT_FILE, "> $fileName";
          print OUT_FILE "<item>$item1</item>";
          close OUT_FILE;
          #open (my $fh, '>:encoding (UTF-8)', $fileName);
          #print $fh "<item>$item1</item>";
          #close $fh;

      I tried the commented code too. I also tried to print $item, which is encoded. Neither $item nor $item1 gets printed to the file, but both print on STDOUT.

      Thanks for your help.

        For the match operator, "/s" causes "." to match any byte/character. Without it, "." matches any byte/character except 0x0A/newline. Operators are documented in perlop. There's probably more info in perlre.
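A small sketch of the difference, using a made-up input string (an assumption, not the poster's data) in which the second item spans two lines:

```perl
use strict;
use warnings;

# Hypothetical input: the second item contains a newline.
my $file_in = "<item>headline1</item>\n<item>head\nline2</item>\n";

# Without /s, "." stops at the newline, so only the first item matches.
my @without_s = $file_in =~ m{<item>(.*?)</item>}g;

# With /s, "." also matches the newline, so both items match.
my @with_s = $file_in =~ m{<item>(.*?)</item>}sg;

printf "without /s: %d item(s)\n", scalar @without_s;
printf "with /s:    %d item(s)\n", scalar @with_s;
```

If the items in the downloaded feed contain line breaks, this would explain why the one-line test string matches while the file contents do not.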

            open(my $fh_in, '<:encoding(UTF-8)', ...) or die ...;
            ...
            my @allItems = $file_in =~ m{<item>(.*?)</item>}sg;
            ...
            open(my $fh_out, '>:encoding(UTF-8)', ...) or die ...;
            print $fh_out ...;

        You never check whether opening the output file succeeded. See open. For your original query about regular expressions, see perlre and maybe perlretut.
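Putting the advice together, the following is one possible self-contained sketch, not the definitive fix. It assumes scratch files created with File::Temp (the file names and sample content are hypothetical stand-ins for the poster's /tmp paths and feed data) so that it runs on its own; every open is checked.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file names in a scratch directory; the thread uses
# /tmp/.rss_download_file and /tmp/out_file.html instead.
my $dir      = tempdir(CLEANUP => 1);
my $in_name  = "$dir/rss_download_file";
my $out_name = "$dir/out_file.html";

# Create sample input so the sketch runs on its own.
open(my $fh_seed, '>:encoding(UTF-8)', $in_name)
    or die "Cannot write $in_name: $!";
print {$fh_seed} "<item>caf\x{E9}</item>\n<item>head\nline2</item>\n";
close($fh_seed) or die "Cannot close $in_name: $!";

# Read the whole file as decoded text.
open(my $fh_in, '<:encoding(UTF-8)', $in_name)
    or die "Cannot read $in_name: $!";
my $file_in = do { local $/; <$fh_in> };
close($fh_in);

# /s lets "." cross newlines; /g in list context returns every capture.
my @allItems = $file_in =~ m{<item>(.*?)</item>}sg;

# Write the items back out, encoding to UTF-8 on the way.
open(my $fh_out, '>:encoding(UTF-8)', $out_name)
    or die "Cannot write $out_name: $!";
print {$fh_out} "<item>$_</item>\n" for @allItems;
close($fh_out) or die "Cannot close $out_name: $!";
```

With the encoding layers on the filehandles, no separate utf8::decode call is needed. Note also that utf8::decode modifies its argument in place and returns a success flag, so an assignment such as my $item1 = utf8::decode($item) stores that flag rather than the decoded text.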
