PerlMonks  

Re^2: Parsing with RegEx into Array

by mr_p (Scribe)
on Jun 25, 2010 at 20:47 UTC (#846598)


in reply to Re: Parsing with RegEx into Array
in thread Parsing with RegEx into Array

This is the function that I have. It is not working for some reason. Do you know why?

sub getItemsFromFile {
    local $/=undef;
    open IN_FILE, "< /tmp/.rss_download_file";
    my $file_in = <IN_FILE>;
    close (IN_FILE);

    #$file_in="<item>headline1</item><item>headline2</item>";

    my @allItems=();
    #while ($file_in =~ m{<\s*item\s*>(.*?)</\s*item\s*>}g)
    while ($file_in =~ m{<item>(.*?)</item>}g)
    {
        push (@allItems, $1);
        print "$1\n";;
    }
    return @allItems;
}

$file_in prints perfectly, so the data is there. This is a UTF-8 character based file and it has foreign characters. If I uncomment the '#$file_in' line, it works. Do you know why? The file is 33k bytes. Is it because I have '\n' in the file?


Re^3: Parsing with RegEx into Array
by ikegami (Pope) on Jun 25, 2010 at 21:35 UTC

    Maybe you're expecting "." to match a newline. Add the "s" flag if so.

    This is a UTF-8 character based file and it has foreign characters.

    Then you should tell Perl that (meaning you should decode the input) if you want to treat the strings as text (which you seem to).
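To illustrate what "decoding the input" changes, here is a minimal sketch using the core Encode module. The sample string is made up for the example and is not taken from the poster's file; it only shows that the same data has a different length as raw bytes than as decoded text.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "café", as they would arrive from an undecoded read.
# (Hypothetical sample data: "é" is U+00E9, encoded as the two bytes C3 A9.)
my $bytes = "caf\xC3\xA9";

# Decode the bytes into a Perl character string.
my $text = decode('UTF-8', $bytes);

# Five bytes before decoding, four characters after.
printf "bytes: %d, characters: %d\n", length($bytes), length($text);
```

Opening the file with an ':encoding(UTF-8)' layer does the same decoding at read time, so no explicit decode() call is needed afterwards.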

      What does the 's' mean? Can you point me to any documentation on this, if you know of any?

      I am also unable to print the utf8. Below is my code.

      my $curLink1 = utf8::decode($curLink); # Use UNICODE semantics
      my $item1 = utf8::decode($item);

      my $fileName="/tmp/out_file.html";
      use open OUT => ':utf8';
      open OUT_FILE, "> $fileName";
      print OUT_FILE "<item>$item1</item>";
      close OUT_FILE;

      #open (my $fh, '>:encoding (UTF-8)', $fileName);
      #print $fh "<item>$item1</item>";
      #close $fh;

      I tried the commented code too. I also tried to print $item, which is encoded. Neither $item nor $item1 prints to the file, but both print on STDOUT.

      Thanks for your help.
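One detail in the snippet above that may be relevant: utf8::decode() converts its argument in place and returns a success flag rather than the decoded string, so a variable assigned from its return value holds that flag, not the text. A minimal sketch, with $item standing in for the poster's variable and a made-up sample value:

```perl
use strict;
use warnings;

# Raw UTF-8 bytes for "é" (U+00E9); hypothetical sample value.
my $item = "\xC3\xA9";

# utf8::decode() works in place: it returns a success flag, not the decoded text.
my $ok = utf8::decode($item);

# $ok is true, and $item itself now holds one character instead of two bytes.
printf "ok: %s, length: %d\n", ($ok ? 'yes' : 'no'), length($item);
```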

        You never check whether opening the output file succeeded. See open. For your original query about regular expressions, see perlre and maybe perlretut.

        For the match operator, "/s" causes "." to match any byte/character. Without it, "." matches any byte/character except 0x0A/newline. Operators are documented in perlop. There's probably more info in perlre.
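The difference the "/s" flag makes can be sketched as follows. The input string is a made-up example, not the poster's feed; its second item deliberately contains a newline.

```perl
use strict;
use warnings;

# Hypothetical sample input: the second item spans two lines.
my $text = "<item>one</item><item>two\nlines</item>";

# Without /s, "." stops at the newline, so only the first item matches.
my @without = $text =~ m{<item>(.*?)</item>}g;

# With /s, "." also matches the newline, so both items are captured.
my @with = $text =~ m{<item>(.*?)</item>}sg;

printf "without /s: %d, with /s: %d\n", scalar(@without), scalar(@with);
```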

        open(my $fh_in, '<:encoding(UTF-8)', ...)
            or die ...;
        ...
        my @allItems = $file_in =~ m{<item>(.*?)</item>}sg;
        ...
        open(my $fh_out, '>:encoding(UTF-8)', ...)
            or die ...;
        print $fh_out ...;
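For reference, one way the outline above might be filled in as a runnable script. This is only a sketch under assumptions: the sub names get_items_from_file and write_items, the error messages, and the temporary file are all invented for the example, and the temporary file stands in for the real paths, which the outline leaves elided.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a UTF-8 file whole and return the text of every <item>...</item> pair.
# (Hypothetical helper name; not from the thread.)
sub get_items_from_file {
    my ($path) = @_;
    open(my $fh_in, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path for reading: $!";
    my $file_in = do { local $/; <$fh_in> };   # slurp the whole file
    close($fh_in);
    # /s lets "." cross newlines; in list context /g returns all captures.
    return $file_in =~ m{<item>(.*?)</item>}sg;
}

# Write the items back out, encoding characters as UTF-8 on the way.
# (Hypothetical helper name; not from the thread.)
sub write_items {
    my ($path, @items) = @_;
    open(my $fh_out, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path for writing: $!";
    print {$fh_out} "<item>$_</item>\n" for @items;
    close($fh_out) or die "Cannot close $path: $!";
}

# Round trip through a temporary file (a stand-in for the real feed file).
my (undef, $tmp) = tempfile(UNLINK => 1);
write_items($tmp, "caf\x{E9}", "two\nlines");
my @all_items = get_items_from_file($tmp);
```

Both opens are checked with "or die", so a failure such as a missing file or bad permissions is reported with $! instead of silently producing no output.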
