PerlMonks  

record separator causing problems

by perlperlperl (Novice)
on Dec 19, 2013 at 09:17 UTC (#1067796)
perlperlperl has asked for the wisdom of the Perl Monks concerning the following question:

The input file file.xml contains well-formed XML whose root element is html, with content inside it. I have set the record separator to undef so that the entire file is slurped into $records as one big string. In the regex match below, I am trying to get the contents of the html element. This works when I match against a literal string, but when I match against the variable $records I get 'Use of uninitialized value $text in string at... ' at run time. Why? Am I not capturing the result of the match into $text?
    use strict;
    use warnings;
    open FILE_IN, '<file.xml';
    open FILE_OUT, '>results.txt';
    $/ = undef;
    my $records = <FILE_IN>;
    my $text = "";
    ($text) = $records =~ m/<html>(.*)<\/html>/;
    print "$text";
    close FILE_IN;
    close FILE_OUT;
Puzzling.

Re: record separator causing problems
by hippo (Curate) on Dec 19, 2013 at 09:41 UTC

    You are not using the /s regex modifier, so your regex will not match unless there are no newlines between the opening and closing html tags, which is unlikely (but you haven't shown the data, so we won't know for sure).
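    As a minimal sketch of the point above: the sample data here is an assumption (the original file.xml was not shown), and an in-memory filehandle stands in for the real file. It shows that a failed match in list context leaves the variable undefined, which is what triggers the "uninitialized value" warning, and that adding /s lets '.' cross newlines.

```perl
use strict;
use warnings;

# Sample data is an assumption; the original file.xml was not shown.
my $sample = "<html>\n<p>Hello</p>\n</html>\n";

# Slurp from an in-memory filehandle, limiting the $/ change to this block.
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
my $records = do { local $/; <$in> };
close $in;

# Without /s, '.' stops at the first newline, so the match fails and
# the list assignment leaves the variable undefined.
my ($without_s) = $records =~ m/<html>(.*)<\/html>/;

# With /s, '.' also matches newlines, so the capture can span lines.
my ($with_s) = $records =~ m/<html>(.*)<\/html>/s;

print defined($without_s) ? "without /s: matched\n" : "without /s: no match\n";
print defined($with_s)    ? "with /s: matched\n"    : "with /s: no match\n";
```

    Using local $/ inside a do block is one common way to keep the record-separator change from leaking into the rest of the program; checking defined() before printing avoids the warning when nothing matched.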

Re: record separator causing problems
by Athanasius (Monsignor) on Dec 19, 2013 at 10:32 UTC
      Please, for the love of $DEITY, use an XML parsing module. The XML parsing modules (I use XML::Twig) fix so... many... problems... -- Edit: Clarity

        Quite. If only there were ... oh, I don't know ... some sort of poll or something to bring it to everyone's attention.

        :)

Re: record separator causing problems
by sundialsvc4 (Monsignor) on Dec 19, 2013 at 16:08 UTC

    That really can’t be emphasized enough: “don’t try to do XML without a proper library, be it XML::Twig, XML::LibXML (my personal favorite), or something else.”

    In my experience, every XML data feed that you’re ever going to receive was library-generated ... most commonly with libxml2 (libxml2.so or the DLL), which is the library that the Perl module XML::LibXML wraps. Everything is there ... XSLT, XPath expressions, and so on. So you can arrange to read the file with the same software that was used to create it, driving the bus with Perl or Python or whatever language you please. You can focus on what you want to do with the file, and the code required to do it suddenly isn’t complicated at all. You’ve got much better things to do with your time than monkeying around with regular expressions and record separators . . .
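    For illustration only, here is a rough sketch of the parser-based approach using XML::LibXML (which must be installed from CPAN). The helper name root_inner_xml and the sample string are hypothetical, not from the original post; the sketch assumes the goal is the serialized content of the root html element.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the serialized children of the root element.
sub root_inner_xml {
    my ($xml) = @_;
    my $doc  = XML::LibXML->load_xml(string => $xml);
    my $root = $doc->documentElement;
    # Serialize every child node (text and elements) and join the pieces.
    return join '', map { $_->toString } $root->childNodes;
}

# Sample data is an assumption; the original file.xml was not shown.
my $sample = "<html>\n<p>Hello</p>\n</html>";
print root_inner_xml($sample), "\n";
```

    To read from disk instead, load_xml also accepts location => 'file.xml'. No record separator or regex modifier is involved, so newlines inside the element are not a concern.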
