PerlMonks  

Stripping off the first line of file help!

by Anonymous Monk
on Mar 02, 2011 at 13:38 UTC ( #890983=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there Monks!
I am working with an XML file, and I need a way to strip off its first line before processing it with XML::XPath, because that line triggers the error below. Without the line, everything works:
encoding specified in XML declaration is incorrect at line 1, column 30, byte 30: <?xml version="1.0" encoding="utf-16"?>
I have no choice but to remove or ignore this line. Is there a way to tell XML::XPath to ignore this first line
<?xml version="1.0" encoding="utf-16"?>
and start after it? If not, how would you get rid of this first line and process the rest of the file?
Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

my $xp = XML::XPath->new(filename => '/xml_test.xml');

foreach my $row ($xp->findnodes('/HOME/Account')) {
    my $nodeset = $row->find('Record');
    foreach my $node ( $nodeset->get_nodelist ) {
        my $account = $node->find('accnumber')->string_value;
        print "\n $account \n";
    }
}
Thanks for looking!

Re: Stripping off the first line of file help!
by roboticus (Canon) on Mar 02, 2011 at 14:40 UTC

    Perhaps the better solution is to ask the provider to give you a correct XML file? If you get into the habit of trying to process incorrect XML files, it can lead you down a nasty rabbit hole. I generally give a bug report to the vendor and point them to an online XML validator so they can fix it.

    ...roboticus

    When your only tool is XML, all your problems are just beginning.

Re: Stripping off the first line of file help!
by Sinistral (Prior) on Mar 02, 2011 at 14:41 UTC

    If your data is really encoded in UTF-16, not UTF-8, and you have characters with a code point above 127, you're going to be in a world of hurt. I also want to point out that XML::XPath had its last update in 2003 and still has several outstanding bugs (I know, I did some fixes to overcome a couple in my own processing, and even that was years ago). For XPath processing, it is HIGHLY recommended to use XML::LibXML, which handles all the character encodings properly so that you don't have to mangle your input XML. Here's a PerlMonks intro to get you started: Stepping up from XML::Simple to XML::LibXML
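
    One rough way to find out whether the data really is UTF-16 is to look for a byte-order mark at the start of the raw bytes. The following is only a sketch: the helper name utf16_bom and the sample strings are invented for illustration, and it assumes that a genuine UTF-16 file starts with a BOM, which is common but not guaranteed. For a real file, the bytes would come from a filehandle opened with the '<:raw' layer.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: report which UTF-16 byte-order mark, if any,
# starts a byte string. Returns 'UTF-16LE', 'UTF-16BE', or undef.
# Caveat: a UTF-32LE BOM (FF FE 00 00) would also be reported as UTF-16LE.
sub utf16_bom {
    my ($bytes) = @_;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return undef;
}

# Sample data standing in for file contents (assumption, not the poster's file).
my $declaration = '<?xml version="1.0" encoding="utf-16"?>';
my $real_utf16  = encode('UTF-16', $declaration);   # Encode writes a BOM, big-endian
my $mislabelled = encode('UTF-8',  $declaration);   # says utf-16, but is plain 8-bit text

my $enc_real = utf16_bom($real_utf16);
my $enc_fake = utf16_bom($mislabelled);

print "real UTF-16: ", ( defined $enc_real ? $enc_real : 'no BOM' ), "\n";
print "mislabelled: ", ( defined $enc_fake ? $enc_fake : 'no BOM' ), "\n";

# Genuine UTF-16 bytes can be turned back into characters with decode().
my $text = decode('UTF-16', $real_utf16);
```

    If the helper finds no BOM while the declaration still claims utf-16, the declaration itself is probably what is wrong, which would be consistent with the parser error quoted in the question.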

Re: Stripping off the first line of file help!
by SimonClinch (Chaplain) on Mar 02, 2011 at 17:05 UTC
    <? blah blah ?> is a standard XML header format. So it's the XML::XPath module that is at fault.

    But the module will accept a scalar instead of a file of XML to process. So you could slurp the file into a variable and then strip the header from it. Because the header appears only once, a workaround substitution like:

    s/\<\?.*\?\>//;
    should be sufficient.

    I would tend not to remove "the first line" as such, because line breaks are just noise to XML and shouldn't be part of any parsing algorithm, even a simple parsing helper as in this case.
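
    Put together, the slurp-and-substitute idea might look roughly like the sketch below. It is an illustration rather than tested production code: the helper name strip_xml_declaration and the sample document are made up, the element names are taken from the original post, and it assumes the bytes are really ASCII or UTF-8 despite what the declaration says. The substitution here is anchored to the start of the string and non-greedy, a slightly stricter variant of the one above. XML::XPath takes the document as a string through its xml option.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper: remove a leading XML declaration (<?xml ... ?>).
# Anchored and non-greedy, so later processing instructions are untouched.
sub strip_xml_declaration {
    my ($xml) = @_;
    $xml =~ s/\A\s*<\?xml\b.*?\?>\s*//s;
    return $xml;
}

# Sample document standing in for the slurped file (an assumption).
my $raw = <<'XML';
<?xml version="1.0" encoding="utf-16"?>
<HOME>
  <Account>
    <Record><accnumber>12345</accnumber></Record>
    <Record><accnumber>88766</accnumber></Record>
  </Account>
</HOME>
XML

my $xml = strip_xml_declaration($raw);

# Parse from the scalar instead of from a filename.
my $xp = XML::XPath->new( xml => $xml );

my @accounts;
foreach my $node ( $xp->findnodes('/HOME/Account/Record') ) {
    # Stringify explicitly so @accounts holds plain strings.
    push @accounts, '' . $node->find('accnumber')->string_value;
}

print "$_\n" for @accounts;
```

    For a real file, the here-document would be replaced by a slurp, for example by localizing $/ to undef before reading from the filehandle.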

    One world, one people

      OK, I can do this, but how would I use XML::XPath after slurping the file into a variable? Here is my situation: the file is currently being treated as text, and I would rather treat it as XML so that it gets validated. Any suggestions?
      my @memberstocheck = ('12345', '88766', '887766', '009888', '111233', '99877');

      # Check the XML files in the archive ($zip is created elsewhere, not shown).
      my @xml = $zip->membersMatching( '.*\.XML' );
      foreach (@xml) {
          # Slurp the member's contents into a scalar
          my $contents = $_->contents();
          open my $contents_fh, '<', \$contents
              or die "Can't open scalar filehandle: $!";
          # Read and discard the first line (the XML declaration)
          my $first_line = <$contents_fh>;
          while (<$contents_fh>) {
              chomp;
              if (/<accnumber>(.*?)<\/accnumber>/i) {
                  my $acc = $1;
                  foreach (@memberstocheck) {
                      # \Q...\E matches the account number literally
                      if (/^\Q$acc\E/) {
                          print "$acc - $_\n";
                      }
                  }
              }
          }
      }
Re: Stripping off the first line of file help!
by fidesachates (Monk) on Mar 02, 2011 at 17:26 UTC
    As my fellow monks have said, it's the module at fault here. I use XML::Simple

Node Type: perlquestion [id://890983]
Approved by Corion
Front-paged by Corion