PerlMonks  

Re^5: Multiline CSV and XML

by rowdog (Curate)
on Oct 02, 2010 at 01:31 UTC ( [id://863031]=note )


in reply to Re^4: Multiline CSV and XML
in thread Multiline CSV and XML

The first thing I want to point out is that splitting on commas is not enough to parse CSV. Your parser will break on any field that contains a comma, such as "Bring your water, beer, and trash bag". That alone is reason enough for me to say: use Text::CSV_PP instead so you get a real parser.
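To make the comma problem concrete, here is a minimal sketch that compares a naive split with Text::CSV_PP. The sample row and its column layout are made up for illustration; only the quoted sentence comes from the example in this post.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Hypothetical sample row: the middle field is quoted and contains commas.
my $line = q{picnic,"Bring your water, beer, and trash bag",noon};

# Naive approach: the quoted field is torn into three pieces.
my @naive = split /,/, $line;

# Real parser: quoting rules are honored, so the row has three fields.
my $csv = Text::CSV_PP->new( { binary => 1 } )
    or die "Cannot create parser: " . Text::CSV_PP->error_diag;
$csv->parse($line)
    or die "Cannot parse line: " . $csv->error_diag;
my @fields = $csv->fields;

printf "split: %d fields, Text::CSV_PP: %d fields\n",
    scalar @naive, scalar @fields;
print "middle field: $fields[1]\n";
```

With binary => 1 the same parser object can also read records through getline on a filehandle, which is how quoted fields with embedded newlines are usually handled.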

Your XML generator is fragile and does lots of unnecessary work. You can see that it's fragile because you're already going through a lot of pain trying to change it when the requirements change. I have to second dHarry and recommend you use XML::Simple instead. Even when the hash structure changes, generating output can be as simple as print XMLout($hash_ref);
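As a rough sketch of what that looks like, the snippet below feeds a small hash to XMLout. The record, its keys, and the map URL are hypothetical. The keys are deliberately chosen without spaces, because a name like 'picnic theme' is not a valid XML element name, so real headers may need normalizing first.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical record; keys are illustrative and are valid element names.
my %data = (
    theme    => 'Pirates & Parrots',
    location => 'Riverside Park',
);

# NoAttr emits nested elements instead of attributes;
# RootName sets the name of the enclosing element.
my $xml = XMLout( \%data, NoAttr => 1, RootName => 'package' );
print $xml;

# A new requirement is just another hash key; the output code is unchanged.
$data{map_link} = 'http://example.com/map';
my $xml_with_map = XMLout( \%data, NoAttr => 1, RootName => 'package' );
print $xml_with_map;
```

XMLout also takes care of escaping characters such as the ampersand in the theme, which is one less thing for a hand-written generator to get wrong.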

The bulk of the unnecessary work in the XML generator is that you assign a variable for each of the values in the hash when there's no reason to do so. Just use the hash directly...

    my $gXtext = <<"GXEOF";
    <?xml version="1.0" encoding="UTF-8"?>
    <package xmlns="http://greateventbulatine/event/organizer" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <theme>${$gXHHRef}{'picnic theme'}</theme>

For escaping HTML entities, you might want HTML::Entities.
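A minimal sketch of that escaping step, assuming the value ends up inside an XML element. The sample text is made up.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical field value containing markup-significant characters.
my $theme = 'Surf & Turf <BYOB>';

# Restrict encoding to the characters that are special in XML.
# Without the second argument, encode_entities also turns high-bit
# characters into named HTML entities, which a plain XML parser
# would not recognize.
my $safe = encode_entities( $theme, '<>&"' );

print "<theme>$safe</theme>\n";
```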

Now, if, after all that, you still want fragile code that's hard to maintain, well, you've already spelled out what you want to do, so do it. You have to store the last header and add a conditional that loops through the file until your conditions are met. Assuming that each record is in a separate file, I suppose I'd go at it something like this...

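    use strict;
    use warnings;
    use XML::Simple;

    # one record per file is assumed; the file name comes from the command line
    my $file = shift @ARGV or die "usage: $0 FILE\n";
    open my $fh, '<', $file or die "Can't open $file: $!";

    # scalar() matters: in list context <$fh> would slurp every line
    my @headers = parseCVSLine(scalar <$fh>);
    my @record  = parseCVSLine(scalar <$fh>);

    my @extra_records;
    while ( my $line = <$fh> ) {
        my @xrec = parseCVSLine($line);
        push @extra_records, \@xrec if @xrec;
    }
    close $fh;

    my %data;
    @data{@headers} = @record;

    # but I still don't know what to do with those extra records so ...
    print XMLout(\%data);

    # yet another lame CSV "parser", use Text::CSV
    sub parseCVSLine {
        my ($line) = @_;
        return unless defined $line;
        chomp $line;
        return split /,/, $line;
    }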

I still don't know what the boundary between records is. Normally, you would expect a line terminator but with multiline records, you need to use something other than the typical line terminator. In the code above, I assumed that the boundary was the file but perhaps that's wrong. In any case, you really ought to use a clear record boundary.
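For illustration only, here is one way a clear boundary could look: a blank line between records, read with Perl's paragraph mode. The blank-line convention and the sample data are my assumptions, not something your format currently has.

```perl
use strict;
use warnings;

# Hypothetical input: two multiline records separated by a blank line.
my $input = <<'END';
theme,location
Pirates,Riverside Park
extra line for record one

theme,location
Luau,City Beach
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '';    # paragraph mode: each read returns one blank-line-delimited chunk
    while ( my $chunk = <$fh> ) {
        chomp $chunk;    # in paragraph mode this strips all trailing newlines
        push @records, [ split /\n/, $chunk ];
    }
}
close $fh;

printf "record %d has %d line(s)\n", $_ + 1, scalar @{ $records[$_] }
    for 0 .. $#records;
```

Each element of @records is then the list of lines for one record, and those lines are still best handed to a real CSV parser rather than split on commas.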

I really think you should use the modules so you don't end up where you are now: with a collection of scripts that all have fragile parsers and you have to go tweak every script every time there's a minor change to the data format. Like, what happens when people want a link to the map to get to the picnic? Because you've locked yourself into this format, you'll have to tweak every file that touches that data. Try to take a more fluid approach where you can. My lame example script above doesn't care at all what data is in the files and will still just work.

Above all, my number one reason to recommend the modules is, you could have been done by now and you would have something that's stable, robust, easy to read and, therefore, easy to maintain.
