PerlMonks  

Parsing record into hash

by sxmwb (Pilgrim)
on Jul 04, 2006 at 20:07 UTC (#559217)

sxmwb has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I have a file with a record format like the one below, and I would like to process each record into a hash so that I can extract specific data. Being new to record parsing, I find something like this interesting. Any guidance would be appreciated; I have looked at the tutorials and FAQs. It would be nice to parse the first record for basic information, but what I really need in the hash are the multiple records that end with <EOR>.

Thanks
Mike - lowly initiate seeking knowledge and wisdom

    Exported by jLog (c)2006 LA3HM, V 3.90.2.7 according to ADIF <adif_ver:1>2 <PROGRAMID:4>jLog
    For jLog info: mailto:mail@jlog.org http://jlog.org/
    Proposed ADIF2 Extensions may be included
    <eoh>

    <qso_date:8:d>20051029 <time_on:6>213400 <call:4>VC3O <band:3>20M <mode:3>SSB <operator:3>VHF <rst_sent:2>59 <rst_rcvd:2>59 <dxcc:1>1 <stx:1>4 <srx:1>4 <ituz:1>4 <cqz:1>4 <pfx:3>VC3 <cont:2>NA <freq:2>14 <qsoComplete:1> <app_jlog_qso_number:4>0001 <app_jlog_eqsl_qsl_sent:1>Y <app_jlog_eqsl_qsl_rcvd:1>Y <app_jlog_lotw_qsl_sent:1>Y <qsl_sent_via:1>E <eor>
    <qso_date:8:d>20060701 <time_on:6>183206 <call:5>VE6GG <band:3>20M <mode:3>SSB <operator:3>MWB <rst_sent:2>59 <rst_rcvd:2>59 <dxcc:1>1 <stx:2>27 <srx:2>AB <ituz:1>2 <cqz:1>4 <contest_id:3>RAC <pfx:3>VE6 <cont:2>NA <freq:8>14.16299 <state:2>AB <qsoComplete:1> <app_jlog_qso_number:4>1257 <app_jlog_eqsl_qsl_sent:1>Y <app_jlog_eqsl_qslsdate:10>2006-07-01 <app_jlog_lotw_qsl_sent:1>Y <app_jlog_lotw_qslsdate:10>2006-07-01 <operator:4>N7DQ <eor>

Code tags added by GrandFather

Re: Parsing record into hash
by GrandFather (Sage) on Jul 04, 2006 at 21:37 UTC

    Odd file format. Looks like someone saw XML, but didn't get the point. The following constructs a hash of hashes containing QSO records and the header record.

        use strict;
        use warnings;
        use Data::Dump::Streamer;

        my %QSOs;

        $QSOs{header} = '';
        while (<DATA>) {
            $QSOs{header} .= $_;
            last if /<eoh>/;
        }

        my %qso;
        my $key = '';

        while (defined (my $line = <DATA>)) {
            chomp $line;
            next if ! length $line;
            next if ! ($line =~ /<qso_date:/ or length $key);

            if (! length $key) {
                $line =~ s/<qso_date:[^>]*>([^<]*)<time_on:[^>]*>([^<]*)(<?)/$3/;
                $key = "$1:$2";
            }

            my @fields = split '<', $line;
            for (@fields) {
                my ($tag, $text) = /([^>]*)>(.*)/;
                next if ! defined $tag or ! length $tag;
                if ($tag eq 'eor') {
                    $QSOs{$key} = {%qso} if length $key;
                    $key = '';
                    %qso = ();
                    last;
                }
                $qso{$tag} = $text || '';
            }
        }

        Dump (\%QSOs);

        __DATA__
        Exported by jLog (c)2006 LA3HM, V 3.90.2.7 according to ADIF <adif_ver:1>2 <PROGRAMID:4>jLog
        For jLog info: mailto:mail@jlog.org http://jlog.org/
        Proposed ADIF2 Extensions may be included
        <eoh>

        <qso_date:8:d>20051029 <time_on:6>213400 <call:4>VC3O <band:3>20M <mode:3>SSB <operator:3>VHF <rst_sent:2>59 <rst_rcvd:2>59 <dxcc:1>1 <stx:1>4 <srx:1>4 <ituz:1>4 <cqz:1>4 <pfx:3>VC3 <cont:2>NA <freq:2>14 <qsoComplete:1> <app_jlog_qso_number:4>0001 <app_jlog_eqsl_qsl_sent:1>Y <app_jlog_eqsl_qsl_rcvd:1>Y <app_jlog_lotw_qsl_sent:1>Y <qsl_sent_via:1>E <eor>
        <qso_date:8:d>20060701 <time_on:6>183206 <call:5>VE6GG <band:3>20M <mode:3>SSB <operator:3>MWB <rst_sent:2>59 <rst_rcvd:2>59 <dxcc:1>1 <stx:2>27 <srx:2>AB <ituz:1>2 <cqz:1>4 <contest_id:3>RAC <pfx:3>VE6 <cont:2>NA <freq:8>14.16299 <state:2>AB <qsoComplete:1> <app_jlog_qso_number:4>1257 <app_jlog_eqsl_qsl_sent:1>Y <app_jlog_eqsl_qslsdate:10>2006-07-01 <app_jlog_lotw_qsl_sent:1>Y <app_jlog_lotw_qslsdate:10>2006-07-01 <operator:4>N7DQ <eor>

    DWIM is Perl's answer to Gödel
      Thank you, the data is in ADIF, an amateur radio data interchange format. I think you are right that they saw XML and did not really know what was going on. Your solution is simple; now I have to understand it. Two answers in two hours so far is great, and they show two different ways of doing it.

      Thanks Mike

        I think you are right that they saw XML and did not really know what was going on.

        Actually, the format makes more sense than you and GrandFather give it credit for. The general field format is:

        '<' identifier ':' length '>' value

        There also appears to be a type marker after the length: in the qso_date field, a 'd' marks the value as a date.
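
Because each field declares its length, a parser can take exactly that many characters as the value rather than scanning to the next space or '<', so a value containing spaces or angle brackets would still come through intact. The following is a minimal sketch of that idea, not code from any reply in this thread: parse_adif_fields and parse_adif are hypothetical names, the optional third part (the 'd') is returned as a type but not interpreted, and field names are lowercased on the assumption that their case does not matter.

```perl
use strict;
use warnings;

# Split ADIF-style text into [name, type, value] triples, using the declared
# length to decide where each value ends. Field names are lowercased
# (assumption: case is not significant). Tags without a length, such as
# <eoh> and <eor>, get an undefined value.
sub parse_adif_fields {
    my ($text) = @_;
    my @fields;
    # '<' name [':' length [':' type]] '>'
    while ($text =~ /<([^:>]+)(?::(\d+)(?::([^>]*))?)?>/g) {
        my ($name, $len, $type) = ($1, $2, $3);
        my $value;
        if (defined $len) {
            # Take exactly $len characters after the '>', then move the
            # match position past them so a '<' inside a value is skipped.
            $value = substr($text, pos($text), $len);
            pos($text) = pos($text) + length $value;
        }
        push @fields, [lc $name, $type, $value];
    }
    return @fields;
}

# Group the fields: everything before <eoh> goes into the header hash,
# and each <eor> closes one record hash.
sub parse_adif {
    my ($text) = @_;
    my (%header, @records, %current);
    # Assumption: if there is no <eoh> at all, the text has no header.
    my $in_header = $text =~ /<eoh>/i ? 1 : 0;
    for my $field (parse_adif_fields($text)) {
        my ($name, undef, $value) = @$field;
        if    ($name eq 'eoh') { $in_header = 0 }
        elsif ($name eq 'eor') { push @records, +{ %current }; %current = () }
        elsif ($in_header)     { $header{$name} = $value }
        else                   { $current{$name} = $value }
    }
    return (\%header, \@records);
}

my $sample = 'Exported by jLog <adif_ver:1>2 <PROGRAMID:4>jLog <eoh> '
           . '<qso_date:8:d>20051029 <call:4>VC3O <eor> '
           . '<qso_date:8:d>20060701 <call:5>VE6GG <eor>';
my ($header, $records) = parse_adif($sample);
printf "%s: %d records, last call %s\n",
    $header->{programid}, scalar @$records, $records->[-1]{call};
# prints: jLog: 2 records, last call VE6GG
```

One consequence of dropping the length from the key: a repeated field overwrites the earlier one. The second sample record lists operator twice (MWB, then N7DQ), so this sketch would keep only N7DQ, whereas the replies that key on the full tag keep both.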

        I'd much rather have to parse this format than something like METAR, where you have to make guesses about the fields you're processing based on their order and format.

Re: Parsing record into hash
by Hue-Bond (Priest) on Jul 04, 2006 at 21:06 UTC

    Your question isn't very clear, but let's see what you think of this:

        open my $fd, '<', 'foo' or die "open: $!\n";
        my ($h, @arr);

        ## assuming <eor> and <eoh> are always at the end of their lines
        while (<$fd>) {
            $h .= $_;    ## We store header in $h. I won't parse it.
            last if /<eoh>$/;
        }

        my $c = 0;
        while (<$fd>) {
            next if /^\n$/;
            my @fields = split / /, $_;
            foreach (@fields) {
                my ($k, $v) = $_ =~ /^(<[^>]+>)(.*)$/;
                next unless defined $k;    ## skip empty or tagless fields
                next if $k eq '<eor>';
                $arr[$c]{$k} = $v;
            }
            $c++ if /<eor>$/;
        }
        close $fd;

    To inspect the result: use Data::Dumper; print Dumper \@arr;

    Update: Removed an intermediate step and did some other minor tweaks. Now it's smaller and probably faster.
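
The keys in @arr are the complete tags, length included (for example <call:4>), so looking a field up means knowing its length in advance. If you would rather use bare names, a small post-processing step is enough. This is a sketch, assuming the field name is everything before the first colon; field_name and normalize_record are hypothetical helpers, and lowercasing is an extra assumption to smooth over mixed-case tags such as qsoComplete.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a raw tag such as '<call:4>' or
# '<qso_date:8:d>' to a bare, lowercase field name ('call', 'qso_date').
sub field_name {
    my ($tag) = @_;
    my ($name) = $tag =~ /^<?([^:>]+)/;
    return defined $name ? lc $name : undef;
}

# Copy one record hash (shaped like an element of @arr) using bare field
# names as keys. If a record repeats a field under different lengths,
# which value survives depends on hash order, i.e. it is arbitrary.
sub normalize_record {
    my ($rec) = @_;
    my %clean;
    for my $tag (keys %$rec) {
        my $name = field_name($tag);
        $clean{$name} = $rec->{$tag} if defined $name;
    }
    return \%clean;
}

my $raw   = { '<call:4>' => 'VC3O', '<qso_date:8:d>' => '20051029' };
my $clean = normalize_record($raw);
print "$clean->{call} on $clean->{qso_date}\n";    # prints: VC3O on 20051029
```

Applied as my @clean = map { normalize_record($_) } @arr; it would let you write $clean[0]{call}.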

    --
    David Serrano

      Wow, thank you, this is what I am looking for: the info between the <> is the key and the data follows. You make it look simple. Now I have to read through the code to really understand what just happened.

      Thanks Mike

Re: Parsing record into hash
by TedPride (Priest) on Jul 04, 2006 at 21:52 UTC
    You didn't say what the record key for each record is, so I'm just adding the records to an overall array. I'm also not sure what the numbers inside the tags are for, so the following assumes that each tag is a key with no modifications.
        use strict;
        use warnings;
        use Data::Dumper;

        my ($temp, @results) = '';

        ### Fast forward past header
        while (<DATA>) {
            last if m/<eoh>\s+$/;
        }

        ### While there are records remaining...
        while (<DATA>) {
            $temp .= $_;

            ### Process if end of record tag reached
            if (m/<eor>\s+$/) {
                my %hash;
                $temp =~ s/\n//g;
                $temp =~ s/<eoh>.*//;
                $hash{$1} = $2 while $temp =~ /<(.*?)>([^<]*)\s/sg;
                push @results, \%hash;
                $temp = '';
            }
        }

        print Dumper(\@results);

        __DATA__
        Exported by jLog (c)2006 LA3HM, V 3.90.2.7 according to ADIF <adif_ver:1>2 <PROGRAMID:4>jLog
        For jLog info: mailto:mail@jlog.org http://jlog.org/
        Proposed ADIF2 Extensions may be included
        <eoh>

        <qso_date:8:d>20051029 <time_on:6>213400 <call:4>VC3O <band:3>20M <mode:3>SSB <operator:3>VHF <rst_sent:2>59 <rst_rcvd:2>59 <dxcc:1>1 <stx:1>4 <srx:1>4 <ituz:1>4 <cqz:1>4 <pfx:3>VC3 <cont:2>NA <freq:2>14 <qsoComplete:1> <app_jlog_qso_number:4>0001 <app_jlog_eqsl_qsl_sent:1>Y <app_jlog_eqsl_qsl_rcvd:1>Y <app_jlog_lotw_qsl_sent:1>Y <qsl_sent_via:1>E <eor>
        <qso_date:8:d>20060701 <time_on:6>183206 <call:5>VE6GG <band:3>20M <mode:3>SSB <operator:3>MWB <rst_sent:2>59 <rst_rcvd:2>59 <dxcc:1>1 <stx:2>27 <srx:2>AB <ituz:1>2 <cqz:1>4 <contest_id:3>RAC <pfx:3>VE6 <cont:2>NA <freq:8>14.16299 <state:2>AB <qsoComplete:1> <app_jlog_qso_number:4>1257 <app_jlog_eqsl_qsl_sent:1>Y <app_jlog_eqsl_qslsdate:10>2006-07-01 <app_jlog_lotw_qsl_sent:1>Y <app_jlog_lotw_qslsdate:10>2006-07-01 <operator:4>N7DQ <eor>
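
If the date and time of contact identify a QSO well enough, the array this code builds can be turned into a hash keyed on them, similar to the "$1:$2" key in GrandFather's reply. A sketch, assuming date plus time is unique (a duplicate only triggers a warning and the later record wins); index_records is a hypothetical helper, and the two field names are passed in because this code keeps the length in each key (qso_date:8:d, time_on:6).

```perl
use strict;
use warnings;

# Build a hash of hashes keyed on "date time" from a list of record hashes.
# The field names are parameters because the keys may still carry the
# declared length, e.g. 'qso_date:8:d' and 'time_on:6'.
sub index_records {
    my ($records, $date_field, $time_field) = @_;
    my %by_time;
    for my $rec (@$records) {
        my ($date, $time) = @{$rec}{$date_field, $time_field};
        next unless defined $date and defined $time;    # skip incomplete records
        my $key = "$date $time";
        warn "duplicate QSO time $key; keeping the later record\n"
            if exists $by_time{$key};
        $by_time{$key} = $rec;
    }
    return \%by_time;
}

# Records shaped like the elements of @results.
my @results = (
    { 'qso_date:8:d' => '20051029', 'time_on:6' => '213400', 'call:4' => 'VC3O'  },
    { 'qso_date:8:d' => '20060701', 'time_on:6' => '183206', 'call:5' => 'VE6GG' },
);
my $qsos = index_records(\@results, 'qso_date:8:d', 'time_on:6');
print $qsos->{'20060701 183206'}{'call:5'}, "\n";    # prints: VE6GG
```
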
      Ted,

      Thanks, yet another way to do it. I love that about Perl. I appreciate the response, and now I have to read and understand what you have provided.

      Thanks Mike

Approved by willyyam