PerlMonks  

Good points.

but what if the final line is supposed to be processed by some other piece of code? You can't just ungetc a readline...

You are correct that there is no "unget" or "un-read" for a line that has already been read. There are various ways of handling that sort of situation. When the process() sub needs to deal with the first line, I pass that first line as a parameter to process(). Usually these sorts of things are record oriented: something has to be done with a record that was read, and the process() sub's job is to assemble a complete record. If you want the code that "does something to the record" to be in the main driver, then just have process() return a structure or modify a struct ref that is passed in. You can't use Perl's single-statement "if" modifier in that situation, but I don't see any issue here at all.
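To make the "return a structure" idea concrete, here is a minimal sketch. The sub name read_record(), the sample input, and the in-memory file handle are my own stand-ins for illustration, not something from the earlier posts. The driver reads the first line, hands it to the sub along with the file handle, and gets a hash ref back.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory file handle stands in for a real file.
my $input = <<'END';
header line to skip
<CALL:5>W6LVW<STATE:2>CO
<BAND:3>80M
<EOR>
<CALL:5>K7CAR<STATE:2>OR
<EOR>
END

# Assemble one record: the caller passes in the line it has already read,
# and the sub keeps reading until the closing <EOR> line.
sub read_record {
    my ($fh, $first_line) = @_;
    chomp $first_line;
    my $data = $first_line;
    while (my $line = <$fh>) {
        last if $line =~ /^<EOR/;
        chomp $line;
        $data .= $line;
    }
    my %record = $data =~ /<(\w+):\d+>([\w. ]+)/g;
    return \%record;    # hand the structure back to the driver
}

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @records;
while (my $line = <$fh>) {
    next unless $line =~ /^<CALL/;
    my $rec = read_record($fh, $line);
    push @records, $rec;
    print "$rec->{CALL} $rec->{STATE}\n";    # "does something" lives in the driver
}
close $fh;
```

The driver needs a full statement here instead of the one-line "if" modifier, because it has to capture the return value before using it.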

Also, note that your process_record is making use of a global variable, DATA, and three of your four examples will throw an undef warning if the end-of-file is reached before the closing line is seen.

As far as global DATA goes, I have no issue with that for a short (<1 page) piece of code. In a larger program I would pass a lexical file handle to the sub. Note: You can make a lexical file handle out of DATA like this: my $fh = *DATA; print while (<$fh>); Pass $fh to the sub.

In almost all of the situations I deal with, throwing an error for malformed file input is the correct behaviour. This is usually a good thing, and the input file needs to be fixed. It is rare for me to throw away or silently ignore a malformed record. Of course "rare" does not mean "never". It could certainly be argued that the program that doesn't throw an undef warning is in error! Of course the programs I demoed can be modified to have either behaviour.
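If you want an explicit error instead of relying on a warning, something along these lines would do it. This is only a sketch: read_until_eor() is a made-up helper name, the truncated input is invented, and the eval is there just so the demo can show the message instead of dying.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read lines up to the closing <EOR> line; die if the file ends first.
# read_until_eor() is a hypothetical helper name used for illustration.
sub read_until_eor {
    my ($fh, $first_line) = @_;
    chomp $first_line;
    my $data   = $first_line;
    my $closed = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^<EOR/) { $closed = 1; last; }
        chomp $line;
        $data .= $line;
    }
    die "Malformed input: end of file before <EOR> in record '$first_line'\n"
        unless $closed;
    return $data;
}

# Hypothetical truncated input: the second record never gets its <EOR>.
my $input = "<CALL:5>W6LVW\n<EOR>\n<CALL:5>K7CAR\n<BAND:3>80M\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @complete;
my $ok = eval {
    while (my $line = <$fh>) {
        push @complete, read_until_eor($fh, $line) if $line =~ /^<CALL/;
    }
    1;
};
my $error = $ok ? '' : $@;
close $fh;

print "complete records: ", scalar(@complete), "\n";
print "error: $error" if $error;
```

Dropping the die (or turning it into a warn) gives the "silently ignore" behaviour, so either policy is a one-line change.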

I think a state machine type approach would be better, because it is more flexible and can handle the above cases specially, if needed.

I guess we disagree. I don't see any case for "more flexible". However, having said that, there is no real quibble on my part with having a state variable approach. Using a sub() to keep track of the "inside record" state is very clean. I actually think the Perl flip-flop operator is very cool. No problem with that either! When I use it, I have to go to Grandfather's classic post and look at the various start/end regex situations.
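For comparison, here is roughly what the flip-flop version of the same kind of job could look like. Again this is just a sketch with invented sample input, not code from anyone's earlier post. In scalar context the ".." operator returns a sequence number while the range is active, and the last value of the range has "E0" appended, which is how the closing line is detected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input with junk outside the records.
my $input = <<'END';
header line
<CALL:5>W6LVW<STATE:2>CO
<BAND:3>80M
<EOR>
junk between records
<CALL:5>K7CAR
<EOR>
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @records;
my $data = '';
while (my $line = <$fh>) {
    chomp $line;
    # True from a <CALL line through the matching <EOR> line, inclusive.
    if (my $seq = ($line =~ /^<CALL/ .. $line =~ /^<EOR/)) {
        if ($seq =~ /E0$/) {    # last line of the range: the <EOR> line
            push @records, $data;
            $data = '';
        }
        else {
            $data .= $line;
        }
    }
}
close $fh;

print "$_\n" for @records;
```

Lines outside a record (the header and the junk line) never make the range true, so they are skipped without any extra state variable.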

I often have to write "one-off" programs to convert weird file formats. I will attach such a program that I wrote a few days ago. For such a thing, efficiency doesn't matter, "general purpose" doesn't matter - I will never see a file like this again. My job was to convert this file as part of a larger project. This is not "perfect" but it did its job.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;   # core module, handy for debugging %hash
$|=1;

# Main driver: a record starts at a line beginning with <CALL
while (my $line = <DATA>)
{
    process_record ($line) if $line =~ /^<CALL/;
}

# Assemble one complete record: the first line plus everything up to <EOR>
sub process_record
{
    my $line = shift;
    chomp $line;
    my $data = $line;
    while ( $line = <DATA>)
    {
        last if $line =~ /^<EOR/;
        chomp $line;
        $data .= $line;
    }
    # each <NAME:length>value pair becomes a hash entry
    my %hash = $data =~ /<(\w+):\d+>([\w. ]+)/g;
    print_Cabrillo_QSO (\%hash);
}

sub print_Cabrillo_QSO
{
    my $Qref = shift;
    print "QSO: ";

    # ADIF FREQ is in MHz; Cabrillo wants kHz
    my $freq = int ($Qref->{FREQ} * 1000 + 0.5);
    printf "%i ", $freq;
    print "PH ";

    my $date = $Qref->{QSO_DATE};   # 20190504 => 2019-05-04
    $date =~ s/(\d\d\d\d)(\d\d)(\d\d)/$1-$2-$3/;
    print "$date ";

    my $time = $Qref->{TIME_ON};    # HHMMSS => HHMM
    $time =~ s/^(\d\d\d\d).*/$1/;
    print "$time ";

    print "W7RN 59 NVSTO ";
    printf "%15s ",$Qref->{CALL};
    print "59 ";
    $Qref->{COMMENT}=~ s/ +//g;  #assume next field is <
    print $Qref->{COMMENT};
    # my $qth = $Qref->{QTH};
    #$qth //= '';
    #print $qth;
    print "\n";
}

=Prints
QSO: 3816 PH 2019-05-05 0659 W7RN 59 NVSTO           W6LVW 59 CO
QSO: 3816 PH 2019-05-05 0657 W7RN 59 NVSTO           K7CAR 59 UTWSH
=cut

__DATA__
This ADIF file was created by MacLoggerDX
<PROGRAMID:11>MacLoggerDX<PROGRAMVERSION:4>6.22<ADIF_VER:5>3.0.7
<EOH>
<CALL:5>W6LVW<NAME:18>Michael J Sparling<QTH:8>MONUMENT<STATE:2>CO<CNTY:7>El Paso<QSO_DATE:8>20190505<TIME_ON:6>065952<QSO_DATE_OFF:8>20190505<TIME_OFF:6>070013
<FREQ_RX:5>3.816<FREQ:5>3.816<BAND:3>80M<BAND_RX:3>80M<MODE:3>SSB<SUBMODE:3>LSB
<TX_PWR:3>100<ANT_AZ:4>86.8<RST_SENT:2>59<RST_RCVD:2>59
<DXCC:3>291<COUNTRY:13>United States<GRIDSQUARE:6>DM79nb<LAT:11>N039 04.562<LON:11>W104 53.096
<MY_GRIDSQUARE:6>DM09ei<OPERATOR:4>K5XI<MY_RIG:11>Elecraft K3<COMMENT:2>CO<EMAIL:19>mickspa@comcast.net
<EOR>
<CALL:5>K7CAR<NAME:13>Kent B O Sell<QTH:9>Hillsboro<STATE:2>OR<CNTY:10>Washington<QSO_DATE:8>20190505<TIME_ON:6>065758<QSO_DATE_OFF:8>20190505<TIME_OFF:6>065814
<FREQ_RX:5>3.816<FREQ:5>3.816<BAND:3>80M<BAND_RX:3>80M<MODE:3>SSB<SUBMODE:3>LSB
<TX_PWR:3>100<ANT_AZ:3>124<RST_SENT:2>59<RST_RCVD:2>59<QSL_VIA:10>eQSL, LoTW
<DXCC:3>291<COUNTRY:13>United States<GRIDSQUARE:6>DM44ik<LAT:11>N034 25.359<LON:11>W111 19.869
<MY_GRIDSQUARE:6>DM09ei<OPERATOR:4>K5XI<MY_RIG:11>Elecraft K3<COMMENT:6>UT WSH<EMAIL:17>kent@premier1.net
<EOR>

In reply to Re^3: processing file content as string vs array by Marshall
in thread processing file content as string vs array by vinoth.ree
