PerlMonks  


Hi,

You have already been given some good solutions, so I won't offer another one. I would just like to make some comments on your code.

my ($TEXTIN,$HTMLOUT);
my $input2 = $outputfile;
my $output2 = "FTFIMS.html";
my @records=();
my $inrecord=0;
my $rxRecStart = qr{STORE\d+};
my $rxRecStop = qr{\n\s};
my $recordStr = q{};

It is not a very good idea to declare all your lexical variables at the top of your program, because you are essentially making them global to the whole file, and this negates a large part of the advantage of lexical variables. Try to limit the scope of each variable to the enclosing block where it belongs.
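As a small illustration of what limited scope can look like, here is a minimal sketch. The sub name matching_lines and the sample strings are made up for the example and are not taken from your program.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): returns the
# lines that match a pattern.
sub matching_lines {
    my ($pattern, @lines) = @_;
    my @found;                   # visible only inside this sub
    for my $line (@lines) {      # $line exists only inside the loop
        push @found, $line if $line =~ $pattern;
    }
    return @found;
}

# Made-up sample data.
my @stores = matching_lines(qr{STORE\d+}, "STORE001 a\n", "other\n", "STORE002 b\n");
print scalar(@stores), " matching lines\n";
```

Nothing outside the sub can read or clobber @found or $line, which is the kind of protection that file-wide declarations give up.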

open $TEXTIN,"<",$input2 || die "Can not open $input2: $!\n";

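This will not die if the program fails to open the file (say, if the file does not exist), because of a precedence problem: || binds more tightly than the comma-separated argument list of open, so the die is attached to $input2 rather than to the result of open. It would only be reached if $input2 itself were false.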

You should either have parens:

open ($TEXTIN, "<", $input2) || die "Can not open $input2: $!\n";

or use the lower precedence operator or:

open $TEXTIN, "<", $input2 or die "Can not open $input2: $!\n";

But it would be even better to declare your filehandle within that statement:

open my $TEXTIN, "<", $input2 or die "Can not open $input2: $!\n";

Same thing of course for the other file opening statement.
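If you want to see the difference for yourself, the following sketch wraps both forms in eval so that the script keeps running. The two sub names and the file name are invented for the demonstration, and the file is assumed not to exist.

```perl
use strict;
use warnings;

# Assumption: no file with this name exists in the current directory.
my $missing = "no_such_file_for_this_demo.txt";

# Parsed as open($fh, "<", ($file || die ...)). $file is a true
# string, so the die is never evaluated, even though open fails.
sub open_with_double_pipe {
    my ($file) = @_;
    my $ok = eval { open my $fh, "<", $file || die "Can not open $file: $!\n"; 1 };
    return $ok ? "no die" : "died";
}

# "or" has very low precedence, so it applies to the result of open.
sub open_with_or {
    my ($file) = @_;
    my $ok = eval { open my $fh, "<", $file or die "Can not open $file: $!\n"; 1 };
    return $ok ? "no die" : "died";
}

print "||: ", open_with_double_pipe($missing), "\n";
print "or: ", open_with_or($missing), "\n";
```

As long as the file really is absent, the || version should report that it did not die, while the or version should report that it died.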

Otherwise, I think that the algorithm within your while loop is too complicated, error-prone and not very robust. In particular, the $rxRecStop regexp is very weak and might match where you don't expect. Also, it will probably not match anything at the end of the file, so the last section will not be recorded.

Rather than having a beginning and end regexp, I think it would probably be better to have only one break regexp (the /STORE\d{3}/ is a good candidate). When you meet it, you do the house cleaning of the previous section (storing data) and the preparation of the next section (reinitializing the variables). Something like this:

my $header = "";
$header .= <$TEXTIN> for 0..1; # record the first two lines for later use
while(<$TEXTIN>){
    next if m/^\s*$/; # get rid of empty lines
    if (m{$rxRecStart}){
        # do what you need to finish off the previous section
        # (saving the data) and start the new one
    } else {
        $recordStr .= $_;
    }
}
# add code here for storing the last section
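To make the idea more concrete, here is one way the outline might be filled in. It is only a sketch: the sub name split_records and the sample data are made up, the two-line header is carried over from the outline, and the data is read through an in-memory filehandle so that the example does not depend on your input file.

```perl
use strict;
use warnings;

# Hypothetical sub: splits the lines read from $fh into records.
# A line matching $rx_start begins a new record.
sub split_records {
    my ($fh, $rx_start) = @_;

    # Keep the first two lines as a header (assumption from the outline).
    my $header = "";
    for (1 .. 2) {
        my $line = <$fh>;
        $header .= $line if defined $line;
    }

    my @records;
    my $record = "";
    while (my $line = <$fh>) {
        next if $line =~ m/^\s*$/;                     # skip empty lines
        if ($line =~ $rx_start) {
            push @records, $record if length $record;  # finish previous section
            $record = $line;                           # start the new one
        } else {
            $record .= $line;
        }
    }
    push @records, $record if length $record;          # store the last section
    return ($header, \@records);
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "Report title\nColumn headings\n\n"
           . "STORE001\nitem a\n\nSTORE002\nitem b\nitem c\n";
open my $in, "<", \$sample or die "Can not open in-memory file: $!\n";
my ($header, $records) = split_records($in, qr{STORE\d+});
close $in;
print "Found ", scalar(@$records), " records\n";
```

Note that the final push after the loop is what keeps the last section from being lost, which is the weak point of an approach that relies on a stop regexp.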

I just wanted to propose some improvements on the basis of your code, to help you think about it. I think that the solution with the modification of the input record separator proposed by Rolf and others is probably better.
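For comparison, a record-separator approach might look roughly like the sketch below. This is not the code posted by the others in the thread. It assumes that records are separated by blank lines, which may or may not hold for your file, and the sub name read_paragraph_records is made up.

```perl
use strict;
use warnings;

# Hypothetical sub: reads blank-line-separated records from $fh.
sub read_paragraph_records {
    my ($fh) = @_;
    local $/ = "";       # paragraph mode: blank lines end a record
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;    # in paragraph mode, removes all trailing newlines
        push @records, $chunk;
    }
    return @records;
}

# Made-up sample data, read through an in-memory filehandle.
my $text = "STORE001\nitem a\n\nSTORE002\nitem b\nitem c\n";
open my $fh, "<", \$text or die "Can not open in-memory file: $!\n";
my @paragraphs = read_paragraph_records($fh);
close $fh;
print scalar(@paragraphs), " records read\n";
```

Using local restores the previous value of $/ when the sub returns, so the rest of the program keeps reading line by line.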

Edit: changed the way $header is assigned to prevent an "uninitialized value" warning.

In reply to Re: Splitting a file into records by Laurent_R
in thread Splitting a file into records by TStanley
