You have already been given some good solutions, so I won't offer another one. I would just like to make some comments on your code.

my ($TEXTIN,$HTMLOUT);
my $input2 = $outputfile;
my $output2 = "FTFIMS.html";
my @records=();
my $inrecord=0;
my $rxRecStart = qr{STORE\d+};
my $rxRecStop = qr{\n\s};
my $recordStr = q{};

It is not a very good idea to declare all your lexical variables at the top of your program: you are essentially making them global to the whole file, which negates a large part of the advantage of lexical variables. Try to limit the scope of each variable to the enclosing block where it belongs.
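As a small illustration of the scoping advice above, here is a minimal sketch. The sample lines and the $id variable are made up for the example and are not taken from your program.

```perl
use strict;
use warnings;

my @records;    # needed after the loop, so it is declared outside it

# Made-up sample lines, only for illustration.
for my $line ("STORE001 a", "STORE002 b") {
    my ($id) = $line =~ /^(STORE\d+)/;    # $id exists only inside this block
    push @records, $id if defined $id;
}

# $line and $id are out of scope here; under "use strict",
# referring to them would be a compile-time error.
print scalar(@records), " records\n";
```

Only @records, which really is needed in more than one place, lives at file level.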

open $TEXTIN,"<",$input2 || die "Can not open $input2: $!\n";

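This will not die if the program fails to open the file (say, if the file does not exist), because of a precedence problem: || binds more tightly than the comma, so the die is attached to $input2 rather than to the result of open, and it would only run if $input2 itself were false.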

You should either have parens:

open ($TEXTIN, "<", $input2) || die "Can not open $input2: $!\n";

or use the lower precedence operator or:

open $TEXTIN, "<", $input2 or die "Can not open $input2: $!\n";

But it would be even better to declare your filehandle within that statement:

open my $TEXTIN, "<", $input2 or die "Can not open $input2: $!\n";

Same thing of course for the other file opening statement.
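To see the precedence issue in action, here is a small sketch that tries both forms on a file that is assumed not to exist. The path is hypothetical, and the eval blocks are only there to record whether die was reached.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this machine.
my $missing = "/no/such/dir/no_such_file.txt";

# Parsed as open($fh1, "<", ($missing || die ...)): $missing is a true
# string, so die is never reached even though open fails.
my $fh1;
my $died_with_pipes = eval {
    open $fh1, "<", $missing || die "Can not open $missing: $!\n";
    1;
} ? 0 : 1;

# With the low-precedence "or", die runs when open returns false.
my $died_with_or = eval {
    open my $fh2, "<", $missing or die "Can not open $missing: $!\n";
    1;
} ? 0 : 1;

print "|| version died: $died_with_pipes\n";
print "or version died: $died_with_or\n";
```

The first form carries on silently with an unopened filehandle, which is exactly the kind of failure that is hard to track down later.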

Otherwise, I think that the algorithm within your while loop is too complicated, error-prone and not very robust. In particular, the $rxRecStop regexp is very weak and might match where you don't expect. Also, it will probably not match anything at the end of the file, so the last section will not be recorded.

Rather than having a beginning and end regexp, I think it would probably be better to have only one break regexp (the /STORE\d{3}/ is a good candidate). When you meet it, you do the house cleaning of the previous section (storing data) and the preparation of the next section (reinitializing the variables). Something like this:

my $header = "";
$header .= <$TEXTIN> for 0..1;    # record the first two lines for later use
while (<$TEXTIN>) {
    next if m/^\s*$/;             # get rid of empty lines
    if (m{$rxRecStart}) {
        # do what you need to finish off the previous section
        # (saving the data) and start the new one
    }
    else {
        $recordStr .= $_;
    }
}
# add code here for storing the last section
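For what it's worth, here is one way the outline above might be filled in. It is only a sketch: the sample data is invented (I am assuming two header lines followed by sections that each begin with STORE and some digits), and it reads from an in-memory string rather than from your real file.

```perl
use strict;
use warnings;

# Invented sample input: two header lines, then sections starting with STOREnnn.
my $sample = <<'END';
Report title
Column headings

STORE001 first line
  detail a

STORE002 first line
  detail b
END

my $rxRecStart = qr{STORE\d+};

open my $TEXTIN, "<", \$sample or die "Can not open in-memory sample: $!\n";

my $header = "";
$header .= <$TEXTIN> for 0..1;    # keep the first two lines for later use

my @records;
my $recordStr = q{};
while (my $line = <$TEXTIN>) {
    next if $line =~ m/^\s*$/;    # skip empty lines
    if ($line =~ $rxRecStart) {
        # finish off the previous section, if any, and start the new one
        push @records, $recordStr if length $recordStr;
        $recordStr = $line;
    }
    else {
        $recordStr .= $line;
    }
}
push @records, $recordStr if length $recordStr;    # store the last section
close $TEXTIN;

print scalar(@records), " records found\n";
```

Note that any non-empty line between the header and the first STORE line would end up in a record of its own, so real data may need an extra check there.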

I just wanted to propose some improvements on the basis of your code, to help you think about it. I think that the solution with the modification of the input record separator proposed by Rolf and others is probably better.
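The replies I am referring to are not reproduced here, so the following is only my guess at the general shape of an input record separator solution, not their actual code. It assumes every section starts with STORE at the beginning of a line and that there is a header before the first one; the sample data is invented.

```perl
use strict;
use warnings;

# Invented sample input, with the same layout assumption as described above.
my $sample = "Report title\nColumn headings\n\n"
           . "STORE001 first\n  detail a\n\n"
           . "STORE002 first\n  detail b\n";

open my $in, "<", \$sample or die "Can not open in-memory sample: $!\n";

my @records;
{
    local $/ = "\nSTORE";    # $/ is a literal string, not a regex
    my $header = <$in>;      # everything up to the first section
    while (my $chunk = <$in>) {
        chomp $chunk;        # removes a trailing "\nSTORE", if present
        push @records, "STORE$chunk";    # put back the marker eaten by $/
    }
}
close $in;

print scalar(@records), " records found\n";
```

Because $/ cannot be a regex, this splits on any line that starts with STORE, whether or not digits follow, so it is less strict than $rxRecStart.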

Edit: changed the way $header is assigned to prevent an "uninitialized value" warning.

In reply to Re: Splitting a file into records by Laurent_R
in thread Splitting a file into records by TStanley
