PerlMonks  

Cleaner way of looping through a file and stripping only certain lines?

by texasperl (Sexton)
on Dec 08, 2006 at 16:51 UTC (#588638=perlquestion)
texasperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello my esteemed Monks, I have once again come to seek your wisdom in making my Perl more... Perl-ish. I have the following working code that will strip any MX records greater than 8 from a file (tinydns format). However, looking at the code, I feel that there is likely a shorter and more concise way to write it. I suspect some use of map or grep would be more efficient, however, I am not entirely familiar and/or comfortable with those functions. Hopefully through a Monk or two sharing his/her wisdom, I'll be able to write faster, better code (and understand it, too.) Also, I'm curious as to how one would go about editing the file in place, i.e., not having to create a new outfile.
Without further ado, here is the code.
    #!/usr/bin/perl -w
    use strict;

    my $ifile = shift;
    open IFILE, '<', "$ifile" || die "Couldn't open file: $!";
    open OFILE, '+>', 'new-data' || die "Couldn't open outfile: $!";

    while (my $line = <IFILE>) {
        # is the line an MX record?
        if ($line =~ /^@.*mail(\d+).*\z/xms) {
            # is it less than or equal to 8?
            if ($1 <= 8) {
                print OFILE $line;
            }
        }
        # print everything else to the new file
        else {
            print OFILE $line;
        }
    }

    close IFILE;
    close OFILE;
The data is in this format:
@*.somedomain.net::mail7.somedomain.net:10:21600
@somedomain.net::mail7.somedomain.net:10:21600
Thanks again, Monks!

Re: Cleaner way of looping through a file and stripping only certain lines?
by liverpole (Monsignor) on Dec 08, 2006 at 16:59 UTC
    Hi texasperl,

    If it's not a huge file, you could read it into memory and use map to iterate over all the lines:

        #!/usr/bin/perl -w
        use strict;

        my $ifile = shift || die "No filename provided\n";
        open IFILE, '<', "$ifile" or die "Couldn't open file: $!";
        open OFILE, '+>', 'new-data' or die "Couldn't open outfile: $!";

        chomp(my @lines = <IFILE>);
        close IFILE;

        # Keep every non-MX line, plus MX lines whose number is <= 8
        my @matches = map { /^@.*mail(\d+).*\z/ ? $1 <= 8 ? $_ : () : $_ } @lines;
        map { print OFILE "$_\n" } @matches;

        close OFILE;

    But I don't see anything wrong with the way you did it, except that I would recommend error-checking for the case where a filename isn't passed to the program.

    Update:  Fixed to skip saving anything in the case where the captured pattern isn't <= 8.

    Update 2:  Cleaned up syntax further.
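
Since the question also mentions grep, here is a minimal sketch of the same filter written with grep rather than map. It is an illustration only: the sub name filter_mx, the limit argument, and the sample lines (including the non-MX line) are assumptions, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep a line unless it is an MX record whose mailNN number is above $limit.
# filter_mx is a hypothetical name chosen for this sketch.
sub filter_mx {
    my ($limit, @lines) = @_;
    return grep { !( /^\@.*mail(\d+)/ && $1 > $limit ) } @lines;
}

# Made-up sample data in the same shape as the lines in the question.
my @sample = (
    "\@somedomain.net::mail7.somedomain.net:10:21600\n",
    "\@somedomain.net::mail12.somedomain.net:10:21600\n",
    "+www.somedomain.net:192.0.2.1:21600\n",
);

print filter_mx(8, @sample);
```

grep fits a pure keep-or-drop decision a little more directly than map, because the block only has to return true or false for each line.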


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      This is exactly the kind of thing I was talking about. Perfect!++ Thanks!
Re: Cleaner way of looping through a file and stripping only certain lines?
by grep (Monsignor) on Dec 08, 2006 at 17:03 UTC
    I'm curious as to how one would go about editing the file in place, i.e., not having to create a new outfile.

    Open the file. Read the file into an array. Close the file. Open the same file for overwrite and write what you want back.

        open(IN, '<', $ifile) or die "Couldn't open file: $!\n";
        my @data = <IN>;
        close IN;

        open(OUT, '>', $ifile) or die "Couldn't open outfile: $!\n";
        foreach my $line (@data) {
            ### Do what you want
            print OUT "$line";
        }
    UPDATE: Fixed some C&P errs

    grep
    XP matters not. Look at me. Judge me by my XP, do you?

      >>I'm curious as to how one would go about editing the file in place, i.e., not having to create a new outfile.
      >Open the file. Read the file into an array. Close the file. Open the same file for overwrite and write what you want back.

      This works, but isn't it inherently risky for cases where there's a power failure or other hiccup during processing — you corrupt your original?

      throop

        Yes it is. But so is any in-place edit.

        A safer answer would be to use File::Temp. You still have a problem with power failure but your window of problems is smaller.

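            ## UNTESTED
            use File::Temp 'tempfile';
            use File::Copy;

            my ($tmp_FH, $tmp_fn) = tempfile();

            open(IN, '<', $ifile) or die "Couldn't open file: $!\n";
            my @data = <IN>;
            close IN;

            foreach my $line (@data) {
                ### Do what you want
                print $tmp_FH "$line";
            }

            # Flush and close the temp file before copying it over the original
            close $tmp_FH or die "Couldn't close temp file: $!\n";
            copy($tmp_fn, $ifile) or die "Couldn't copy to \"$ifile\": $!\n";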

        grep
        XP matters not. Look at me. Judge me by my XP, do you?

        The safer way to do this sort of thing is to write to a new file and then rename over the file you want to replace. This is what you do when handling mbox files for instance.

        The rename is an atomic operation. It's guaranteed to succeed completely or not at all (on Unix-like boxes), so you can never lose data as a result of a power failure.

        This is a similar trick to perl -i, but I think that does a rename of the original file and then writes back into the original file, which still leaves open the case that you could have bad (partially written) data in the original file on power failure.
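
The write-then-rename approach described above might look roughly like the sketch below. It is untested against real tinydns data; the sub name rewrite_atomically and the '.tmpXXXXXX' template are assumptions made for illustration. The temp file is created in the same directory as the target so that the rename stays on one filesystem.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Filter the lines of $path through $keep (a code ref returning true for
# lines to retain), then replace the original with a single rename.
# rewrite_atomically is a hypothetical name for this sketch.
sub rewrite_atomically {
    my ($path, $keep) = @_;

    open my $in, '<', $path or die "Couldn't open \"$path\": $!\n";

    # Same directory as the target, so rename() does not cross devices.
    my ($out, $tmp) = tempfile('.tmpXXXXXX', DIR => dirname($path), UNLINK => 0);

    while (my $line = <$in>) {
        print {$out} $line if $keep->($line);
    }
    close $in;
    close $out or die "Couldn't finish writing \"$tmp\": $!\n";

    # The original stays untouched until this one step succeeds.
    rename $tmp, $path or die "Couldn't replace \"$path\": $!\n";
    return;
}

# Only runs when a filename is given on the command line.
rewrite_atomically($ARGV[0], sub { !( $_[0] =~ /^\@.*mail(\d+)/ && $1 > 8 ) })
    if @ARGV;
```

One caveat with this sketch: the replacement file gets the temp file's permissions and ownership, not the original's, so a real script would probably want to copy those across before the rename.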

Re: Cleaner way of looping through a file and stripping only certain lines?
by ikegami (Pope) on Dec 08, 2006 at 17:18 UTC
    • open IFILE, '<', "$ifile" || die "Couldn't open file: $!";
      is buggy. Due to operator precedence, it's equivalent to
      open IFILE, '<', ("$ifile" || die "Couldn't open file: $!");
      You want one of the following instead:
      open IFILE, '<', "$ifile" or die "Couldn't open file: $!";
      open(IFILE, '<', "$ifile") || die "Couldn't open file: $!";
      open(IFILE, '<', "$ifile") or die "Couldn't open file: $!";
      (open IFILE, '<', "$ifile") || die "Couldn't open file: $!";
      (open IFILE, '<', "$ifile") or die "Couldn't open file: $!";

    • Same goes for the second open.

    • Why "$ifile" instead of just $ifile?

    • Why '+>' instead of just '>'?

    • /^@.*mail(\d+).*\z/xms
      can be simplified to
      /^@.*mail(\d+)/xms
      All three modifiers (xms) could be removed from the match operator, but they cause no harm here.

    • The user doesn't need to see the program line number when he specifies a bad file name. If the error message is not good enough to identify a user error without resorting to a line number, it needs to be improved.
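
The precedence point in the first bullet can be checked with a small experiment. This is a sketch under one assumption: the path in $missing is made up and is presumed not to exist on the machine running it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A path assumed not to exist; used only for illustration.
my $missing = '/no/such/dir/no-such-file.txt';

# With ||, die is attached to the (true) filename string, so it never runs
# and the failed open goes unnoticed.
my $pipes_died = !eval { open my $fh, '<', $missing || die "open failed\n"; 1 };

# With the low-precedence "or", die is attached to the result of open itself.
my $or_died = !eval { open my $fh, '<', $missing or die "open failed\n"; 1 };

print "|| version died: ", ($pipes_died ? "yes" : "no"), "\n";
print "or version died: ", ($or_died    ? "yes" : "no"), "\n";
```

As long as that path really is absent, the first line reports "no" and the second "yes": only the "or" form notices the failed open.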

    With changes applied:

        #!/usr/bin/perl -w
        use strict;

        my $ifile = shift;
        my $ofile = 'new_data';

        open my $ifh, '<', $ifile
            or die "Couldn't open DNS file \"$ifile\": $!\n";
        open my $ofh, '>', $ofile
            or die "Couldn't create output file \"$ofile\": $!\n";

        while (my $line = <$ifh>) {
            # is the line an MX record?
            if ($line =~ /^@.*mail(\d+)/xms) {
                # is it less than or equal to 8?
                if ($1 <= 8) {
                    print $ofh $line;
                }
            }
            # print everything else to the new file
            else {
                print $ofh $line;
            }
        }

    You could make your program simpler and more flexible by using STDIN and STDOUT.

        #!/usr/bin/perl -w
        use strict;

        #
        # Usage:
        #    fixdns infile > outfile
        #
        # Usage for in place editing:
        #    perl -i fixdns dnsfile
        #

        while (<>) {
            # is the line an MX record?
            if (/^@.*mail(\d+)/xms) {
                # is it less than or equal to 8?
                if ($1 <= 8) {
                    print;
                }
            }
            # print everything else to the new file
            else {
                print;
            }
        }
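
The in-place behaviour of the -i switch can also be requested from inside a script by localizing $^I and @ARGV. The sketch below is one way to do that; the sub name strip_high_mx_in_place and the '.bak' backup suffix are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip MX records numbered above 8 from $path, editing the file in place.
# strip_high_mx_in_place is a hypothetical name; '.bak' is an arbitrary
# backup suffix.
sub strip_high_mx_in_place {
    my ($path) = @_;

    # With $^I set, <> reads $path while print writes the replacement file,
    # much as the -i.bak command-line switch would arrange.
    local ($^I, @ARGV) = ('.bak', $path);
    local $_;
    while (<>) {
        print unless /^\@.*mail(\d+)/ && $1 > 8;
    }
    return;
}

# Only runs when a filename is given on the command line.
strip_high_mx_in_place($ARGV[0]) if @ARGV;
```

Because both variables are localized, the rest of the program sees its normal STDOUT and argument list again once the sub returns.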
      This is some great insight. I especially gleaned a lot of wisdom from these bits:
      • not using "" around the variable in open
      • operator precedence in the open statement
      • using STDIN and STDOUT instead of named filehandles
      Thank you very much for your excellent insight. Hopefully someone else can also learn from this post (that's why I post things like this.)
        operator precedence in the open statement

        Not trying to kick you while you're down (honestly! =]), and it could just be a case of me misreading your sentence, but the operator precedence ikegami explained is not limited to the open statement -- it's part of the perl parser in general.

        Update: Ah, misread the original sentence indeed. Sorry 'bout that. No harm intended.

      I would add that you can avoid the use of a bareword filehandle by calling open with an undef scalar instead:
      open my $input_fh, '<', "$ifile" or die "Couldn't open file: $!";
      Now use $input_fh just as you would have used IFILE.
Re: Cleaner way of looping through a file and stripping only certain lines?
by hiseldl (Priest) on Dec 08, 2006 at 21:31 UTC

    I did not see the requirement that you wanted a script, so, here's how I would do it...

    You can use the '-i' switch to edit the file in-place. See perlrun for reference.

    If you do not want to create a backup file...

    $ perl -ni -e '$match=/::mail(\d+)/;print if!$match||$1<=8' data

    If you do want to create a backup file with extension '.bak'...

    $ perl -ni.bak -e '$match=/::mail(\d+)/;print if!$match||$1<=8' data

    This assumes that the file containing the data is named 'data'.

    Here's the nuts-and-bolts explanation, skip it if you already know how it works.

    Basically, this checks every line against the regex. In your case you want to match something like /::mail(\d+)/ to capture the digits (please put whatever regexp you need in there; this is untested and for example purposes only). The match operator serves two purposes: (1) letting us know whether the line matched, and (2) capturing the digits of interest if it did. Here, I store the boolean result in $match for later use. The second statement is the conditional print, which prints the current line if it did not match or if the captured number is less than or equal to 8, so MX records numbered above 8 are dropped.

    Please change it according to your needs.
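
The two jobs of the match operator described above (did it match, and what did it capture) can also be folded into one step by evaluating the match in list context, which returns the captures on success and an empty list on failure. A minimal sketch, with keep_line as a hypothetical helper name and made-up sample lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true if the line should be kept: either it has no ::mailNN field,
# or NN is at most $limit. keep_line is a hypothetical helper name.
sub keep_line {
    my ($line, $limit) = @_;
    my ($n) = $line =~ /::mail(\d+)/;    # list context: capture, or empty list
    return !defined $n || $n <= $limit;
}

# Made-up sample lines in the format from the question.
for my $line (
    '@somedomain.net::mail7.somedomain.net:10:21600',
    '@somedomain.net::mail12.somedomain.net:10:21600',
    '+www.somedomain.net:192.0.2.1:21600',
) {
    printf "%-4s %s\n", (keep_line($line, 8) ? 'keep' : 'drop'), $line;
}
```

This avoids reading $1 after a match that may have failed, at the cost of one extra variable.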

    HTH.

    --
    hiseldl
    What time is it? It's Camel Time!

Re: Cleaner way of looping through a file and stripping only certain lines?
by mreece (Friar) on Dec 09, 2006 at 17:06 UTC
    here's a unixy non-perl solution:

    egrep -v '@.*mail(9|[1-9][0-9])' < in-file > out-file
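
To see why the alternation (9|[1-9][0-9]) stands in for "greater than 8", the pattern can be compared against a plain numeric test. The Perl sketch below does that for mailNN values from 0 to 200; the range and the sample line format are arbitrary choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every NN in 0..200 where the egrep-style pattern and the numeric
# test "NN > 8" disagree. Values are written without leading zeros.
my @disagree = grep {
    my $line   = sprintf '@somedomain.net::mail%d.somedomain.net:10:21600', $_;
    my $by_re  = $line =~ /\@.*mail(9|[1-9][0-9])/ ? 1 : 0;
    my $by_num = $_ > 8 ? 1 : 0;
    $by_re != $by_num;
} 0 .. 200;

print @disagree
    ? "mismatch at: @disagree\n"
    : "pattern and numeric test agree\n";
```

The agreement relies on the numbers having no leading zeros: a hostname such as mail09 would slip past the pattern even though 9 is greater than 8.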
