PerlMonks  

Re^2: Shorten the headers of a file and remove empty lines using perl

by Anonymous Monk
on Jun 13, 2013 at 22:51 UTC (#1038853)


in reply to Re: Shorten the headers of a file and remove empty lines using perl
in thread Shorten the headers of a file and remove empty lines using perl

How can I input the "input_file"? "$input_file" is a scalar but not a file. I run the following script:

#!/usr/local/bin/perl
use warnings;
open ( my $input_fh, "<", $genome );
open ( my $output_fh, ">", $output_file );
foreach my $line ( <$input_fh> ) {
    unless ( $line =~ m/\A\s*\Z/ ) {
        $line =~ s/(GL\d{6})\d+/$1/;
        print $output_fh $line;
    }
}
close ( $input_fh );
close ( $output_fh );
and got the following messages:

Name "main::genome" used only once: possible typo at header.pl line 3.
Name "main::output_file" used only once: possible typo at header.pl line 4.
Use of uninitialized value $genome in open at header.pl line 3.
Use of uninitialized value $output_file in open at header.pl line 4.
readline() on closed filehandle $input_fh at header.pl line 5.

I think my way to open the file is wrong. Can you give me some hints? Thanks XF


Re^3: Shorten the headers of a file and remove empty lines using perl
by Cristoforo (Deacon) on Jun 13, 2013 at 23:27 UTC
    foreach my $line ( <$input_fh> )

    should be

    while (my $line = <$input_fh> )

    The first form is a glob, (but I don't know well enough to explain it to you).

    The second line should work properly for reading your file.

    Update: Yes, Choroba is correct, it is not a glob. My mistake.

      It is not a glob. It is a readline in list context. If the file is large, it can eat lots of memory, and should therefore be avoided.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Indeed - the bit I posted initially will work, but it'll trigger reading the whole file into memory. Probably not a good idea with a 500MB file.

      Perl contexts are incredibly clever, but do lead to some interesting gotchas. Using a filehandle in scalar versus list context is one of them.

Re^3: Shorten the headers of a file and remove empty lines using perl
by Cristoforo (Deacon) on Jun 13, 2013 at 23:43 UTC
    Ah, I didn't see your problem clearly. First, you should add use strict; alongside the use warnings; you already have at the top of your program. Then, you have to assign the name of your file to $genome.

    my $genome = 'whateverthename';

    (You must assign a name to your output file also)
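
Putting the advice in this thread together, here is a minimal sketch of how the script might look with use strict, named files, and checked opens. The shorten_headers sub name and the idea of taking both file names from the command line are my own assumptions for illustration, not something from the original post; the regexes are the ones from the question.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): copy $in to $out,
# skipping blank lines and trimming the digits that follow "GL" plus
# six digits, using the same regexes as the script in the question.
sub shorten_headers {
    my ($in, $out) = @_;

    # Check every open; $! holds the reason if it fails.
    open(my $input_fh,  '<', $in)  or die "Cannot open '$in' for reading: $!";
    open(my $output_fh, '>', $out) or die "Cannot open '$out' for writing: $!";

    # while reads one line at a time instead of slurping the whole file.
    while (my $line = <$input_fh>) {
        next if $line =~ m/\A\s*\Z/;     # skip empty or whitespace-only lines
        $line =~ s/(GL\d{6})\d+/$1/;     # keep only "GL" plus six digits
        print {$output_fh} $line;
    }

    close($input_fh);
    close($output_fh) or die "Cannot close '$out': $!";
    return;
}

# Assumption: both file names are given on the command line.
my ($genome, $output_file) = @ARGV;
shorten_headers($genome, $output_file)
    if defined $genome && defined $output_file;
```

Run it as something like perl header.pl input.txt output.txt, with your own file names in place of those placeholders. With the opens checked, a wrong or missing file name dies with a clear message instead of the "readline() on closed filehandle" warning.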

      It works very well. Although I do not quite understand all the script yet, I will work on that. Thank all people here for your help!

        I skipped the bit where you set up the names of input file and output file. But generally speaking, you should always always 'use strict;' and 'use warnings;'. They really are the very best ways to stop a program doing anything weird.

        And as mentioned above, a while loop is better than a foreach if you're processing a large file. (It makes little odds for a small file, but it's good form.)

        Perl is very clever: it understands context. <$input_fh> says 'read from $input_fh', but if you do:

        my $line = <$input_fh>;

        it simply reads the next line. Whereas if you do

        my @whole_file = <$input_fh>;

        it will read the whole file into that array, which is in effect what my first snippet does. It doesn't make much difference if you're working with a small file, but the difference will become very important with a 500MB file.
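
To see the two contexts side by side, here is a small sketch that should run as-is. It reads from an in-memory string rather than a real file, and the three sample lines are made up for the demonstration.

```perl
use strict;
use warnings;

# Made-up sample data, opened as an in-memory "file" so nothing is
# needed on disk.
my $data = "line 1\nline 2\nline 3\n";
open(my $input_fh, '<', \$data) or die "Cannot open in-memory file: $!";

# Scalar context: reads exactly one line.
my $first = <$input_fh>;

# List context: reads every remaining line at once.
my @rest = <$input_fh>;

close($input_fh);

print "first: $first";
print "remaining lines: ", scalar(@rest), "\n";
```

This prints "first: line 1" followed by "remaining lines: 2". The foreach loop in the question evaluates <$input_fh> in list context, like the @rest assignment, which is why it loads everything before the loop body runs.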

        I'd strongly suggest taking time to understand what each line is doing. Code that someone on the internet gave you is never trustworthy. (Although on PerlMonks, true evil will usually get stomped upon.)
