PerlMonks  

Re: Shorten the headers of a file and remove empty lines using perl

by Preceptor (Chaplain)
on Jun 13, 2013 at 21:34 UTC ( #1038844=note )


in reply to Shorten the headers of a file and remove empty lines using perl

What you need for this is regular expressions. Learning them can be a bit painful, but they're incredibly powerful. As a sample, the code below might give you a starting point.

The relevant documentation is perlre.

open ( my $input_fh, "<", $input_file );
open ( my $output_fh, ">", $output_file );
foreach my $line ( <$input_fh> ) {
    # skip lines that are empty or contain only whitespace
    unless ( $line =~ m/\A\s*\Z/ ) {
        # keep "GL" plus the first six digits, drop any digits after them
        $line =~ s/(GL\d{6})\d+/$1/;
        print $output_fh $line;
    }
}
close ( $input_fh );
close ( $output_fh );

The essence is: first you test whether a line is blank and skip it if so. Then you use a 'search and replace' pattern to trim anything that starts with GL followed by more than six digits down to GL plus the first six digits.
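To see the two patterns in isolation, here is a minimal sketch that applies them to a few in-memory strings instead of a file. The helper name clean_line and the sample lines are made up for illustration; they are not taken from the original poster's data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): returns the cleaned
# line, or undef when the line is blank and should be dropped.
sub clean_line {
    my ($line) = @_;
    return if $line =~ m/\A\s*\Z/;      # empty or whitespace only
    $line =~ s/(GL\d{6})\d+/$1/;        # keep GL + first six digits
    return $line;
}

# Made-up sample input, assumed for illustration only.
my @sample = (
    ">GL000191123 some description\n",
    "\n",
    "   \n",
    "ACGTACGT\n",
);

for my $line (@sample) {
    my $cleaned = clean_line($line);
    print $cleaned if defined $cleaned;
}
```

With this sample, the header line comes out as ">GL000191 some description", the two blank lines are dropped, and the sequence line passes through unchanged.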


Re^2: Shorten the headers of a file and remove empty lines using perl
by Anonymous Monk on Jun 13, 2013 at 22:51 UTC
    How can I input the "input_file"? "$input_file" is a scalar but not a file. I run the following script:

    #!/usr/local/bin/perl
    use warnings;
    open ( my $input_fh, "<", $genome );
    open ( my $output_fh, ">", $output_file );
    foreach my $line ( <$input_fh> ) {
        unless ( $line =~ m/\A\s*\Z/ ) {
            $line =~ s/(GL\d{6})\d+/$1/;
            print $output_fh $line;
        }
    }
    close ( $input_fh );
    close ( $output_fh );
    and got the following messages:

    Name "main::genome" used only once: possible typo at header.pl line 3.
    Name "main::output_file" used only once: possible typo at header.pl line 4.
    Use of uninitialized value $genome in open at header.pl line 3.
    Use of uninitialized value $output_file in open at header.pl line 4.
    readline() on closed filehandle $input_fh at header.pl line 5.

    I think my way to open the file is wrong. Can you give me some hints? Thanks XF
      foreach my $line ( <$input_fh> )

      should be

      while (my $line = <$input_fh> )

      The first form is a glob (but I don't know it well enough to explain it to you).

      The second form should work properly for reading your file.

      Update: Yes, Choroba is correct, it is not a glob. My mistake.

        It is not a glob. It is a readline in list context. If the file is large, it can eat lots of memory, and should therefore be avoided.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Indeed - the bit I posted initially will work, but it'll trigger reading the whole file into memory. Probably not a good idea with a 500MB file.

        Perl contexts are incredibly clever, but do lead to some interesting gotchas. Using a filehandle in scalar versus list context is one of them.
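The difference between the two contexts can be sketched as follows. The example reads from an in-memory string rather than a real file, purely so that it runs on its own; the variable names and the three sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# In-memory "file" so the sketch runs without touching disk
# (an assumption for illustration; a real script would open a file name).
my $data = "line one\nline two\nline three\n";

# List context: readline returns every remaining line at once,
# so the whole file sits in memory before the first line is processed.
open( my $list_fh, "<", \$data ) or die "Cannot open in-memory data: $!";
my @all_lines = <$list_fh>;
close($list_fh);
print "list context read ", scalar(@all_lines), " lines in one go\n";

# Scalar context: readline returns one line per call,
# so only the current line needs to be held in memory.
open( my $scalar_fh, "<", \$data ) or die "Cannot open in-memory data: $!";
my $count = 0;
while ( my $line = <$scalar_fh> ) {
    $count++;
}
close($scalar_fh);
print "scalar context read $count lines one at a time\n";
```

A foreach loop over <$fh> behaves like the first half, because foreach evaluates its list in list context; the while form behaves like the second half.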

      Ah, I didn't see your problem clearly. First, you should add use strict; alongside the use warnings; you already have at the top of your program. Then you have to assign the name of your file to $genome.

      my $genome = 'whateverthename';

      (You must also assign a name to your output file.)
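One way to put these pieces together is sketched below: both file names are declared under use strict, each open is checked, and the file is read line by line. The subroutine name shorten_headers and the choice to take the names from the command line are assumptions for illustration, not something prescribed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the loop from this thread; the name
# shorten_headers is made up for illustration.
sub shorten_headers {
    my ( $input_file, $output_file ) = @_;

    open( my $input_fh, "<", $input_file )
        or die "Cannot read $input_file: $!";
    open( my $output_fh, ">", $output_file )
        or die "Cannot write $output_file: $!";

    while ( my $line = <$input_fh> ) {
        next if $line =~ m/\A\s*\Z/;      # skip blank lines
        $line =~ s/(GL\d{6})\d+/$1/;      # shorten GL identifiers
        print {$output_fh} $line;
    }

    close($input_fh);
    close($output_fh) or die "Cannot close $output_file: $!";
    return;
}

# Assumed usage: perl header.pl INPUT OUTPUT
# (header.pl is the script name from the warnings above; INPUT and
# OUTPUT are placeholders). With no arguments the script does nothing.
if ( @ARGV == 2 ) {
    shorten_headers(@ARGV);
}
elsif (@ARGV) {
    die "usage: $0 INPUT OUTPUT\n";
}
```

Checking the return value of open turns the "readline() on closed filehandle" warning into an immediate error message that names the file that could not be opened.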

        It works very well. Although I do not quite understand the whole script yet, I will work on that. Thanks to everyone here for your help!
