PerlMonks  

Re: Shorten the headers of a file and remove empty lines using perl

by Preceptor (Chaplain)
on Jun 13, 2013 at 21:34 UTC


in reply to Shorten the headers of a file and remove empty lines using perl

What you need to accomplish this is regular expressions. Learning regular expressions can be a bit painful, but they're really incredibly powerful. As a sample, the code below might give you a starting point.

The relevant documentation is perlre.

    open ( my $input_fh, "<", $input_file );
    open ( my $output_fh, ">", $output_file );
    foreach my $line ( <$input_fh> ) {
        # skip lines that are empty or contain only whitespace
        unless ( $line =~ m/\A\s*\Z/ ) {
            # keep "GL" plus the first six digits, drop any digits after them
            $line =~ s/(GL\d{6})\d+/$1/;
            print $output_fh $line;
        }
    }
    close ( $input_fh );
    close ( $output_fh );

The essence is - first you test whether a line is blank (empty or whitespace only) and skip it if so. Then you use a 'search and replace' pattern: anything that starts with GL followed by more than six digits is trimmed to GL plus its first six digits.
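As a quick illustration of those two steps, here is a minimal sketch that applies the same blank-line test and substitution to a few made-up lines. The sample headers and the helper name shorten_header are assumptions for demonstration only; they do not come from the original poster's file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef for a blank line, otherwise the line
# with "GL" + six digits + extra digits trimmed to "GL" + six digits.
sub shorten_header {
    my ($line) = @_;
    return if $line =~ m/\A\s*\Z/;     # empty or whitespace only: blank
    $line =~ s/(GL\d{6})\d+/$1/;       # keep only the captured part ($1)
    return $line;
}

# Made-up sample input, for illustration only.
my @samples = ( ">GL000207123456\n", "\n", "ACGTACGT\n", "   \n" );

for my $sample (@samples) {
    my $result = shorten_header($sample);
    print $result if defined $result;
}
```

With this sample data the loop prints ">GL000207" and "ACGTACGT", each on its own line; the two blank lines are dropped.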


Re^2: Shorten the headers of a file and remove empty lines using perl
by Anonymous Monk on Jun 13, 2013 at 22:51 UTC
    How can I input the "input_file"? "$input_file" is a scalar but not a file. I ran the following script:

    #!/usr/local/bin/perl
    use warnings;
    open ( my $input_fh, "<", $genome );
    open ( my $output_fh, ">", $output_file );
    foreach my $line ( <$input_fh> ) {
        unless ( $line =~ m/\A\s*\Z/ ) {
            $line =~ s/(GL\d{6})\d+/$1/;
            print $output_fh $line;
        }
    }
    close ( $input_fh );
    close ( $output_fh );
    and got the following messages:

    Name "main::genome" used only once: possible typo at header.pl line 3.
    Name "main::output_file" used only once: possible typo at header.pl line 4.
    Use of uninitialized value $genome in open at header.pl line 3.
    Use of uninitialized value $output_file in open at header.pl line 4.
    readline() on closed filehandle $input_fh at header.pl line 5.

    I think my way to open the file is wrong. Can you give me some hints? Thanks XF
      foreach my $line ( <$input_fh> )

      should be

      while (my $line = <$input_fh> )

      The first form is a glob (but I don't know it well enough to explain it to you).

      The second form should work properly for reading your file.

      Update: Yes, Choroba is correct; it is not a glob. My mistake.

        It is not a glob. It is a readline in list context. If the file is large, it can eat lots of memory, and should therefore be avoided.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Indeed - the bit I posted initially will work, but it'll trigger reading the whole file into memory. Probably not a good idea with a 500MB file.

        Perl contexts are incredibly clever, but do lead to some interesting gotchas - reading from a filehandle in scalar versus list context is one of them.
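To make the context difference concrete, here is a small sketch that reads the same data both ways. It uses an in-memory filehandle (opening a reference to a string) so it runs without any real file; the sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";    # made-up sample data

# List context: <$fh> returns every remaining line at once,
# so the whole "file" ends up in memory.
open( my $list_fh, '<', \$text ) or die "Cannot open in-memory file: $!";
my @all_lines = <$list_fh>;
close($list_fh);

# Scalar context: <$fh> returns one line per call,
# so only the current line is held at any time.
open( my $scalar_fh, '<', \$text ) or die "Cannot open in-memory file: $!";
my $count = 0;
while ( my $line = <$scalar_fh> ) {
    $count++;
}
close($scalar_fh);

print scalar(@all_lines), " lines read in list context\n";
print "$count lines read one at a time\n";
```

Both approaches see the same three lines; the difference is only in how much is held in memory at once, which is what matters for a 500MB file.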

      Ah, I didn't see your problem clearly. First, you should add use strict; next to the use warnings; you already have at the top of your program. Then, you have to assign the name of your file to $genome.

      my $genome = 'whateverthename';

      (You must assign a name to your output file as well.)
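Putting the advice from this thread together, a complete script might look roughly like the sketch below. The helper name clean_headers, the file names genome.fa and genome_short.fa, and the temporary demo file are all assumptions made so the example runs on its own; substitute your real file names. Checking the result of open with "or die" is an addition not shown in the earlier posts, and it reports a missing file right away instead of failing later with "readline() on closed filehandle".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper wrapping the loop discussed in this thread.
sub clean_headers {
    my ( $input_file, $output_file ) = @_;
    open( my $input_fh, '<', $input_file )
        or die "Cannot open $input_file: $!";
    open( my $output_fh, '>', $output_file )
        or die "Cannot open $output_file: $!";
    while ( my $line = <$input_fh> ) {      # one line at a time
        next if $line =~ m/\A\s*\Z/;        # skip blank lines
        $line =~ s/(GL\d{6})\d+/$1/;        # shorten the GL identifier
        print {$output_fh} $line;
    }
    close($input_fh);
    close($output_fh) or die "Cannot close $output_file: $!";
    return;
}

# Demo with a throwaway input file; replace these with your real names.
my $dir         = tempdir( CLEANUP => 1 );
my $genome      = "$dir/genome.fa";
my $output_file = "$dir/genome_short.fa";

open( my $demo_fh, '>', $genome ) or die "Cannot create $genome: $!";
print {$demo_fh} ">GL000207123456\n\nACGT\n";
close($demo_fh);

clean_headers( $genome, $output_file );
```

For a real run you would drop the demo section and simply set $genome and $output_file to your own paths before calling clean_headers.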

        It works very well. Although I do not quite understand all the script yet, I will work on that. Thank all people here for your help!
