PerlMonks  

modifying a file with regex!

by rna_follower (Initiate)
on Mar 16, 2012 at 21:19 UTC (perlquestion, id://960069)

rna_follower has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to modify my file by using a regex to replace/substitute strings and numbers:

Example file:

>Sample_1_x80

AGGGGGGGGGTTCCC

>Sample_2_x85

TTTCCCGGGAAAA

>sample_3_x112

GGCCCCTTTGAGG

And I want to modify it to print like so (the ID line should be tab-delimited):

>ID1 80

AGGGGGGGGGTTCCC

>ID2 85

TTTCCCGGGAAAA

and so on ....

My best effort:

#!usr/bin/perl
$file;
@files;
$filename;
$filename = <STDIN>;
open(FILENAME, "<$filename") or die "can't open file";
while($file = <FILENAME>){
    chomp $file;
    $file =~ s/sample\_\d\_x?/ID\t/;
    print $file, "\n";
}

Re: modifying a file with regex!
by tobyink (Canon) on Mar 16, 2012 at 21:51 UTC

    This is how I'd do it...

#!/usr/bin/perl

use autodie;  # Automatic errors on file problems.
use strict;

# This is the name of the file we want to modify.
my $filename = 'modify-file.txt';

# We're going to create a temporary file. This avoids us having
# to build up a potentially large string in memory.
my $tempname = $filename . '.tmp';

do {
    # Open both files. Doing this using lexical file handles
    # within a "do" block means that when the end of the block
    # is reached, the files will be closed.
    open my $input_h,  '<', $filename;  # input handle
    open my $output_h, '>', $tempname;  # output handle

    # Loop through each line of input.
    while (<$input_h>) {
        # Modify the line
        s/^>Sample_(\d+)_x(\d+)/>ID$1 $2/i;

        # Write it out.
        print $output_h $_;
    }
};

# Delete the original file.
unlink $filename while -f $filename;

# Rename the temporary file to the original filename.
rename $tempname => $filename;
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      This extraneous "do" is completely unnecessary. It actually "harms" by introducing an unnecessary level of indentation, which is a hindrance to readability.

        It eliminates two calls to close and allows some lexical variables ($input_h and $output_h) to live in a smaller scope.

        Indent it however you like; this ain't Python.
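The scoping point is easy to check: a lexical filehandle is closed (and its buffer flushed) automatically when it goes out of scope, so anything printed inside the block is on disk as soon as the block ends, with no explicit close. A minimal sketch, assuming a throwaway directory from File::Temp; the file name demo.txt is arbitrary. A bare block { ... } would behave the same way as do { ... }; here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory, removed when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt";    # arbitrary file name for the demo

do {
    # $out exists only inside this block.
    open my $out, '>', $path or die "Can't write $path: $!";
    print $out "written inside the block\n";
};  # $out goes out of scope here, so Perl closes and flushes the handle

# No explicit close was needed: the data is already in the file.
open my $in, '<', $path or die "Can't read $path: $!";
my $content = do { local $/; <$in> };    # slurp the whole file
close $in;
print $content;
```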

Re: modifying a file with regex!
by JavaFan (Canon) on Mar 16, 2012 at 21:50 UTC
    Untested:
    perl -i.bak -pe 's/^>Sample_([0-9]+)_x([0-9]+)$/>ID$1 $2/i' filename
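For reference, -p wraps the code in a while (<>) { ... } loop that prints each line, and -i.bak makes that loop write back into the file it is reading while keeping a .bak copy. A minimal sketch of the same edit from inside a script, using the documented $^I variable (the in-place edit extension); edit_in_place and the demo data are hypothetical. The pattern keeps the leading ">", adds /i so the lowercase ">sample_3" header is converted too, and writes a tab between the two fields as the question asks.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: rewrite header lines in place, keeping a .bak copy.
sub edit_in_place {
    my ($file) = @_;
    # Setting $^I turns on in-place editing for <>, just like -i.bak;
    # localizing @ARGV makes <> read only $file.
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        s/^>Sample_(\d+)_x(\d+)$/>ID$1\t$2/i;
        print;    # inside this loop, print goes to the new copy of $file
    }
}

# Demo on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/reads.fa";
open my $fh, '>', $file or die "Can't write $file: $!";
print $fh ">Sample_1_x80\nAGGGGGGGGGTTCCC\n>sample_3_x112\nGGCCCCTTTGAGG\n";
close $fh;

edit_in_place($file);

# Show the edited file on STDOUT.
open $fh, '<', $file or die "Can't read $file: $!";
print while <$fh>;
close $fh;
```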
Re: modifying a file with regex!
by Anonymous Monk on Mar 16, 2012 at 22:11 UTC

    Some issues with your effort

    • the shebang is not an absolute path
    • not using strict/warnings; Read this if you want to cut your development time in half!
    • you're using <STDIN> instead of @ARGV (as in perl myprogram.pl)
    • you're reading from FILEHANDLE but you're printing to STDOUT
    • you're using a bareword FILEHANDLE instead of a lexical $filehandle
    • your regular expression is case sensitive and it doesn't match your sample data
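The last point can be checked directly. A small sketch that runs the question's substitution, unchanged, against the sample headers, with and without /i; the helper names are hypothetical. Even with /i the sample number is lost, because it is consumed by the match instead of being captured, and \d matches exactly one digit, so a header like ">Sample_12_x80" (made up for illustration) is not matched at all.

```perl
use strict;
use warnings;

# The substitution from the question, exactly as written (case-sensitive).
sub with_original_regex {
    my ($line) = @_;
    $line =~ s/sample\_\d\_x?/ID\t/;
    return $line;
}

# The same substitution with /i added.
sub with_i_flag {
    my ($line) = @_;
    $line =~ s/sample\_\d\_x?/ID\t/i;
    return $line;
}

# Make tabs visible when printing.
sub show { (my $s = shift) =~ s/\t/\\t/g; return $s }

for my $header ('>Sample_1_x80', '>sample_3_x112', '>Sample_12_x80') {
    printf "%-16s original: %-16s with /i: %s\n",
        $header, show(with_original_regex($header)), show(with_i_flag($header));
}
```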

    The general steps for editing a file are

    • read from original-file
    • modify data
    • write to new-file
    • rename new-file to original-file

    So you might write that as

#!/usr/bin/perl --
use strict;
use warnings;
use autodie 2.1001;
use File::Temp qw/ tempfile /;
use File::Copy qw/ move /;
use autodie qw/ move /;

Main( @ARGV );
exit( 0 );

sub Main {
    return Usage() unless @_;
    for my $file ( @_ ) {
        print "Converting $file \n";
        ConvertFile( $file );
    }
}

sub ConvertFile {
    my $infilename = shift;
    my ($outfh, $outfilename) = tempfile();
    open my($infh), '<', $infilename;    # autodie dies on error
    while( my $line = <$infh> ){
        chomp $line;
        # capture the sample number and count, e.g. >Sample_1_x80 => >ID1<TAB>80
        $line =~ s/^>sample_(\d+)_x(\d+)/>ID$1\t$2/i;
        print $outfh $line, "\n";
    }
    close $infh;
    close $outfh;
    move( $outfilename, $infilename );    # autodie dies on error
}

sub Usage {
    print <<"__USAGE__";
$0
$0 modify/this/file
perl ${\__FILE__}
perl ${\__FILE__} modify/this/file
__USAGE__
} ## end sub Usage

__END__

    See use, autodie, open, File::Copy, File::Temp, strict, warnings, perlintro, perlretut, perlrequick, YAPE::Regex::Explain, Beginning Perl (free) Chapter 6: Files and Data, Modern Perl: Chapter 9: Managing Real Programs > Files
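The four general steps in this reply can also be sketched with the temporary file created in the same directory as the original, so the final rename stays on one filesystem (on POSIX systems rename then replaces the original in one step). A minimal sketch under those assumptions; rewrite_file and the demo data are hypothetical. File::Temp creates its files with mode 0600, so the sketch copies the original file's permissions across before renaming.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile tempdir);

# Hypothetical helper: read, modify, write a new file, rename it over the original.
sub rewrite_file {
    my ($file) = @_;

    # Target for step 3: a temp file next to the original (same filesystem).
    my ($out, $tmpname) = tempfile(DIR => dirname($file));

    # Step 1: read the original line by line.
    open my $in, '<', $file or die "Can't read $file: $!";
    while (my $line = <$in>) {
        # Step 2: modify header lines only.
        $line =~ s/^>Sample_(\d+)_x(\d+)$/>ID$1\t$2/i;
        print {$out} $line;
    }
    close $in;
    close $out or die "Can't write $tmpname: $!";

    # Keep the original permissions instead of File::Temp's 0600.
    my $mode = (stat $file)[2] & 07777;
    chmod $mode, $tmpname or die "Can't chmod $tmpname: $!";

    # Step 4: replace the original with the new file.
    rename $tmpname, $file or die "Can't rename $tmpname to $file: $!";
}

# Demo on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/reads.fa";
open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} ">Sample_2_x85\nTTTCCCGGGAAAA\n";
close $fh;

rewrite_file($file);
```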

Re: modifying a file with regex!
by Marshall (Canon) on Mar 16, 2012 at 22:19 UTC
    There is no need to substitute anything. Capture what is necessary and re-format the ">" line.
    No need to be overly tricky when a couple of straight-forward lines of code will do.
#!/usr/bin/perl -w
use strict;

my $ID = 1;
while (<DATA>) {
    # this regex captures the trailing number if
    # the line starts with a ">"
    # the .*? means a "minimal match" of anything while
    # allowing the rest of the regex to succeed.
    # the \n is counted as white space, a \s* character
    #
    if (my ($number) = $_ =~ /^>.*?(\d+)\s*$/) {
        print '>ID'.$ID++," $number\n";
    }
    else {
        print;
    }
}

=prints
>ID1 80
AGGGGGGGGGTTCCC
>ID2 85
TTTCCCGGGAAAA
>ID3 112
GGCCCCTTTGAGG
=cut

__DATA__
>Sample_1_x80
AGGGGGGGGGTTCCC
>Sample_2_x85
TTTCCCGGGAAAA
>sample_3_x112
GGCCCCTTTGAGG
    Well, if you want to get the sample number from the ">" line then:
while (<DATA>) {
    if (my ($sample, $number) = $_ =~ /^>.*?(\d+).*?(\d+)\s*$/) {
        print '>ID'.$sample," $number\n";
    }
    else {
        print;
    }
}
    which will print the same thing
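Both versions rely on a match in list context: the match returns the captured groups when it succeeds and an empty list when it fails, and a list assignment used as a condition is true only when the right-hand side produced at least one element. A small sketch of that idiom; header_fields is a hypothetical name, and the header strings come from the question. It prints a tab between the fields, as the question asks.

```perl
use strict;
use warnings;

# Hypothetical helper: return (sample, count) for a header line, or () otherwise.
sub header_fields {
    my ($line) = @_;
    # Called in list context, so the match returns its captures or ().
    return $line =~ /^>.*?(\d+).*?(\d+)\s*$/;
}

for my $line ('>Sample_1_x80', 'TTTCCCGGGAAAA', '>sample_3_x112') {
    # List assignment in boolean context: true if any fields came back.
    if (my ($sample, $number) = header_fields($line)) {
        print ">ID$sample\t$number\n";
    }
    else {
        print "$line\n";
    }
}
```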
      Thanks everyone for your useful comments and code!

Approved by davido