modifying a file with regex!

rna_follower has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to modify my file by using a regex to replace/substitute strings/numbers:

Example file:

>Sample_1_x80

AGGGGGGGGGTTCCC

>Sample_2_x85

TTTCCCGGGAAAA

>sample_3_x112

GGCCCCTTTGAGG

And I want to modify it to print like so(ID line should be tab-delimited):

>ID1 80

AGGGGGGGGGTTCCC

>ID2 85

TTTCCCGGGAAAA

and so on ....

My best effort:

#!usr/bin/perl
$file;
@files;
$filename;

$filename = <STDIN>;
open(FILENAME, "<$filename") or die "can't open file";

while($file = <FILENAME>){
    
    chomp $file;
    $file =~ s/sample\_\d\_x?/ID\t/;
    
    print $file, "\n";
    
}

Replies are listed 'Best First'.
Re: modifying a file with regex! by tobyink (Canon) on Mar 16, 2012 at 21:51 UTC
This is how I'd do it...

#!/usr/bin/perl

use autodie; # Automatic errors on file problems.
use strict;

# This is the name of the file we want to modify.
my $filename = 'modify-file.txt';

# We're going to create a temporary file. This avoids us having
# to build up a potentially large string in memory.
my $tempname = $filename . '.tmp';

do {
    # Open both files. Doing this using lexical file handles
    # within a "do" block means that when the end of the block
    # is reached, the files will be closed.
    open my $input_h, '<', $filename;   # input handle
    open my $output_h, '>', $tempname;  # output handle

    # Loop through each line of input.
    while (<$input_h>) {
        # Modify the line
        s/^>Sample_(\d+)_x(\d+)/>ID$1 $2/i;

        # Write it out.
        print $output_h $_;
    }
};

# Delete the original file.
unlink $filename while -f $filename;

# Rename the temporary file to the original filename.
rename $tempname => $filename;

`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
Re^2: modifying a file with regex! by Marshall (Canon) on Mar 16, 2012 at 22:56 UTC
This extraneous "do" is completely unnecessary. It actually "harms" by introducing an unnecessary level of indentation - which is a hindrance to readability.
Re^3: modifying a file with regex! by tobyink (Canon) on Mar 16, 2012 at 23:44 UTC
It eliminates two calls to `close` and allows some lexical variables (`$input_h` and `$output_h`) to live in a smaller scope. Indent it however you like; this ain't Python.
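The scoping point can be seen in isolation. The following is a minimal sketch, not code from the thread: `write_then_read` is a hypothetical helper name, and a bare block is used here where the reply above uses `do`. The file is never closed explicitly, yet the line is readable afterwards because Perl flushes and closes a lexical file handle when its last reference goes out of scope.

```perl
use strict;
use warnings;

# Hypothetical helper: writes one line to $path inside a bare block,
# then reopens the file and returns what it finds there.
sub write_then_read {
    my ($path) = @_;

    {
        open my $out, '>', $path or die "Cannot write $path: $!";
        print {$out} "hello\n";
    }   # $out goes out of scope here, so Perl flushes and closes it

    open my $in, '<', $path or die "Cannot read $path: $!";
    my $line = <$in>;
    return $line;    # $in is closed when the sub returns
}
```

Whether the extra block is worth the indentation is the style question debated above; the behaviour itself does not depend on `do`.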
Re^4: modifying a file with regex! by Marshall (Canon) on Mar 17, 2012 at 00:31 UTC
Re^5: modifying a file with regex! by Anonymous Monk on Mar 17, 2012 at 00:42 UTC
Re: modifying a file with regex! by JavaFan (Canon) on Mar 16, 2012 at 21:50 UTC
Untested: `perl -i.bak -pe 's/^>Sample_([0-9]+)_x([0-9]+)$/ID$1 $2/' filename`
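For anyone who wants the same in-place edit inside a script rather than on the command line, here is a rough sketch using `$^I`, the variable behind the `-i` switch. It is an adaptation, not the one-liner above: `rewrite_headers_in_place` is a hypothetical name, and the substitution assumes you want the leading `>` kept, a tab between the fields as the question asks, and case-insensitive matching so that `>sample_3_x112` is also converted.

```perl
use strict;
use warnings;

# Hypothetical helper: edits $path in place, keeping a backup in "$path.bak".
sub rewrite_headers_in_place {
    my ($path) = @_;

    # Setting $^I and @ARGV mimics running perl with -i.bak on $path.
    local ($^I, @ARGV) = ('.bak', $path);

    while (<>) {
        # Keep the ">" and join the two numbers with a tab.
        s/^>sample_(\d+)_x(\d+)$/>ID$1\t$2/i;
        print;    # written to the rewritten file, not to the terminal
    }
    return;
}
```

Call it as `rewrite_headers_in_place('filename')`. Sequence lines pass through untouched because the pattern is anchored on the leading `>`.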
Re: modifying a file with regex! by Anonymous Monk on Mar 16, 2012 at 22:11 UTC
Some issues with your effort:

- the shebang is not an absolute path
- not using strict/warnings (see "Read this if you want to cut your development time in half!")
- you're using `<STDIN>` instead of @ARGV (as in `perl myprogram.pl`)
- you're reading from FILEHANDLE but you're printing to STDOUT
- you're using FILEHANDLE instead of $filehandle
- your regular expression is case sensitive and it doesn't match your sample data

The general steps for editing a file are:

1. read from original-file
2. modify data
3. write to new-file
4. rename new-file to original-file

So you might write that as

#!/usr/bin/perl --
use strict;
use warnings;
use autodie 2.1001;
use File::Temp qw/ tempfile /;
use File::Copy qw/ move /;
use autodie qw/ move /;

Main( @ARGV );
exit( 0 );

sub Main {
    return Usage() unless @_ ;
    for my $file ( @_) {
        print "Converting $file \n";
        ConvertFile( $file );
    }
}

sub ConvertFile {
    my $infilename = shift;
    my ($outfh, $outfilename) = tempfile();
    open my($infh), '<', $infilename; # autodie dies on error
    while( my $line = <$infh> ){
        chomp $line;
        $line =~ s/sample\_\d\_x?/ID\t/i;
        print $outfh $line, "\n";
    }
    close $infh;
    close $outfh;
    move( $outfilename, $infilename ); # autodie dies on error
}

sub Usage {
    print <<"__USAGE__";
$0
$0 modify/this/file
perl ${\__FILE__}
perl ${\__FILE__} modify/this/file
__USAGE__
} ## end sub Usage
__END__

See use, autodie, open, File::Copy, File::Temp, strict, warnings, perlintro, perlretut, perlrequick, YAPE::Regex::Explain, Beginning Perl (free) Chapter 6: Files and Data, Modern Perl: Chapter 9: Managing Real Programs > Files
Re: modifying a file with regex! by Marshall (Canon) on Mar 16, 2012 at 22:19 UTC
There is no need to substitute anything. Capture what is necessary and re-format the ">" line. No need to be overly tricky when a couple of straightforward lines of code will do.

#!/usr/bin/perl -w
use strict;

my $ID = 1;
while (<DATA>)
{
    # this regex captures the trailing number if
    # the line starts with a ">"
    # the .*? means a "minimal match" of anything while
    # allowing the rest of the regex to succeed.
    # the \n is counted as white space, a \s character
    #
    if (my ($number) = $_ =~ /^>.*?(\d+)\s*$/)
    {
        print '>ID'.$ID++," $number\n";
    }
    else
    {
        print;
    }
}

=prints

>ID1 80
AGGGGGGGGGTTCCC
>ID2 85
TTTCCCGGGAAAA
>ID3 112
GGCCCCTTTGAGG

=cut

__DATA__
>Sample_1_x80
AGGGGGGGGGTTCCC
>Sample_2_x85
TTTCCCGGGAAAA
>sample_3_x112
GGCCCCTTTGAGG

Well, if you want to get the sample number from the ">" line then:

while (<DATA>)
{
    if (my ($sample, $number) = $_ =~ /^>.*?(\d+).*?(\d+)\s*$/)
    {
        print '>ID'.$sample," $number\n";
    }
    else
    {
        print;
    }
}

which will print the same thing.
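Since the question asks for a tab-delimited ID line, here is a small sketch of the capture approach wrapped in a function so it can be checked on single lines. `reformat_header` is a hypothetical name, and the only change from the second regex above is that the output joins the fields with `\t`. Note that a matched header always comes back with a trailing newline, even if the input line had none.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the reformatted header, or the line
# unchanged when it is not a ">" line ending in two numbers.
sub reformat_header {
    my ($line) = @_;

    # First capture: the sample number; second capture: the trailing count.
    if (my ($sample, $count) = $line =~ /^>.*?(\d+).*?(\d+)\s*$/) {
        return ">ID$sample\t$count\n";
    }
    return $line;    # sequence lines pass through untouched
}
```

In the loop above this would be used as `print reformat_header($_);`.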
Re^2: modifying a file with regex! by rna_follower (Initiate) on Mar 17, 2012 at 00:15 UTC
Thanks everyone for your useful comments/code!