
How to parse this file

by hervebags (Initiate)
on Oct 19, 2011 at 13:50 UTC (#932408)
hervebags has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have only been learning Perl for less than 2 weeks. I am a C++ programmer.

I have attached a portion of the data below. The data is in file1.txt. I would like to copy the data from file1.txt to file2.txt, but keep only the numbers.

E.g., I want row 1 to look like this:

1 1549367 11 8 3 11 0 -12.00 6.00 -0.25 -3.00 0.00 -1.67 -12.00 6.00 -0.64

Instead of this:

1 Chr26 1549367 11 GGGGGGGAAGA 8 3 Transition 11 0 -12.00 6.00 -0.25 -3.00 0.00 -1.67 -12.00 6.00 -0.64

This is what I have done so far (file1.txt will be in @ARGV):

open FILE2, "+>file2.txt" or die "Cant not open file2.txt!";
my $line;
while($line = readline(ARGV)) {
    print FILE2 $line;
}

The code above only copies content of file1.txt (ARGV) into file2.txt.

I tried to use 'seek' and 'tell()' to solve the problem above, but I got confused :(

I also tried this:

open(FILE, "file1.txt");
@theFile = <FILE>;

This puts every row in the array @theFile. But how can I now modify the elements of one row? (I'm still a novice Perl programmer)

Thank you for your help

/………………………………………………………………………………………../

The file portion

1 Chr26 1549367 11 GGGGGGGAAGA 8 3 Transition 11 0 -12.00 6.00 -0.25 -3.00 0.00 -1.67 -12.00 6.00 -0.64
1 Chr26 1549501 15 ccCctctccccctCC 12 3 Transition 3 12 -17.00 6.00 0.50 1.00 6.00 2.67 -17.00 6.00 0.93
1 Chr26 1549552 14 AagAAaaAAAagga 11 3 Transition 6 8 -31.00 6.00 -2.09 -12.00 3.00 -5.67 -31.00 6.00 -2.86
1 Chr26 1549563 14 tAAaaAAAattat^Ft 9 5 Transversion 5 9 -7.00 6.00 0.22 -64.00 4.00 -18.40 -64.00 6.00 -6.43
1 Chr26 1549726 14 TtTtctTtTtTTTT 13 1 Transition 8 6 -3.00 6.00 1.92 6.00 6.00 6.00 -3.00 6.00 2.21
2 Chr26 1549737 16 T+1Atttt+1aT+1At+1aTt+1aT+1AT+1AT+1AT+1AtT+1A^FA 15 11 Transversion 16 10 -64.00 6.00 -35.67 -64.00 6.00 -46.18 -64.00 6.00 -40.12
2 Chr26 1549815 9 CtCTTTTTT 7 2 Transition 8 1 -3.00 6.00 -0.14 -9.00 0.00 -4.50 -9.00 6.00 -1.11
1 Chr26 1549914 12 gGGGGGGGAGgg 11 1 Transition 9 3 -9.00 6.00 1.18 -4.00 -4.00 -4.00 -9.00 6.00 0.75
1 Chr26 1550018

Re: How to parse this file
by Anonymous Monk on Oct 19, 2011 at 13:55 UTC
    This ought to do it
    perl -Tpe " s/\w\S+// " < infile > outfile
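
A quick check of what that substitution does to the first sample row, as a minimal sketch (the variable names are only for illustration). Without the /g modifier it removes just the first token that starts with a word character and is at least two characters long, so on this data it appears to strip only Chr26 and leave the other text columns in place:

```perl
use strict;
use warnings;

# Start of the first row of the sample data from the question.
my $row = '1 Chr26 1549367 11 GGGGGGGAAGA 8 3 Transition 11 0 -12.00 6.00 -0.25';

# Same substitution as the one-liner, applied to a copy of the row.
(my $result = $row) =~ s/\w\S+//;

# Chr26 is gone (leaving a double space); the other text columns remain.
print "$result\n";
```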
Re: How to parse this file (repost)
by tye (Cardinal) on Oct 19, 2011 at 14:16 UTC
Re: How to parse this file
by roboticus (Canon) on Oct 19, 2011 at 14:53 UTC

    hervebags:

    I'd do it something[1] like this:

    my @cols_to_keep = (0, 2, 3, 5, 6, 8 .. 18);
    while (my $line = readline(ARGV)) {
        my @fields = split /\s+/, $line;
        print FILE2 join(" ", @fields[@cols_to_keep]), "\n";
    }

    [1] i.e., untested, etc.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
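
The column-index approach rests on an array slice: @fields[@cols_to_keep] returns the elements at those indexes, in that order. Here is a minimal sketch on one sample row, assuming every row has the same 19 whitespace-separated columns. The helper name keep_columns is made up for this example.

```perl
use strict;
use warnings;

# Indexes of the numeric columns, counting from 0.
# Columns 1, 4 and 7 hold the text fields and are left out.
my @cols_to_keep = (0, 2, 3, 5, 6, 8 .. 18);

# Hypothetical helper: split a row on whitespace and keep only the chosen columns.
sub keep_columns {
    my ($line) = @_;
    my @fields = split ' ', $line;    # ' ' also skips leading whitespace
    return join ' ', @fields[@cols_to_keep];
}

my $row = '1 Chr26 1549367 11 GGGGGGGAAGA 8 3 Transition 11 0 '
        . '-12.00 6.00 -0.25 -3.00 0.00 -1.67 -12.00 6.00 -0.64';

print keep_columns($row), "\n";
```

A short row such as the last line of the sample (1 Chr26 1550018) would produce "uninitialized value" warnings with this approach, because most of the requested indexes do not exist in it.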

Re: How to parse this file
by Sewi (Friar) on Oct 19, 2011 at 15:02 UTC

    Pointing someone in the right direction is better than providing the solution without explanation.

    seek and tell work on the filehandle; once you have read a line into a variable, you can easily work on that variable with "substr" and maybe "length".

    But most string manipulation in Perl is done with Perl regular expressions. Don't compare them to POSIX, JavaScript or other regular expressions; all of those have only a small subset of the Perl RE's power.

    \d is a character class for the digits 0 to 9. A basic regex for converting your line might be

        $line =~ s/[^\d]//g;

    It won't do exactly what you want (it cuts out too much), but it should be a good start for you. Try to use the Perl regex documentation to understand the expression shown above. Changing it to something that really fits your needs should be easy once you understand the regex.

    For gurus: Yes, there is \D, but [^\d] is a better learning start for this problem.
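
To see how much the starting regex cuts out, here is a small sketch that runs it on part of the first sample row, next to one possible refinement. The refinement is an assumption for illustration, not the poster's code: it deletes whole whitespace-delimited tokens that contain a letter, along with the whitespace in front of them.

```perl
use strict;
use warnings;

my $row = '1 Chr26 1549367 11 GGGGGGGAAGA 8 3 Transition 11 0 -12.00 6.00 -0.25';

# Starting point from the post: delete every character that is not a digit.
# This also deletes spaces, minus signs and decimal points.
(my $digits_only = $row) =~ s/[^\d]//g;

# Possible refinement (assumed here): delete any token that contains
# at least one letter, plus the whitespace that precedes it.
(my $numbers_only = $row) =~ s/\s*\S*[A-Za-z]\S*//g;

print "$digits_only\n";     # one long run of digits
print "$numbers_only\n";    # numeric columns, still separated
```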

Re: How to parse this file
by jwkrahn (Monsignor) on Oct 19, 2011 at 17:03 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    use Scalar::Util qw/ looks_like_number /;

    my ( $file_in, $file_out ) = ( 'file1.txt', 'file2.txt' );

    open my $IN,  '<', $file_in  or die "Cannot open '$file_in' because: $!";
    open my $OUT, '>', $file_out or die "Cannot open '$file_out' because: $!";

    while ( <$IN> ) {
        print $OUT join( "\t", grep looks_like_number( $_ ), split ), "\n";
    }

    __END__
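
A small sketch of how looks_like_number from Scalar::Util sorts a few tokens, using values from the sample data plus one edge case ('1e3') added for illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw/ looks_like_number /;

# Tokens from the sample data, plus '1e3' as an edge case.
my @tokens = ( '-12.00', '1549367', 'Chr26', 'GGGGGGGAAGA', '1e3' );

# Keep only the tokens that Perl would treat as numeric.
my @numeric = grep { looks_like_number($_) } @tokens;

print join( "\t", @numeric ), "\n";
```

Exponent notation counts as a number here, so this filter is slightly more permissive than a hand-written digits-only pattern. For data like the sample above that difference should not matter.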
Re: How to parse this file
by Caio (Acolyte) on Oct 19, 2011 at 18:36 UTC
    Ok, I'm in a lazy mood to type code right now, but in the spirit of TIMTOWTDI, tell me what you guys think of the following idea:
    Instead of doing a "constructive parsing" do a "destructive" one?

    Pseudo-code would be something like:
    open infile
    open outfile
    foreach (<infile>) {
        $_ =~ s/[letters]//g
        $_ =~ s/\s{2,}/ /g   # collapse excess whitespace
        print $_ outfile
    }
    close infile
    close outfile


    Just my 2cents anyway, what say you fellow monks?

    UPDATE: Just followed and read tye's link. That should teach me to actually read the links other monks post when answering questions before I rush into answering them myself. =/
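
A runnable sketch of the destructive idea on part of the first sample row. It assumes [A-Za-z] as the letter class and collapses whitespace runs to a single space; both choices are made for this example. It shows one weakness of the approach: digits embedded in a text token survive, so Chr26 leaves a stray 26 behind.

```perl
use strict;
use warnings;

my $row = '1 Chr26 1549367 11 GGGGGGGAAGA 8 3 Transition 11 0 -12.00';

# Destructive pass: delete letters, then collapse runs of whitespace.
(my $stripped = $row) =~ s/[A-Za-z]//g;
$stripped =~ s/\s{2,}/ /g;

# The second column is now "26", which was never a data value.
print "$stripped\n";
```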
Re: How to parse this file
by GrandFather (Cardinal) on Oct 19, 2011 at 21:15 UTC

    Perl is tuned for parsing. To solve your problem you need to split each line into portions, throw away unwanted stuff, then join the remaining portions together again and print them. Consider:

    use warnings;
    use strict;

    die <<HELP if ! @ARGV;
    parse.pl <infile name>
      <infile name> source file to be processed
    Output is printed to stdout
    HELP

    while (<>) {
        chomp;
        print join ' ', grep {/^[+-]?\d+(\.\d*)?$/} split;
        print "\n";
    }

    given your sample data in a file passed on the command line prints:

    1 1549367 11 8 3 11 0 -12.00 6.00 -0.25 -3.00 0.00 -1.67 -12.00 6.00 -0.64
    1 1549501 15 12 3 3 12 -17.00 6.00 0.50 1.00 6.00 2.67 -17.00 6.00 0.93
    1 1549552 14 11 3 6 8 -31.00 6.00 -2.09 -12.00 3.00 -5.67 -31.00 6.00 -2.86
    1 1549563 14 9 5 5 9 -7.00 6.00 0.22 -64.00 4.00 -18.40 -64.00 6.00 -6.43
    1 1549726 14 13 1 8 6 -3.00 6.00 1.92 6.00 6.00 6.00 -3.00 6.00 2.21
    2 1549737 16 15 11 16 10 -64.00 6.00 -35.67 -64.00 6.00 -46.18 -64.00 6.00 -40.12
    2 1549815 9 7 2 8 1 -3.00 6.00 -0.14 -9.00 0.00 -4.50 -9.00 6.00 -1.11
    1 1549914 12 11 1 9 3 -9.00 6.00 1.18 -4.00 -4.00 -4.00 -9.00 6.00 0.75
    1 1550018

    There are a number of important parts there: split, grep, the regular expression in the grep (see perlretut) and join. I strongly recommend that you read the documentation for each of those.
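
The three steps can be pulled apart to see what each one contributes. This is a sketch on a single sample row, using the same pattern as the script above; the intermediate variable names are only for illustration.

```perl
use strict;
use warnings;

my $row = '2 Chr26 1549815 9 CtCTTTTTT 7 2 Transition 8 1 -3.00 6.00 -0.14';

# 1. split: break the row into whitespace-separated fields.
my @fields = split ' ', $row;

# 2. grep: keep fields made of an optional sign, digits,
#    and an optional decimal part.
my @numbers = grep { /^[+-]?\d+(\.\d*)?$/ } @fields;

# 3. join: glue the surviving fields back together with single spaces.
my $out = join ' ', @numbers;

print "$out\n";
```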

    The other bit of magic is <> which reads a line at a time from stdin if there are no command line arguments, or from the files whose names are passed on the command line. Actually, without the "helpful" die the script is even more useful, because it can be used as a filter accepting piped input on stdin, or by taking a file (or list of files) on the command line.
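
A self-contained way to watch <> at work without a real command line is to fill @ARGV by hand. The sketch below writes two sample rows to a temporary file (a stand-in for file1.txt, created with File::Temp) and reads them back through <>.

```perl
use strict;
use warnings;
use File::Temp qw/ tempfile /;

# Write two sample rows to a temporary file (stand-in for file1.txt).
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} "1 Chr26 1549367 11 GGGGGGGAAGA 8 3\n";
print {$fh} "2 Chr26 1549815 9 CtCTTTTTT 7 2\n";
close $fh or die "Cannot close '$path': $!";

# <> reads from the files named in @ARGV, so filling @ARGV by hand
# has the same effect as passing the name on the command line.
my @lines;
{
    local @ARGV = ($path);
    while ( my $line = <> ) {
        chomp $line;
        push @lines, $line;
    }
}

print scalar(@lines), " lines read\n";
```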

    True laziness is hard work

Node Type: perlquestion [id://932408]
Approved by BrowserUk