

by perlneedhelp2012 (Initiate)
on Sep 26, 2012 at 05:07 UTC
perlneedhelp2012 has asked for the wisdom of the Perl Monks concerning the following question:

    #!/usr/local/bin/perl
    print "hello\n";
    open(FILE1, "<file1.txt") or die "cannot open: $!\n";
    open(OUTFO1, ">outfile3.txt") or die "cannot open: $!\n";
    use strict;
    use warnings;
    my @data1 = <FILE1>;
    print "hello\n";
    print scalar(@data1);
    my @data2 = <FILE2>;
    print "hello\n";
    my @ra = map{(split)[1]}split/\t/, @data1;
    print "hello\n";
    print scalar(@ra);    #ERROR: LENGTH OF @ra SHOWING 0
    print OUTFO1 "@ra";   #ERROR: OUTPUT FILE IS CREATED BUT IS EMPTY

Hello, the above is my code. My input file is a multicolumn, tab-delimited data file. I want to extract the first column and write it to another file. The output file is generated but is empty. Please help me out to find my flaws. Thank you.

Replies are listed 'Best First'.
by davido (Archbishop) on Sep 26, 2012 at 05:34 UTC

    Look closely at this line:

    my @ra = map { ( split )[1] } split /\t/, @data1;

    Start from the right hand side. The second argument to split needs to be a string of some sort, not an array.

    Now move left.... Do you really intend to split on tabs, and then further split each tab-delimited column on a single space? (Because that's what's happening there.)
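To make the first point concrete, here is a small sketch (the sample rows are made up, not the OP's data). split evaluates its second argument in scalar context, so an array there contributes its element count rather than its contents:

```perl
use strict;
use warnings;

# Assumed sample rows standing in for the lines read from file1.txt.
my @data1 = ( "a\tb\tc\n", "d\te\tf\n", "g\th\ti\n" );

# split sees scalar(@data1), i.e. the string "3", not the lines themselves.
my @pieces = split /\t/, @data1;

print scalar(@pieces), " piece(s): @pieces\n";
```

None of the actual data reaches the map block this way, which is consistent with the empty output file.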

    I suspect what you want is something more like this:

    my @ra = map { (split /\t/, $_, 2)[0] } @data1;

    But I'm not positive, because I haven't seen your data. Next, why are you reading <FILE2> into @data2 without ever having called open on that filehandle? Also, you should be explicitly closing your output file, and using the or die construct again to verify that the close didn't fail.
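Putting those suggestions together, a minimal sketch might look like the following. The helper name first_field and the sample rows are hypothetical, and an in-memory filehandle stands in for outfile3.txt so the example is self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first tab-delimited field of one line.
sub first_field {
    my ($line) = @_;
    chomp $line;
    return ( split /\t/, $line, 2 )[0];
}

# Assumed sample rows standing in for the contents of file1.txt.
my @data1 = ( "id1\tfoo\tbar\n", "id2\tbaz\tqux\n" );

# In-memory filehandle used here in place of outfile3.txt.
open( my $out, '>', \my $buffer ) or die "cannot open: $!\n";
print {$out} first_field($_), "\n" for @data1;
close($out) or die "cannot close: $!\n";

print $buffer;
```

With a real file, the open call would take the filename instead of the scalar reference; the close check stays the same.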


by 2teez (Priest) on Sep 26, 2012 at 09:57 UTC

    Please help me out to find my flaws

    Please consider the following:

    • It's a lot better to put use strict; and use warnings; at the top of one's script
    • It's modern ( ha ha... ) and a lot safer to use the three-argument form of open
    • Barewords as filehandles might not be too good; you can use a lexical variable of your choice.
    • Lastly, you will want to close your filehandles as soon as you are done with them

    Below is a reconstruction of your script.
    Please note I used a regex to get what I assumed to be the first column of your data.
    Using split or even substr might probably be better, if the OP's data is known. I can only assume the data here.
    use warnings;
    use strict;

    open my $fh, '>', 'outfile3.txt' or die "can't open file: $!";
    open my $fh2, '<', 'file1.txt' or die "can't open file: $!";

    while ( defined( my $line = <$fh2> ) ) {
        if ( $line =~ m{^(?<matched>.+?)\s+?.*?$}){
            print {$fh} $+{matched}, $/;
        }
    }
    close $fh2 or die "can't close file: $!";
    close $fh or die "can't close file: $!";
    Hope this helps.
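As a rough illustration of why split may be the safer choice when the data is known to be tab-delimited: the regex above stops at the first whitespace of any kind, so a first column that contains a space gets cut short. The sample row is an assumption, not the OP's data:

```perl
use strict;
use warnings;

# Assumed sample row whose first column contains a space.
my $line = "New York\t8336817\tUSA\n";

# The regex from the script above: captures up to the first whitespace.
my ($by_regex) = $line =~ m{^(?<matched>.+?)\s+?.*?$};

# Splitting on tabs only: captures up to the first tab.
my ($by_split) = split /\t/, $line, 2;

print "regex: $by_regex\n";
print "split: $by_split\n";
```

If the first column never contains spaces, both approaches give the same result.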

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
by Marshall (Abbot) on Sep 26, 2012 at 11:40 UTC
    A tab delimited file is one of the most horrific file formats that I could imagine. I cannot think of something worse than this. It is so hard that I won't even try to generate a tab delimited file, because my text editor just doesn't like to do that. But if you absolutely had to do that, the idea is shown below...

    NEVER, NEVER EVER use a tab delimited file yourself - this is nasty stuff!
    If you have fixed space fonts for this, then you cannot tell by just looking at this whether this is just spaces or even if there is a tab character in these lines!

    #!/usr/bin/perl -w
    use strict;

    open (IN, '<', "tab_file.txt") or die "$!";
    while (<IN>)
    {
        my ($first_token) = split (/\s/, $_);  #should be(/\t/, $_)
        print $first_token,"\n";
    }
    __END__
    123
    876
    37897654

    tab_file.txt: (not really tabs)...
    123 aBVXC SAOMEWTRINOGN ABC
    876 AsrdaDS some_bs 564
    37897654 aofruafdouf abc
    When confronted with a tab delimited file, I would think about s/\t/|/g; or the tr equivalent! The '|' character is just a FAR, FAR better field delimiter than a tab. Many databases are done this way. Second choice would be a CSV format. A tab delimited file just has all things bad going for it - sorry if you have to deal with one of these things. Don't make one yourself!
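A minimal sketch of the tr variant mentioned above, using an assumed sample row. It is only safe if the fields themselves never contain a '|' character:

```perl
use strict;
use warnings;

# Assumed sample row with real tab characters.
my $line = "123\taBVXC\tABC\n";

# Copy the line, then translate every tab in the copy to a pipe.
( my $piped = $line ) =~ tr/\t/|/;

# The pipe must be escaped in the split pattern, where it means alternation.
chomp( my $row = $piped );
my @fields = split /\|/, $row;

print $piped;
print scalar(@fields), " fields\n";
```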

      Huh? Tab, the character which was included in ASCII specifically for aligning tabular data is a "horrific" way of representing tabular data?

      Personally I find tab-delimited files to be very easy to deal with. With CSV data, the fields will often contain commas (address fields; some date formats and numeric formats) necessitating ways of "escaping" commas which vary between software packages. It is quite common to be in situations where you know that the fields themselves cannot contain \t or \n; and in those cases tab delimited data is a joy to work with. Want to slurp your data into a multi-dimensional array?

      my @data = map { chomp; [ split /\t/ ] } <$fh>;
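For reference, a small self-contained sketch of that slurp, reading from an in-memory string instead of a real file (the sample text is made up). The square brackets around split matter: they keep each row as its own array reference, whereas without them map would return one flat list of fields:

```perl
use strict;
use warnings;

# Assumed sample data standing in for a tab-delimited file.
my $text = "alice\t30\tOslo\nbob\t25\tLima\n";

open my $fh, '<', \$text or die "cannot open: $!";
my @data = map { chomp; [ split /\t/ ] } <$fh>;
close $fh or die "cannot close: $!";

# Rows and columns are both zero-based.
print "rows: ", scalar(@data), "\n";
print "row 1, column 2: $data[1][2]\n";
```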

      I'd suggest that if your text editor makes distinguishing between tabs and spaces difficult, then you should investigate other text editors.

      (Aside: yes, there are some very good CSV parsers for Perl which abstract away the nits when dealing with CSV. When you have to work with CSV in other programming languages you appreciate what a good job they do.)

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
        The problem with a tab delimited file is that the tabs are hard to see in a normal text editor. Is that '" "\t"' or '" "\t" or whatever?

        So the basic problem is that tabs are not easily "visible". My programming editor also converts "tabs" to "spaces" when I write a program file. No program file that I work with has tab characters in it. When I "save it" all the tabs disappear.

        There is not a "standard" for the number of spaces for a tab character. In the "olden days", this made a difference because it saved disk space. This makes no difference now. Or in a practical sense, the space saving makes no difference. And it is "hard to read" the output.

        Many of the DB output formats that I work with use "|" as the field separator. That is not a valid character for a name or an address. This works well for many types of DB fields that you might want to import/export and you can just use a simple split() for input. Perl has a number of .CSV parsers and they do work very, very well. That is another option.

        This tab idea is a problem because it is hard to see! Yes, I can deal with it and I can set editor settings to allow me to see the difference between 2 spaces versus one space and tab, but this is a hassle.
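If changing editor settings is a hassle, one possible workaround is to let Perl make the tabs visible before printing a line. This is only a sketch with an assumed sample string:

```perl
use strict;
use warnings;

# Assumed sample line: one space followed by a tab, then two plain spaces.
my $line = "123 \taBVXC  ABC";

# Copy the line and replace each tab in the copy with the two characters \t.
( my $visible = $line ) =~ s/\t/\\t/g;

print "$visible\n";
```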
