Splitting on tabs then removing extra white space with map

by c4onastick (Friar)
on Sep 18, 2007 at 01:58 UTC (#639532=perlquestion)
c4onastick has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

I'm parsing a tab-delimited file that has a bunch of extra white space around the values, which I'd like to remove. I can't just split on white space, because the values are mixed: some are floats, which I'd like to strip the white space from, and some are strings that can contain white space.

My first attempt was this:

#!/bin/perl
use warnings;
use strict;
use Data::Dump qw(pp);

while(<DATA>){
    my @points = map{ s/\s+// } split("\t", $_);
    print "\n\@points =\n", pp \@points;
    #More code here
}
__DATA__
0.000 12 0.232 13 11 text that can have space
1.000 13 0.534 14 12 More text that would be ok
2.000 14 0.876 15 13 yet more text

But I get this:

@points =
[1]
@points =
[1]
@points =
[1]

Which definitely makes me think I have a context error going. What's the appropriate way to do this? (I know that the s/\s+// will remove the white space in the text too, I'm ok with that for now, I'd like to remove it from around the floats first.)

Effective use of map has always been my Everest, and with your help I will summit it this time!


Thanks in advance for your wisdom!

Re: Splitting on tabs then removing extra white space with map
by GrandFather (Cardinal) on Sep 18, 2007 at 02:24 UTC

    The general rule when it comes to parsing character separated data is: Don't. Use a module instead. In this case Text::xSV is most likely appropriate. Consider:

    use strict;
    use warnings;
    use Text::xSV;

    my $xsv = Text::xSV->new (sep => "\t", fh => *DATA);

    while (my @row = $xsv->get_row ()) {
        map {s/\s+$//; s/^\s+//} @row;
        print ">", join ("< >", @row), "<\n";
    }

    __DATA__
    0.000 12 0.232 13 11 text that can have space
    1.000 13 0.534 14 12 More text that would be ok
    2.000 14 0.876 15 13 yet more text

    Prints:

    >0.000< >12< >0.232< >13< >11< >text that can have space<
    >1.000< >13< >0.534< >14< >12< >More text that would be ok<
    >2.000< >14< >0.876< >15< >13< >yet more text<

    Update: altered code to use __DATA__ rather than heredoc.


    DWIM is Perl's answer to Gödel
Re: Splitting on tabs then removing extra white space with map
by mhearse (Hermit) on Sep 18, 2007 at 02:29 UTC
    #!/usr/bin/perl
    use strict;

    my @array;
    while (my $line = <DATA>) {
        chomp $line;
        my @points = map{ s/\s+// } split /\t/, $line;
        push @array, \@points;
    }
Re: Splitting on tabs then removing extra white space with map
by jwkrahn (Monsignor) on Sep 18, 2007 at 03:04 UTC
    while ( <DATA> ) {
        s/\A\s+//;                        # Remove leading and
        s/\s+\z//;                        # trailing whitespace
        my @points = split /\s*\t\s*/;    # split on tabs and surrounding whitespace
        print "\n\@points =\n", pp \@points;
        #More code here
    }
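A minimal, self-contained sketch of the same idea, assuming the real input has literal tab characters between fields (the sample data in this thread lost its tabs when it was posted). The helper name `split_tabbed` and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited line and drop the
# whitespace that surrounds each field.
sub split_tabbed {
    my ($line) = @_;
    $line =~ s/\A\s+//;    # remove leading whitespace
    $line =~ s/\s+\z//;    # remove trailing whitespace (including the newline)
    return split /\s*\t\s*/, $line;
}

# Assumed sample input, with the tabs written out as \t.
my @lines = (
    " 0.000 \t 12\t0.232  \t13\t 11 \ttext that can have space \n",
    "1.000\t13 \t 0.534\t14\t12\tMore text that would be ok\n",
);

for my $line (@lines) {
    my @points = split_tabbed($line);
    print join('|', @points), "\n";
}
```

One caveat with this pattern: `\s` also matches a tab, so two adjacent tabs are consumed as a single separator and an empty field between them disappears. Likewise, a line that starts with a tab loses its empty first field. If empty fields matter, splitting on `/\t/` and trimming each field afterwards is the safer route.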
Re: Splitting on tabs then removing extra white space with map
by djp (Hermit) on Sep 18, 2007 at 03:39 UTC
    I recommend Text::CSV_XS for functionality and speed, if you don't mind an XS solution.
Re: Splitting on tabs then removing extra white space with map
by ikegami (Pope) on Sep 18, 2007 at 04:16 UTC

    Since nobody mentioned it, the problem is that the return value of s/// isn't the modified string.

    my @points = map { s/\s+//; $_ } split(/\t/, $_);

    But then you're modifying map's arguments without intending to do so. Modifying $_ in map is a bad idea. (If you really do mean to modify map's arguments, using for would make your intent more obvious.)

    my @points = map { local $_ = $_; s/\s+//; $_ } split(/\t/, $_);

    Yuck! List::MoreUtils provides a function for just this purpose. It even uses a better method of localizing $_.

    use List::MoreUtils qw( apply );
    my @points = apply { s/\s+// } split(/\t/, $_);
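To make the return-value point concrete, here is a small sketch using only core Perl. The sample line is invented, and the last variant relies on the `/r` modifier, which was added in Perl 5.14 (after this thread was written), so treat it as an option for newer Perls rather than part of the original answer.

```perl
use 5.014;      # needed only for the /r variant at the end
use strict;
use warnings;

my $line = " 0.000 \t 12 \ttext with space\n";    # assumed sample line

# s/// returns the number of substitutions made (or an empty string
# when nothing matched), so map collects those values, not the fields.
my @counts = map { s/\s+// } split /\t/, $line;

# Working on a copy and returning it explicitly yields the edited strings.
my @trimmed = map { my $copy = $_; $copy =~ s/^\s+|\s+$//g; $copy }
              split /\t/, $line;

# With /r the substitution returns the modified copy and leaves $_ alone.
my @trimmed_r = map { s/^\s+|\s+$//gr } split /\t/, $line;

print "counts:    @counts\n";
print "trimmed:   ", join('|', @trimmed), "\n";
print "trimmed_r: ", join('|', @trimmed_r), "\n";
```

Note that the two trimming variants strip only leading and trailing whitespace from each field, so spaces inside the text column survive.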

      Ah! Thank you! That makes much more sense. I tried every context permutation between map and split I could think of.

      Modifying $_ in map is a bad idea.

      I thought this was one of map's strengths? Am I mistaken?

        In what situation would it be a strength? map is usually used as
        my @new = map { ... } @old;

        By changing $_, you *also* change @old in a non-obvious manner.
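A short sketch of that side effect, with made-up array names and values. Inside map, `$_` is an alias for the current element, so a substitution on it edits the source array too.

```perl
use strict;
use warnings;

my @old = (' a ', ' b ');

# $_ aliases each element of @old, so this edits @old in place
# as well as building @new.
my @new = map { s/\s+//g; $_ } @old;

# Copying first leaves the source array untouched.
my @src   = (' a ', ' b ');
my @clean = map { my $s = $_; $s =~ s/\s+//g; $s } @src;

# If in-place modification is the goal, a for loop states it plainly.
my @inplace = (' a ', ' b ');
s/\s+//g for @inplace;

print "old:     ", join('|', @old), "\n";
print "src:     ", join('|', @src), "\n";
print "clean:   ", join('|', @clean), "\n";
print "inplace: ", join('|', @inplace), "\n";
```

In the original question the list came straight from split, so there was no named array to damage. The habit matters once the same map is pointed at an array you still need.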

Re: Splitting on tabs then removing extra white space with map
by bruceb3 (Pilgrim) on Sep 18, 2007 at 06:08 UTC
    The data being parsed seems to be made up of 6 fields. split takes an optional third argument, which is the maximum number of fields to return. The following code is simple, doesn't require an external module and, based on the data given in the example, produces the desired output.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dump qw(pp);

    while(<DATA>){
        s/^\s+//;
        chomp;
        my @points = split(/\s+/, $_, 6);
        print "\n\@points =\n", pp \@points;
        #More code here
    }
    __DATA__
    0.000 12 0.232 13 11 text that can have space
    1.000 13 0.534 14 12 More text that would be ok
    2.000 14 0.876 15 13 yet more text
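The effect of the limit can be seen in isolation with a sketch like the following. The sample line is an assumption (five numeric columns separated by arbitrary whitespace, then free text), not the poster's actual file.

```perl
use strict;
use warnings;

# Assumed sample line: five numeric columns, then free text.
my $line = "  0.000   12 0.232\t13   11  text that can have space\n";

$line =~ s/^\s+//;    # drop leading whitespace so the first field is not empty
chomp $line;          # drop the trailing newline

# The third argument caps the number of fields at 6, so everything
# after the fifth separator stays together in the last field.
my @points = split /\s+/, $line, 6;

print join('|', @points), "\n";
```

This relies on the first five columns never containing whitespace and on the column count being fixed. Whitespace at the end of the text column, other than the newline removed by chomp, is also kept as-is.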
