PerlMonks  

Re^3: selecting columns from a tab-separated-values file

by Kenosis (Priest)
on Jan 22, 2013 at 04:50 UTC (node #1014569)


in reply to Re^2: selecting columns from a tab-separated-values file
in thread selecting columns from a tab-separated-values file

What's the first arg to split()? It appears to be a single blank char. How does that work to split upon tab chars?

It's a single space enclosed in single quotes. split treats ' ' as a special case: it splits on runs of whitespace (\t, \n, space, etc.) and ignores any leading whitespace.
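Here is a minimal sketch of what the special ' ' pattern does, compared with an explicit /\s+/ regex. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = "  alpha\tbeta  gamma\n";

# ' ' is special-cased: leading whitespace is skipped and any run of
# whitespace (spaces, tabs, newlines) acts as a single separator.
my @awk_style = split ' ', $line;     # ('alpha', 'beta', 'gamma')

# A literal /\s+/ pattern does not skip leading whitespace, so the
# first field is an empty string. The trailing empty field is dropped.
my @regex_ws = split /\s+/, $line;    # ('', 'alpha', 'beta', 'gamma')

print scalar(@awk_style), " fields with ' '\n";
print scalar(@regex_ws), " fields with /\\s+/\n";
```

Because ' ' treats a tab and a space the same way, it only works for tab-separated data when no field contains a space.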

In trySplitSliceLimit, wouldn't it be better to set LIMIT to 3, or in general to the number of fields you expect to extract?

It should be set to one more than the position of the last field you need, which is not necessarily the number of fields you extract. For example, using your original string:

    "FIRST\tMIDDLE\tLAST\tSTRNO\tSTRNAME\tCITY\tSTATE\tZIP"
     1      2       3     4      5        6     7 ---->

    my @capture = ( split /\t/, $line, 7 )[ 2, 0, 5 ];

You want LAST, FIRST, and CITY, and CITY is the sixth field. Setting LIMIT to seven makes split return at most seven fields: the first six, plus the unsplit remainder of the string as the seventh. The slice then picks the three you want out of those seven.
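A short sketch, assuming $line holds the sample string from the original question, showing why a LIMIT of 3 is too small here and what a LIMIT of 7 returns. The names @too_few and @fields are just for illustration.

```perl
use strict;
use warnings;

my $line = "FIRST\tMIDDLE\tLAST\tSTRNO\tSTRNAME\tCITY\tSTATE\tZIP";

# LIMIT = 3 (the number of wanted fields): the third element is the
# unsplit remainder, so LAST is not isolated and CITY is never reached.
my @too_few = split /\t/, $line, 3;
# ('FIRST', 'MIDDLE', "LAST\tSTRNO\tSTRNAME\tCITY\tSTATE\tZIP")

# LIMIT = 7 (CITY is field 6, plus one): fields 1-6 come back
# separately and the seventh element holds "STATE\tZIP".
my @fields = split /\t/, $line, 7;

# A list slice picks elements by zero-based index, in any order.
my @capture = ( split /\t/, $line, 7 )[ 2, 0, 5 ];

print join( ',', @capture ), "\n";    # LAST,FIRST,CITY
print scalar(@fields), "\n";          # 7
```

The LIMIT mainly saves split from breaking up the tail of the line you don't need; the slice is what selects and orders the columns.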

And one observation for the record: the indices can appear in any order. To extract LAST, FIRST, and CITY, you'd write [2, 0, 5].

You're correct!

Update: Changed the split pattern from ' ' to /\t/. Thanks, CountZero.


Re^4: selecting columns from a tab-separated-values file
by CountZero (Bishop) on Jan 22, 2013 at 07:28 UTC
    What's the first arg to split()? It appears to be a single blank char. How does that work to split upon tab chars?
    It's a space enclosed within single quotes. It tells split to split on whitespace, e.g., \t, \n, space.
    That is a dangerous thing to do. It works for your example data, but it will break on real-world data, where you will have LAST names like Van Winkle and CITYs called New York.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      You're quite right and this has been corrected. Appreciate you catching this.
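CountZero's point about Van Winkle and New York is easy to see with a sketch. The record below is a made-up example in the same column layout as the original question (FIRST, MIDDLE, LAST, STRNO, STRNAME, CITY, STATE, ZIP).

```perl
use strict;
use warnings;

# Hypothetical record: LAST and CITY both contain embedded spaces.
my $line = "Rip\tA\tVan Winkle\t12\tMain St\tNew York\tNY\t10001";

# Whitespace splitting breaks "Van Winkle", "Main St" and "New York"
# apart, so the indices no longer line up with the columns.
my @wrong = ( split ' ', $line )[ 2, 0, 5 ];

# Splitting on the tab delimiter keeps each column intact.
my @right = ( split /\t/, $line, 7 )[ 2, 0, 5 ];

print join( '|', @wrong ), "\n";    # Van|Rip|Main
print join( '|', @right ), "\n";    # Van Winkle|Rip|New York
```

Splitting on /\t/ assumes tabs never appear inside a field; for files that quote or escape their delimiters, a dedicated CSV/TSV parser is the safer choice.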

Re^4: selecting columns from a tab-separated-values file
by Kenosis (Priest) on Jan 22, 2013 at 07:55 UTC

    Deleted--replied to myself. Time for sleep...
