
Re^2: selecting columns from a tab-separated-values file

by Lotus1 (Vicar)
on Jan 22, 2013 at 01:49 UTC ( [id://1014530]=note )


in reply to Re: selecting columns from a tab-separated-values file
in thread selecting columns from a tab-separated-values file

Very cool command, thanks for posting it.

I just tried it and noticed that when asked for fields 3,1,6 it prints them in file order: 1,3,6. It went through a file with 10^6 lines in about 7 seconds and one with 2*10^6 lines in about 10 seconds. If the extra 3 seconds per additional 10^6 lines holds, 10^9 lines should take about 3000 seconds, so roughly an hour. My test data had only 9 fields, though, not 50.
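The ordering behaviour is easy to reproduce on a small sample. This is a minimal sketch, assuming a POSIX shell with cut available; the one-line sample and the helper name `pick_with_cut` are made up for illustration and are not from the original test.

```shell
# Hypothetical helper: ask cut for fields 3,1,6 of tab-separated input.
# Tab is cut's default field delimiter, so no -d option is needed.
pick_with_cut() {
    cut -f3,1,6
}

# One sample line with six tab-separated fields (made-up data).
printf 'f1\tf2\tf3\tf4\tf5\tf6\n' | pick_with_cut
# The selected fields come out in file order (f1, f3, f6),
# not in the order they were listed.
```

The field list given to cut only selects fields; it does not control the order in which they are written.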

I'm running on Windows XP with 8 cores, by the way. I installed GNU textutils a long time ago and am always surprised when I find out about these things I have but don't know how to use.

Replies are listed 'Best First'.
Re^3: selecting columns from a tab-separated-values file
by spacebar (Beadle) on Jan 22, 2013 at 03:05 UTC
    You can't reorder the output fields with 'cut', but if you have 'sed' you can do this:
    $ cat t
    FIRST MIDDLE LAST STRNO STRNAME CITY STATE ZIP
    $ sed -n 's/\(.*\t\).*\t\(.*\t\).*\t.*\t\(.*\t\).*\t.*/\2\1\3/p' t
    LAST FIRST CITY
      You can't reorder the output fields with 'cut'

      Isn't that what I just said? Maybe you intended to reply to mildside.

      ...if you have 'sed' you can do this...

      I have sed but I prefer Perl.

      Update: I couldn't resist trying this sed command. It works for the input provided, but it breaks as soon as you add more fields at the end.

      Given this input file:

      FIRST MIDDLE LAST STRNO CITY STATE ZIP 1 2 3 4 5
      FIRST MIDDLE LAST STRNO CITY STATE ZIP 1 2 3 4
      FIRST MIDDLE LAST STRNO CITY STATE ZIP

      You get this output:

      ZIP FIRST MIDDLE LAST STRNO CITY 3
      STATE FIRST MIDDLE LAST STRNO 2
      LAST FIRST CITY

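      Each greedy '.*' grabs as much of the line as it can and then backs off only far enough for the rest of the pattern to match, so the later parts of the regex end up matching against the right-hand end of the line. '\1' ends up holding everything on the left that remains unmatched. For the first line, \1 holds FIRST    MIDDLE    LAST    STRNO        CITY.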
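The effect on '\1' can be seen in isolation with a much shorter pattern. This is a minimal sketch, assuming a POSIX shell and sed; the helper names `greedy_first` and `anchored_first` and the four-field sample line are made up for illustration. A literal tab is put into the pattern through a shell variable so the sketch does not rely on sed understanding `\t`.

```shell
# Literal tab character, so the patterns do not depend on sed's \t escape.
TAB=$(printf '\t')

# Hypothetical helpers: each prints whatever \1 captured, wrapped in brackets.
greedy_first() {
    # .* may run across tabs, so \1 stretches to the last tab on the line.
    sed "s/\(.*${TAB}\).*/[\1]/"
}
anchored_first() {
    # [^TAB]* cannot cross a tab, so \1 is exactly the first field and its tab.
    sed "s/^\([^${TAB}]*${TAB}\).*/[\1]/"
}

printf 'a\tb\tc\td\n' | greedy_first     # \1 spans a, b and c
printf 'a\tb\tc\td\n' | anchored_first   # \1 is only a
```

With the greedy pattern, what \1 holds depends on how many fields the line has; with the negated character class it does not.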

      Here is a version that works.

      C:\b\perlmonks\commands>sed -n "s/^\([^\t]*\t\)[^\t]*\t\([^\t]*\t\)[^\t]*\t[^\t]*\t\([^\t]*\).*/\2\1\3/p" sedtest.csv
      LAST FIRST CITY
      LAST FIRST CITY
      LAST FIRST CITY
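For comparison, a field-based tool sidesteps the regex entirely. This is a minimal sketch that swaps in awk, which is not used anywhere in this thread; the helper name `pick_with_awk` is made up, and the sample line (including its empty fifth field) is an assumption about what the test file looks like.

```shell
# Hypothetical helper: print fields 3, 1 and 6 of tab-separated input,
# in that order, however many fields follow on the line.
pick_with_awk() {
    awk -F '\t' -v OFS='\t' '{ print $3, $1, $6 }'
}

# Made-up sample line: eight named fields (the fifth one empty) plus two extras.
printf 'FIRST\tMIDDLE\tLAST\tSTRNO\t\tCITY\tSTATE\tZIP\t1\t2\n' | pick_with_awk
# Prints LAST, FIRST and CITY separated by tabs.
```

Because awk splits on every tab before anything is printed, extra trailing fields do not change which fields are selected.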
