PerlMonks  

perl one liner for csv file one field

by MrTEE (Novice)
on Jan 17, 2015 at 04:55 UTC (#1113573=perlquestion)

MrTEE has asked for the wisdom of the Perl Monks concerning the following question:

I know this is a Perl site, but I need help with a one-liner. I have a little bash script that cats out a file and tells me if there is a line where the 11th column has more than 6 characters in it. It emails me when there is a bad line in a file - bad meaning that it will break a downstream process. Anyhow, when I get the email saying that there is a bad file, I just log in to the PC via VPN and then I sed out the lines from the file that I get in the email. The bad lines are always in danny.csv, not danny1.csv. It has been the same lines killing the downstream process for a few weeks, so I put the "sed -i's" into the script and it does it automagically.

for i in danny.csv danny1.csv
do
cat /come/and/play/with/$i | perl -ne 'print if length((split /,/)[10]) > 6' | mail -s "danny.csv bad line" casper@casperr.com
done
#it would be nice to find a perl change the file in place
sed -i '/D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,/d' /come/and/play/with/us/danny.csv
sed -i '/D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,/d' /come/and/play/with/us/danny.csv

However, when a new line gets put into this file, I am going to have to log in and take out the line. So I have been trying to write a Perl one-liner that will edit the file in place, like sed, and make a backup of the file. I just need a Perl one-liner that will delete any line where the 11th column has more than 6 characters in it.

perl -p -i.bak -e 's/\,\w{7}\,//g' - which does not work.

I tried something like this:

perl -nle 'print if /\,\w{7}\,/' /come/and/play/with/us/danny.csv

but that does not catch the QSP-U=C, and it catches more lines than just the FOREVER one. For a solution I need to focus on the 11th column.

Replies are listed 'Best First'.
Re: perl one liner for csv file one field
by eyepopslikeamosquito (Bishop) on Jan 17, 2015 at 05:49 UTC

    cat /come/and/play/with/$i | perl -ne 'print if length((split /,/)[10]) > 6'
    BTW, I cannot restrain myself from commenting anytime I see cat being used like this because, long ago on usenet, Tom Christiansen wrote: "If you find yourself calling cat with just one argument, you're probably doing something silly".

    Following Tom's advice, I suggest you lose the cat (and the unnecessary pipe) by replacing:

    cat /come/and/play/with/$i | perl -ne 'print if length((split /,/)[10]) > 6'
    with simply:
    perl -ne 'print if length((split /,/)[10]) > 6' /come/and/play/with/$i

    Update: Added content from Useless Use of Cat Award (thanks choroba):

    The venerable Randal L. Schwartz hands out Useless Use of Cat Awards from time to time; you can see some recent examples in Deja News. (The subject line really says "This Week's Useless Use of Cat Award" although the postings are a lot less frequent than that nowadays). The actual award text is basically the same each time, and the ensuing discussion is usually just as uninteresting, but there are some refreshing threads there among all the flogging of this dead horse. The oldest article Deja News finds is from 1995, but it's actually a followup to an earlier article. By Internet standards, this is thus an Ancient Tradition.

    Nearly all cases where you have:

    cat file | some_command and its args ...
    you can rewrite it as:
    <file some_command and its args ...
    and in some cases you can move the filename to the arglist as in:
    some_command and its args ... file

    Also mentioned at this site are:

    • Useless Use of kill -9
    • Useless Use of echo
    • Useless Use of ls *b
    • Useless Use of wc -l
    • Useless Use of grep | awk and grep | sed
    • Useless Use of Backticks
    • Useless Use of Test
    • Assorted Other Gripes: Regular expressions used for searching (not substituting) that begin or end with '.*'. Actually 'anything*' or 'anything?'. If you are willing to accept "zero repetitions" of the anything, why specify it? Awk scripts that are basically cut unless reordering of fields is needed. Case conversions in comp.unix.shell (ex. how do I change my file names from UC to LC?) using tr/sed/awk/??? when some shells have builtin case conversions. Complex schemes to basically eliminate certain chars. For example DOS lines to UNIX lines. Sure read dos2unix(1), but using sed/awk/... when "tr -d '^M'" is all that is needed. Global changes to a file using sed to create a tmp file and renaming the tmp file, when an "ed(1)" here document would do fine.

    Update: See also:

Re: perl one liner for csv file one field
by eyepopslikeamosquito (Bishop) on Jan 17, 2015 at 05:33 UTC

    Here is an example input file test.txt:

    D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,
    D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,
    D,642,0642,UBF,EVL,,M,,S,S,123456,4,4,
    D,642,0642,UBF,EVL,,M,,S,S,12345,4,4,
    where the fields are separated by a comma. For example, FOREVER above is the eleventh field. This is just a guess based on your description. Please correct me if this is wrong.

    Given the above assumption, the following one-liner:

    perl -nlaF/,/ -e 'length($F[10]) < 7 and print' test.txt
    prints to stdout:
    D,642,0642,UBF,EVL,,M,,S,S,123456,4,4,
    D,642,0642,UBF,EVL,,M,,S,S,12345,4,4,
    i.e. prints out only those lines where the eleventh field is less than seven characters in length.

    See perlrun for details of the -a and -F command line switches to perl.

    Once you are happy that works and meets the spec. you could add the -i switch to auto-edit the file.

      the following one-liner:

      perl -nlaF/,/ -e 'length($F[10]) < 7 and print' test.csv

      Is parsing csv with a split not a crime in the same category as parsing html with regexes?

      After all the fields might contain escaped delimiters...

      Here is another attempt using Text::CSV_XS that processes the file line-by-line (Tux's solution slurps, which may not be good for very large files):

      perl -MText::CSV_XS=csv -ne 'print if length(csv(in => \$_)->[0]->[10]) > 6'

        The new filter option filters on read, so large files are no problem. I like your approach, but it will FAIL on lines with embedded newlines, as the next csv iteration will not continue from the previous.


        Enjoy, Have FUN! H.Merijn
Re: perl one liner for csv file one field
by Tux (Canon) on Jan 17, 2015 at 09:43 UTC

    OK, I'll bite. Using the answers below, here's a one-liner with Text::CSV_XS:

    $ perl -MText::CSV_XS=csv -e'csv(in=>[grep{length($_->[10])>6}@{csv(in=>"test.csv")}])'
    D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,
    D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,
    $

    Enjoy, Have FUN! H.Merijn
Re: perl one liner for csv file one field
by pme (Monsignor) on Jan 17, 2015 at 05:30 UTC
    Hi MrTEE,
    perl -F, -i.orig -ane'print if length($F[10]) <= 6;'
    You can read more about autosplit feature in perlrun and perlvar manuals.
Re: perl one liner for csv file one field
by Tux (Canon) on Jan 17, 2015 at 19:16 UTC

    Your case inspired me to add a filter option to Text::CSV_XS's csv function:

    $ perl -MCSV -e'csv(in=>"test.csv",filter=>{11=>sub{length>6}})'
    D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,
    D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,
    $

    It still needs some shaving, tests and docs, but looks promising and useful. It'll be included in the next release.


    Enjoy, Have FUN! H.Merijn
Re: perl one liner for csv file one field
by Anonymous Monk on Jan 17, 2015 at 07:31 UTC

Node Type: perlquestion [id://1113573]
Approved by GrandFather
Front-paged by blindluke