MrTEE has asked for the wisdom of the Perl Monks concerning the following question:
I know this is a Perl site, but I need help with a one-liner. I have a little bash script that cats out a file and tells me if there is a line
where the 11th column has more than 6 characters in it.
It emails me when there is a bad line in a file, bad meaning that it will break a
downstream process.
When I get the email saying that there is a bad file, I log in to the PC via
VPN and sed out the lines that were listed in the email. The bad lines are
always in danny.csv, not danny1.csv.
The same lines have been killing the downstream process for a few weeks, so I put the "sed -i" commands into
the script and it now does it automagically.
for i in danny.csv danny1.csv
do
  cat /come/and/play/with/$i | perl -ne 'print if length((split /,/)[10]) > 6' | mail -s "danny.csv bad line" casper@casperr.com
done
# it would be nice to have perl change the file in place
sed -i '/D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,/d' /come/and/play/with/us/danny.csv
sed -i '/D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,/d' /come/and/play/with/us/danny.csv
However, when a new bad line gets put into this file, I am going to have to log in and take out the line.
So I have been trying to write a Perl one-liner that will edit the file in place, like sed, and make a
backup of the file. I just need a Perl one-liner that will delete any line where the 11th column has more
than 6 characters in it.
First I tried:
perl -p -i.bak -e 's/\,\w{7}\,//g'
which does not work. Then I tried something like this:
perl -nle 'print if /\,\w{7}\,/' /come/and/play/with/us/danny.csv
but that does not catch QSP-U=C, and it catches more lines than just the
FOREVER one. For a solution I need to focus on the 11th column.
Re: perl one liner for csv file one field (useless use of cat and other awards)
by eyepopslikeamosquito (Bishop) on Jan 17, 2015 at 05:49 UTC
cat /come/and/play/with/$i | perl -ne 'print if length((split /,/)[10]) > 6'
BTW, I cannot restrain myself from commenting anytime I see
cat being used like this
because, long ago on usenet, Tom Christiansen wrote:
"If you find yourself calling cat with just one argument,
you're probably doing something silly".
Following Tom's advice, I suggest you lose the cat (and the unnecessary pipe) by replacing:
cat /come/and/play/with/$i | perl -ne 'print if length((split /,/)[10]) > 6'
with simply:
perl -ne 'print if length((split /,/)[10]) > 6' /come/and/play/with/$i
Update: Added content from Useless Use of Cat Award (thanks choroba):
The venerable Randal L. Schwartz hands out Useless Use of Cat Awards from time to time; you can see some recent examples in Deja News.
(The subject line really says "This Week's Useless Use of Cat Award" although the postings are a lot less frequent than that nowadays).
The actual award text is basically the same each time, and the ensuing discussion is usually just as uninteresting, but there are some
refreshing threads there among all the flogging of this dead horse.
The oldest article Deja News finds is from 1995, but it's actually a followup to an earlier article.
By Internet standards, this is thus an Ancient Tradition.
Nearly all cases where you have:
cat file | some_command and its args ...
you can rewrite it as:
<file some_command and its args ...
and in some cases you can move the filename to the arglist as in:
some_command and its args ... file
Also mentioned at this site are:
- Useless Use of kill -9
- Useless Use of echo
- Useless Use of ls *b
- Useless Use of wc -l
- Useless Use of grep | awk and grep | sed
- Useless Use of Backticks
- Useless Use of Test
- Assorted Other Gripes:
  - Regular expressions used for searching (not substituting) that begin or end with '.*'. Actually 'anything*' or 'anything?'. If you are willing to accept "zero repetitions" of the anything, why specify it?
  - Awk scripts that are basically cut unless reordering of fields is needed.
  - Case conversions in comp.unix.shell (ex. how do I change my file names from UC to LC?) using tr/sed/awk/??? when some shells have builtin case conversions.
  - Complex schemes to basically eliminate certain chars. For example DOS lines to UNIX lines. Sure read dos2unix(1), but using sed/awk/... when "tr -d '^M'" is all that is needed.
  - Global changes to a file using sed to create a tmp file and renaming the tmp file, when an "ed(1)" here document would do fine.
Re: perl one liner for csv file one field
by eyepopslikeamosquito (Bishop) on Jan 17, 2015 at 05:33 UTC
I am assuming your input file looks something like this:
D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,
D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,
D,642,0642,UBF,EVL,,M,,S,S,123456,4,4,
D,642,0642,UBF,EVL,,M,,S,S,12345,4,4,
where the fields are separated by a comma.
For example, FOREVER above is the eleventh field.
This is just a guess based on your description.
Please correct me if this is wrong.
Given the above assumption, the following one-liner:
perl -nlaF/,/ -e 'length($F[10]) < 7 and print' test.txt
prints to stdout:
D,642,0642,UBF,EVL,,M,,S,S,123456,4,4,
D,642,0642,UBF,EVL,,M,,S,S,12345,4,4,
i.e. prints out only those lines where the eleventh
field is less than seven characters in length.
See perlrun for details of the -a and -F
command line switches to perl.
Once you are happy that it works and meets the spec, you could
add the -i switch to auto-edit the file.
the following one-liner:
perl -nlaF/,/ -e 'length($F[10]) < 7 and print' test.csv
Is parsing csv with a split not a crime in the same category as parsing html with regexes?
After all the fields might contain escaped delimiters...
Here is another attempt using Text::CSV_XS that processes the file line by line (Tux's solution slurps, which may not be good for very large files):
perl -MText::CSV_XS=csv -ne 'print if length(csv(in => \$_)->[0]->[10]) > 6'
The new filter option filters on read, so large files are no problem. I like your approach, but it will FAIL on lines with embedded newlines, as the next csv iteration will not continue from the previous.
Enjoy, Have FUN! H.Merijn
Re: perl one liner for csv file one field
by Tux (Canon) on Jan 17, 2015 at 09:43 UTC
OK, I'll bite. Using the answers below, here's a one-liner with Text::CSV_XS:
$ perl -MText::CSV_XS=csv -e'csv(in=>[grep{length($_->[10])>6}@{csv(in=>"test.csv")}])'
D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,
D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,
$
Enjoy, Have FUN! H.Merijn
Re: perl one liner for csv file one field
by pme (Monsignor) on Jan 17, 2015 at 05:30 UTC
perl -F, -i.orig -ane'print if length($F[10]) <= 6;'
You can read more about the autosplit feature in the perlrun and perlvar manuals.
Re: perl one liner for csv file one field
by Tux (Canon) on Jan 17, 2015 at 19:16 UTC
$ perl -MCSV -e'csv(in=>"test.csv",filter=>{11=>sub{length>6}})'
D,642,0642,UBF,EVL,,M,,S,S,FOREVER,213,213,
D,642,0642,UBF,EVL,,M,,S,S,QSP-U=C,4,4,
$
It still needs some shaving, tests and docs, but looks promising and useful. It'll be included in the next release.
Enjoy, Have FUN! H.Merijn
Re: perl one liner for csv file one field
by Anonymous Monk on Jan 17, 2015 at 07:31 UTC