PerlMonks  

Re: Convert XLSX to TSV and remove CRLF in cells

by pme (Prior)
on Jun 16, 2015 at 11:52 UTC (#1130589)


in reply to Convert XLSX to TSV and remove CRLF in cells

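The embedded line breaks can be replaced with a literal '\n' using this simple script on the Red Hat box. It treats a line ending in a carriage return as the end of a record and joins every other line to the one that follows it: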
use strict;
use warnings;
while (<>) {
    chomp;
    if (/^M$/) { print "$_\n"; }
    else       { print "$_\\n"; }
}
'^M' is a single character (a literal carriage return), entered by pressing 'ctrl-v enter'.

Update: Direct conversion does not seem to be hopeless. You can simply omit the $converter if you do not need encoding conversion. I created an xlsx file with ~1,000,000 rows and only two columns (file size ~10 MB), and it was converted to CSV in 140 seconds.

Replies are listed 'Best First'.
Re^2: Convert XLSX to TSV and remove CRLF in cells
by MidLifeXis (Monsignor) on Jun 16, 2015 at 12:16 UTC

    '^M' is single character, entered pressing 'ctrl-v enter'
    in some editors. In others it may end up displaying as some sort of a line feed. [emphasis added]

    A better, more portable way of encoding this is \015, \o{015}, \cM, \x0d, or some other encoded form that won't potentially be messed up by an editor, printer, code pretty-printer, ….

    --MidLifeXis
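
To make the suggestion above concrete, here is a minimal sketch of the parent node's script with the carriage return written as \x0d rather than a literal control character. The sub name join_embedded_breaks and the sample data are made up for illustration and are not from the thread. It assumes a Unix box (no CRLF translation on the filehandles) and Perl 5.14 or later for the \o{...} escape.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): same logic as the script in the
# parent node, but the carriage return is spelled \x0d instead of a literal ^M.
sub join_embedded_breaks {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /\x0d$/) {
            # Line ends in CR: a real end of record, so keep the line break.
            print {$out} "$line\n";
        }
        else {
            # Bare LF: a break inside a cell, so emit a literal backslash-n.
            print {$out} "$line\\n";
        }
    }
    return;
}

# The escaped forms listed above all denote the same single character.
die "escapes differ"
    unless "\015" eq "\o{015}" && "\cM" eq "\x0d" && "\015" eq "\x0d";

# Demo on in-memory filehandles: the first record has a line feed in a cell.
my $input = "a\tfirst\nsecond\x0d\nb\tplain\x0d\n";
open my $in, '<', \$input or die "cannot open input: $!";
my $output = '';
open my $out, '>', \$output or die "cannot open output: $!";
join_embedded_breaks($in, $out);
close $out;
print $output;
```

With this sample input the first record comes out on one line, with a literal \n between "first" and "second". As in the original script, the trailing carriage return of each record is left in place, so it may still need stripping afterwards.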
