Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &

by psynk (Initiate)
on Mar 06, 2013 at 20:19 UTC ( #1022086=perlquestion: print w/ replies, xml ) Need Help??
psynk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm using

...
use Spreadsheet::XLSX;
use Spreadsheet::XLSX::Fmt2007;
use Spreadsheet::XLSX::Utility2007;
...

to parse an xlsx spreadsheet where a cell contains the value:

<Message Name> (<Internal Name>)

When I look at the value returned, it looks like this (minus the spaces so it doesn't render):

& lt;Message Name& gt; (& lt;Internal Name& gt;)

This does not happen when the spreadsheet is an xls (2003 format). I tried using Notepad and Textpad to see what the xlsx stores, but it's all gibberish. I'm using the following to return the value of the cell

...
$cell = $sheet -> {Cells} $row $col;
...
$source_val3_col_val = $cell -> {Val};

Any thoughts or suggestions would be appreciated. I've looked through most of the libraries I'm using and haven't come across the code that converts the strings, so my first thought is that xlsx is storing the &xx; values.

Comment on Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by SuicideJunkie (Priest) on Mar 06, 2013 at 21:26 UTC

    How are you looking at the value?

    What does substr($value, 0,1) look like when you display it using the same method? '&' or '&lt;'?

      I see '&' which leads me to believe that's how MSFT is storing it in the xlsx spreadsheets.
Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by runrig (Abbot) on Mar 06, 2013 at 21:30 UTC
    Try $cell->value() which should be the formatted value of the cell. $cell->{Val} is equivalent to $cell->unformatted(). I'm not sure there will be a difference here...but worth a try.
      Thanks for the suggestion. There is no difference between the two. Both return &gt; which leads me to believe that's how MSFT is storing it in the xlsx spreadsheets.
Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by jellisii2 (Friar) on Mar 06, 2013 at 21:33 UTC
      Thanks for the references. I agree, xlsx format is storing the XML data.

      I'm a bit of a newbie to perl, so rather than try to figure out how to use XLM::Twig, I'm just going to do a couple regexp substitution strings and call it a day. Thanks you all for the quick replies.
Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by afoken (Parson) on Mar 07, 2013 at 19:10 UTC
      ...rewrite Spreadsheet::XLSX using a sane XML parser, like XML::LibXML.

      Work in progress.

        Likely beyond my Perl skill level at the moment, but I'll keep it in mind. Meanwhile, I wrote a small function to fix the data. I also found code strings in the original data that we decided needed fixin', so this function might be all I need at the moment.

        sub FixXML { $parm = $_[0]; $parm =~ s/&amp;/&/g; $parm =~ s/&gt;/>/g; $parm =~ s/&lt;/</g; $parm =~ s/&quot;/"/g; $parm =~ s/&apos;/'/g; $parm =~ s/&#xA;/\n/g; $parm =~ s/&#xa;/\n/g; $parm =~ s/&#xD;/\r/g; $parm =~ s/&#xd;/\r/g; $parm =~ s/&#x9;/\t/g; return($parm); }

      Part of some $work->project, I wrote PROCURA::XML::Entities. I'm willing to remove the PROCURA:: part and put it on CPAN. (Feel free to nick the code if you prefer, but in that case retain the credentials).

      SYNOPSIS use PROCURA::XML::Entities; my $a = "Read &quot;perlre&quot; for explanation of &apos;&amp +;&apos;"; my $b = decode_entities ($a); # $b will now be q{Read "perlre" for explanation of '&'} $c = encode_entities ($b); # $c should be the same as $a use PROCURA::XML::Entities (); $decoded = PROCURA::XML::Entities::decode ($a); $encoded = PROCURA::XML::Entities::encode ($a);

      Enjoy, Have FUN! H.Merijn
Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by bart (Canon) on Mar 08, 2013 at 22:52 UTC
    XLSX is actually XML compressed with ZIP. It looks to me like the developers of this module forgot to unescape the XML text...

    edit It looks like my hunch was correct.

Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by ig (Vicar) on Dec 17, 2013 at 09:52 UTC

    I was seeing the same thing with Spreadsheet::XLSX. For other reasons (built-in number/date formats not handled) I switched to Spreadsheet::ParseXLSX. With the latter, cells with built-in 'custom' number formats have correct formatted values and the entities are processed back to characters.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1022086]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2014-09-18 04:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (108 votes), past polls