
Unescaping JavaScript string

by a01 (Initiate)
on Mar 18, 2016 at 03:54 UTC ( #1158186=perlquestion )

a01 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need my Perl script to unescape a JavaScript string literal that is in a <script> tag in HTML code that I need to parse. For example:

<script type="text/javascript"> ('https\x3a\x2f\x2fexample.com')</script>

If I capture the string with a regular expression, I get the string:

https\x3a\x2f\x2fexample.com

I need to decode this to:

https://example.com

How do I do this in Perl?

Replies are listed 'Best First'.
Re: Unescaping JavaScript string (encode/decode)
by Anonymous Monk on Mar 18, 2016 at 04:32 UTC

    use JavaScript::HashRef::Decode

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;    # provides dd()
    use JavaScript::HashRef::Decode qw/ decode_js /;

    my $string  = q{'https\x3a\x2f\'};
    my $js_hash = "{it:$string}";
    my $fromJs  = decode_js( $js_hash );
    dd( $string, $js_hash, $fromJs );
    dd( $fromJs->{it} );
    __END__
    (
      "'https\\x3a\\x2f\\'",
      "{it:'https\\x3a\\x2f\\'}",
      { it => "" },
    )
    ""
Re: Unescaping JavaScript string
by 1nickt (Canon) on Mar 18, 2016 at 04:25 UTC

    Hi a01, welcome to the monastery.

    Your string was encoded to UTF-8, so you need to decode it using the core module Encode.

    $ perl -MEncode -E' say Encode::decode_utf8("https\x3a\x2f\x2fexample.com"); '
    You may want to refer to perlunitut for an introduction to Unicode processing in Perl.

    Hope this helps!

    The way forward always starts with a minimal test.
      That code is useless to the OP. You replaced the single quotes (the original JavaScript) with double quotes, thereby doing the decoding in the Perl source code. It won't work if the data comes from anywhere other than the source of the Perl script itself.
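To make the quoting point concrete, here is a minimal sketch. The variable names are only illustrative, and the single-quoted literal stands in for text captured from the HTML at run time.

```perl
use strict;
use warnings;

# Double quotes: Perl itself interprets \x3a and \x2f while compiling the script.
my $from_source = "https\x3a\x2f\x2fexample.com";

# Single quotes stand in for data captured from the HTML at run time:
# the backslashes are literal characters and nothing has been decoded yet.
my $from_input = 'https\x3a\x2f\x2fexample.com';

print "$from_source\n";    # prints https://example.com
print "$from_input\n";     # prints https\x3a\x2f\x2fexample.com
```

Passing $from_input to Encode::decode_utf8 would leave the backslash sequences untouched, since they are plain ASCII characters at that point.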

      It's also important, I feel, to note that this is not necessarily UTF-8.

      Latin-1 (ISO 8859-1)
        \x3a = 58 = : = chr(58) = "\x{3a}"
        \x2f = 47 = / = chr(47) = "\x{2f}"
      ASCII: ditto
      UTF-8: same same
      Windows-1252: also, also

      Unless it's been specifically decoded to UTF-8 (or one of the others) from a known source encoding, or it appeared in a use utf8 script literally, it is a risky assumption.

      JavaScript has more ways to encode strings that Encode::decode_utf8 doesn't handle.
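For what it's worth, the escapes can also be handled at run time with a substitution. The following is only a rough sketch, not a full JavaScript parser: unescape_js_string and _decode_escape are hypothetical helper names, the input is assumed to be the text between the quotes of a single literal, and it needs Perl 5.10 or later for the // operator. It covers \xHH, \uHHHH, \u{...}, the single-character escapes and line continuations, but it does not combine surrogate pairs or handle legacy octal escapes.

```perl
use strict;
use warnings;

# Single-character escapes defined for JavaScript string literals.
my %simple = (
    n => "\n", t => "\t", r => "\r", b => "\b",
    f => "\f", v => "\x0b", 0 => "\0",
);

# Hypothetical helper: turn one matched escape into the character it denotes.
sub _decode_escape {
    my ($hex, $other) = @_;
    return chr hex $hex      if defined $hex;             # \xHH, \uHHHH, \u{...}
    return $simple{$other}   if exists $simple{$other};   # \n, \t, ...
    return ''                if $other =~ /\A[\r\n]/;     # line continuation
    return $other;                                        # \' \" \\ and the rest
}

# Hypothetical helper: takes the text between the quotes of a JS string
# literal and returns the string it denotes (Perl characters, not bytes).
sub unescape_js_string {
    my ($s) = @_;
    $s =~ s{
        \\ (?:
              x ([0-9A-Fa-f]{2})            # \xHH
            | u \{ ([0-9A-Fa-f]{1,6}) \}    # \u{H...}
            | u ([0-9A-Fa-f]{4})            # \uHHHH
            | (\r\n | .)                    # any other escaped character
        )
    }{ _decode_escape($1 // $2 // $3, $4) }gesx;
    return $s;
}

print unescape_js_string(q{https\x3a\x2f\x2fexample.com}), "\n";
# prints https://example.com
```

The result is a string of characters, so encode it (for example with Encode::encode_utf8) before writing it out if it may contain anything beyond ASCII.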

Node Type: perlquestion [id://1158186]
Front-paged by Corion