Unescaping JavaScript string

by a01 (Initiate)
on Mar 18, 2016 at 03:54 UTC (#1158186=perlquestion)

a01 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need my Perl script to unescape a JavaScript string literal that is in a <script> tag in HTML code that I need to parse. For example:

<script type="text/javascript">window.open ('https\x3a\x2f\x2fexample.com')</script>

If I capture the string with a regular expression, I get the string:

https\x3a\x2f\x2fexample.com

I need to decode this to:

https://example.com

How do I do this in Perl?
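
For reference, the capture step described above might look like the following. This is only a sketch: it assumes the literal is single-quoted and is the first argument of window.open, and a regular expression is not a substitute for a real HTML or JavaScript parser. The variable names are illustrative.

```perl
use strict;
use warnings;

# Sample input from the question; q{} keeps the backslashes as literal characters.
my $html = q{<script type="text/javascript">window.open ('https\x3a\x2f\x2fexample.com')</script>};

# Capture the body of the single-quoted literal, allowing backslash escapes inside it.
my ($raw) = $html =~ /window\.open\s*\(\s*'((?:[^'\\]|\\.)*)'/s;

print "$raw\n";    # still escaped: https\x3a\x2f\x2fexample.com
```

The captured value still contains the backslash sequences as plain characters; decoding them is a separate step.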

Replies are listed 'Best First'.
Re: Unescaping JavaScript string (encode/decode)
by Anonymous Monk on Mar 18, 2016 at 04:32 UTC

    use JavaScript::HashRef::Decode

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;    # provides dd(), used below
    use JavaScript::HashRef::Decode qw/ decode_js /;

    # The captured literal, quotes included, as it appears in the page.
    my $string  = q{'https\x3a\x2f\x2fexample.com'};
    # Wrap it in an object literal so that decode_js can parse it.
    my $js_hash = "{it:$string}";
    my $fromJs  = decode_js( $js_hash );

    dd( $string, $js_hash, $fromJs );
    dd( $fromJs->{it} );
    __END__
    (
      "'https\\x3a\\x2f\\x2fexample.com'",
      "{it:'https\\x3a\\x2f\\x2fexample.com'}",
      { it => "https://example.com" },
    )
    "https://example.com"

Re: Unescaping JavaScript string
by 1nickt (Abbot) on Mar 18, 2016 at 04:25 UTC

    Hi a01, welcome to the monastery.

    Your string was encoded to UTF-8, so you need to decode it using the core module Encode.

    $ perl -MEncode -E' say Encode::decode_utf8("https\x3a\x2f\x2fexample.com"); '
    https://example.com

    You may want to refer to perlunitut for an introduction to Unicode processing in Perl.

    Hope this helps!


    The way forward always starts with a minimal test.
      "https\x3a\x2f\x2fexample.com"
      That code is useless to the OP. You replaced the single quotes (from the original JavaScript) with double quotes, so the decoding is done by the Perl parser when it reads the source code, not by decode_utf8. It won't work if the data comes from anywhere other than the source of the Perl script itself.
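
To make the point above concrete, here is a small sketch comparing the two quoting styles, followed by a plain substitution that works on data at run time. It handles only the \xHH form shown in the question; the variable names are illustrative.

```perl
use strict;
use warnings;

# Single-quoted: the backslashes survive, as they would in text read from a file.
my $from_data   = 'https\x3a\x2f\x2fexample.com';
# Double-quoted: Perl itself turns \x3a into ":" when it compiles the literal.
my $from_source = "https\x3a\x2f\x2fexample.com";

print "data:   $from_data\n";
print "source: $from_source\n";

# Replace each \xHH escape in the data with the character it names.
(my $decoded = $from_data) =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/ge;

print "$decoded\n";    # https://example.com
```

The substitution does the work here; no call to Encode is involved, because the input is a sequence of escape codes rather than encoded bytes.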

      It's also important, I feel, to note that this is not necessarily UTF-8.

      Latin-1 (ISO 8859-1)
        \x3a = 58 = : = chr(58) = "\x{3a}"
        \x2f = 47 = / = chr(47) = "\x{2f}"
      ASCII: ditto
      UTF-8: same same
      Windows-1252: also, also

      Unless the string has been explicitly converted to UTF-8 (or one of the others) from a known source encoding, or it appeared literally in a script under use utf8, assuming UTF-8 is risky.

      JavaScript has more ways to escape characters in strings, and Encode::decode_utf8 doesn't handle any of them.
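
As a rough illustration of those other forms, the sketch below handles \xHH, \uHHHH, \u{...} and the single-character escapes. The name unescape_js_string is hypothetical, not a function from any module. It is not a full implementation: surrogate pairs such as \uD83D\uDE00 are not combined into one character, and line continuations and legacy octal escapes are not treated specially.

```perl
use strict;
use warnings;

# Single-character escapes defined for JavaScript string literals.
my %SIMPLE = (
    n => "\n", t => "\t", r => "\r", b => "\b", f => "\f",
    v => "\x0B", 0 => "\0",
);

# Hypothetical helper: takes the body of a JS string literal (without the
# surrounding quotes) and returns it with escape sequences resolved.
sub unescape_js_string {
    my ($s) = @_;
    $s =~ s{
        \\ (?:
            x ([0-9A-Fa-f]{2})              # \xHH
          | u ([0-9A-Fa-f]{4})              # \uHHHH
          | u \{ ([0-9A-Fa-f]{1,6}) \}      # \u{H...}
          | (.)                             # \n, \t, \\, \', \" and so on
        )
    }{
          defined($1) ? chr(hex($1))
        : defined($2) ? chr(hex($2))
        : defined($3) ? chr(hex($3))
        : exists($SIMPLE{$4}) ? $SIMPLE{$4}
        : $4
    }gesx;
    return $s;
}

print unescape_js_string('https\x3a\x2f\x2fexample.com'), "\n";    # https://example.com
```

Any other backslashed character, such as \' or \\, simply yields the character itself, which matches how JavaScript treats escapes it does not define.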
