PerlMonks  

Escaping quotes in JSON string

by HeadScratcher (Novice)
on Oct 11, 2015 at 12:37 UTC ( [id://1144422]=perlquestion )

HeadScratcher has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I have some JSON data in a file that I need to process. How do I escape single and double quotes?

The sample here demonstrates what I'm up against:

    {[
    {'firstName':'John', 'lastName':'Doe', 'text' : 'Text with new lines and tabs' },
    {'firstName':'Anna', 'lastName':'Smith', 'text' : 'text with ''2 single quotes togeether '' and "Double quotes either in pairs" or in ""Double pairs"" ' },
    {'firstName':'Peter', 'lastName':'O'Toole', 'text' : "More text with diacritics'
    ]}
JSON::PP_SUPPORT_METHODS
    $json->allow_singlequote->decode(q|{'foo':'bar'}|);

I think this will fix the single quote problem in the 'key' : 'value' pairs. However, how do I deal with single and double quotes within the text? I'm getting errors like:

    , or } expected while parsing object/hash, at character offset 1234 (before "'2 single quotes'' .....")

    use strict;
    use warnings;
    use JSON -support_by_pp;

    my $content = qq(
    {[
    {'firstNameA' : 'John', 'lastName' : 'Doe', 'text' : 'Text with new lines and tabs' },
    {'firstNameB' : 'Anna', 'lastName' : 'Smith', 'text' : 'text with ''2 single quotes togeether '' and "Double quotes either in pairs" or in ""Double pairs"" ' },
    {'firstNameC' : 'Peter', 'lastName' : 'O'Toole', 'text' : "More text with diacritics'
    ]}
    );
    print $content;

    my $json = JSON->new;
    # decode() dies here: the quotes inside the values are not escaped
    my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->decode($content);

BTW, the sample data here is just for show; the actual file cannot be uploaded.

Thank You

Replies are listed 'Best First'.
Re: Escaping quotes in JSON string
by RichardK (Parson) on Oct 11, 2015 at 13:58 UTC

    That's not valid JSON. If you check the spec, it says:

    A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

    allow_singlequote is a JSON::PP extension and just not worth using IMHO. The beauty of JSON is that it's simple & well defined, but if you use extensions then you'll break interoperability.
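To illustrate what the spec asks for, here is a minimal sketch of how a standard encoder writes such a value. The record is hypothetical, loosely based on the sample in the question, and JSON::PP is used only because it ships with Perl. The point is that double quotes, newlines and tabs come out as backslash escapes, while single quotes need no escaping at all.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical record, loosely based on the sample data in the question.
my $record = {
    firstName => 'Anna',
    lastName  => 'Smith',
    text      => qq{two ''single'' quotes, "double" quotes,\n\ta newline and a tab},
};

# canonical() sorts the keys so the output order is stable.
my $json    = JSON::PP->new->canonical;
my $encoded = $json->encode($record);
print "$encoded\n";

# Decoding the encoded text gives back the original value unchanged.
my $decoded = $json->decode($encoded);
print "round trip ok\n" if $decoded->{text} eq $record->{text};
```

Anything that writes its output this way can be read by any JSON parser without extensions.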

Re: Escaping quotes in JSON string
by AnomalousMonk (Archbishop) on Oct 11, 2015 at 18:17 UTC

    I've got a potato and I want to turn it into a tomato. This should be possible given that "tomato" looks so much like "potato". Please advise how to proceed.

    If you can correct this mess by hand, it may be possible to go through all the possible variations (in millions of records!) and develop some heuristics that will allow the definition of a set of regexes to be used in a bunch of substitutions, or maybe develop a set of parsing rules. The critical problem, and I would think the chief sink of time and effort in this Quixotic quest, will be developing a robust unit testing framework to allow you to prove you really can spin straw into gold.

    Your best bet: Whoever's sending you this junk, tell them you know where they live and they'd better start sending you valid data or else!

    Please forgive the snarky tone of this post. I just want to make the point that there are some jobs best left undone, and even un-begun!

    BTW: Does the final part of your example data

    {'firstNameC' : 'Peter', 'lastName' : 'O'Toole', 'text' : "More text with diacritics' ]}
    actually represent something you would see in your real-world data or is it just a cut/paste typo? If it's real, good luck!

    Update: WRT the unit testing framework: Remember that you must cover both false negative cases (records that need to be fixed and are missed), and false positive cases (records that get "fixed" even though they were just fine to begin with, thus screwing them up). Remember also that if your fixer-upper script is 99.9% effective, you will still have, out of millions of records, thousands that are missed and still need fixing — perhaps by hand? Also: please ponder the notions "tarbaby", "quagmire" and "death march".
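As a rough sketch of what such a test table might look like, the following uses Test::More with one group of cases that must be fixed and one group that must pass through untouched. The `fix_value` function is a hypothetical stand-in that only collapses doubled single quotes; a real fixer for this data would be far more involved.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical fixer under test: it only collapses doubled single
# quotes inside a value. It is a placeholder, not a real solution.
sub fix_value {
    my ($value) = @_;
    $value =~ s/''/'/g;
    return $value;
}

# Each case: [ description, input, expected output ].
my @cases = (
    # Must be fixed (guards against false negatives).
    [ 'doubled quotes collapsed', q{say ''hi''}, q{say 'hi'} ],
    # Must be left alone (guards against false positives).
    [ 'lone apostrophe untouched', q{O'Toole},   q{O'Toole} ],
    [ 'double quotes untouched',   q{say "hi"},  q{say "hi"} ],
);

is( fix_value( $_->[1] ), $_->[2], $_->[0] ) for @cases;

done_testing();
```

Every odd record found in the real data would become another row in `@cases`, so the table doubles as a catalogue of the variations seen so far.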


    Give a man a fish:  <%-{-{-{-<

Re: Escaping quotes in JSON string
by ikegami (Patriarch) on Oct 12, 2015 at 01:04 UTC

    That's not even close to being JSON.

    • JSON string literals must be delimited using «"» (not «'»).
    • Line feeds can't be placed in JSON string literals. A suitable escape sequence («\n» or «\u000A») must be used instead.
    • All instances of «"» in JSON string literals must be escaped using «\».
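The three rules above can be checked against a strict parser. This is a small sketch using JSON::PP with no extensions enabled; the sample strings and the `is_valid_json` helper are made up for illustration. Each of the first three samples breaks one rule, and the last one follows all of them.

```perl
use strict;
use warnings;
use JSON::PP;

# Strict parser: no allow_singlequote, loose or relaxed.
my $json = JSON::PP->new;

# Returns 1 if the parser accepts the text, 0 if decode() dies.
sub is_valid_json {
    my ($text) = @_;
    return eval { $json->decode($text); 1 } ? 1 : 0;
}

my %samples = (
    single_quoted   => q|{'text':'hello'}|,                # ' used as delimiter
    raw_line_feed   => qq|{"text":"line one\nline two"}|,  # literal line feed
    unescaped_quote => q|{"text":"say "hi""}|,             # bare " in a string
    valid           => q|{"text":"say \"hi\"\nline two"}|, # escapes used
);

for my $name ( sort keys %samples ) {
    printf "%-16s %s\n", $name,
        is_valid_json( $samples{$name} ) ? 'accepted' : 'rejected';
}
```

Only the `valid` sample should be accepted; the other three are rejected for the reasons listed above.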
Re: Escaping quotes in JSON string
by HeadScratcher (Novice) on Oct 11, 2015 at 17:14 UTC

    RichardK

    Unfortunately, that is how I receive the data. By manually editing the data, e.g. escaping each ' and ", I can parse it as JSON using allow_singlequote. (This can be done on test data but not on a couple of million records.) My problem is: how do I escape the quotes inside a 'value' prior to turning it into a JSON object?

      You will have to write a parser for your not-JSON instead of trying to make JSON modules accept your not-JSON.

      I would look at the source of JSON::Tiny, which should work with the little change of escaping ' and ", by doubling them instead of putting a backslash in front of them.
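A minimal sketch of that idea, written as a small converter rather than a patch to JSON::Tiny. It assumes a dialect where every string is delimited by single quotes, a literal single quote inside a string is always written doubled (''), and double quotes, newlines and tabs inside a string are literal. The function name `not_json_to_json` and the sample input are made up for illustration. It rewrites each string as a proper JSON string literal and leaves the rest of the text alone, so a strict parser can read the result.

```perl
use strict;
use warnings;
use JSON::PP;

# Convert the assumed single-quote dialect into strict JSON text.
sub not_json_to_json {
    my ($src) = @_;
    my $encoder = JSON::PP->new->allow_nonref;
    my $out     = '';
    pos($src) = 0;
    while ( pos($src) < length $src ) {
        if ( $src =~ /\G'((?:[^']|'')*)'/gc ) {
            # Single-quoted string: un-double the quotes, then let
            # JSON::PP emit a correctly escaped JSON string literal.
            ( my $value = $1 ) =~ s/''/'/g;
            $out .= $encoder->encode($value);
        }
        elsif ( $src =~ /\G([^']+)/gc ) {
            # Braces, brackets, colons, commas and whitespace pass through.
            $out .= $1;
        }
        else {
            die "Unterminated string at offset " . pos($src) . "\n";
        }
    }
    return $out;
}

my $input = q|[ {'firstName':'Anna', 'text':'with ''doubled'' and "double" quotes'} ]|;
my $data  = JSON::PP->new->decode( not_json_to_json($input) );
print $data->[0]{text}, "\n";    # prints: with 'doubled' and "double" quotes
```

This only works if the doubling is applied consistently. A lone apostrophe such as the one in 'O'Toole' is ambiguous in this dialect and will still fail, and the outer {[ ... ]} wrapper from the sample would have to be stripped before decoding.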

Re: Escaping quotes in JSON string
by nikosv (Deacon) on Oct 12, 2015 at 10:40 UTC
