PerlMonks  

Re: Escaping quotes in JSON string

by AnomalousMonk (Archbishop)
on Oct 11, 2015 at 18:17 UTC


in reply to Escaping quotes in JSON string

I've got a potato and I want to turn it into a tomato. This should be possible given that "tomato" looks so much like "potato". Please advise how to proceed.

If you can correct this mess by hand, it may be possible to go through all the possible variations (in millions of records!) and develop some heuristics that will allow the definition of a set of regexes to be used in a bunch of substitutions, or maybe develop a set of parsing rules. The critical problem, and I would think the chief sink of time and effort in this Quixotic quest, will be developing a robust unit testing framework to allow you to prove you really can spin straw into gold.

Your best bet: Whoever's sending you this junk, tell them you know where they live and they'd better start sending you valid data or else!

Please forgive the snarky tone of this post. I just want to make the point that there are some jobs best left undone, and even un-begun!

BTW: Does the final part of your example data

{'firstNameC' : 'Peter', 'lastName' : 'O'Toole', 'text' : "More text with diacritics' ]}
actually represent something you would see in your real-world data, or is it just a cut/paste typo? If it's real, good luck!
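
To make concrete why that fragment is trouble, here is a minimal sketch (in Python, purely for illustration) that feeds the quoted text to a strict JSON parser. The helper name is_valid_json is my own invention, not anything from the original question.

```python
import json

# The fragment quoted above, copied verbatim into a raw string.
record = r"""{'firstNameC' : 'Peter', 'lastName' : 'O'Toole', 'text' : "More text with diacritics' ]}"""

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_valid_json(record))                      # False
print(is_valid_json('{"lastName": "O\'Toole"}'))  # True
```

A strict parser gives up at the very first single-quoted key, so it cannot even tell you where the "real" damage is.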

Update: WRT the unit testing framework: Remember that you must cover both false negative cases (records that need to be fixed and are missed), and false positive cases (records that get "fixed" even though they were just fine to begin with, thus screwing them up). Remember also that if your fixer-upper script is 99.9% effective, you will still have, out of millions of records, thousands that are missed and still need fixing, perhaps by hand? Also: please ponder the notions "tarbaby", "quagmire" and "death march".
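
A rough sketch of what such a test harness has to count, written in Python for illustration. Everything here is an assumption of mine: naive_fix is a deliberately dumb stand-in for whatever fixer you write, the three test records are made up, and the 3,000,000 record count is only an example figure for "millions".

```python
import json

def parses(text):
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

def naive_fix(text):
    """Hypothetical fixer: blindly turn every single quote into a double quote."""
    return text.replace("'", '"')

def evaluate(fixer, cases):
    """Run fixer over (raw_record, expected_value) pairs.

    Returns (false_negatives, false_positives):
      false negative = record was broken and is still wrong after fixing
      false positive = record was fine and the fixer damaged it
    """
    false_negatives = 0
    false_positives = 0
    for raw, expected in cases:
        was_valid = parses(raw)
        fixed = fixer(raw)
        correct = parses(fixed) and json.loads(fixed) == expected
        if correct:
            continue
        if was_valid:
            false_positives += 1
        else:
            false_negatives += 1
    return false_negatives, false_positives

# Made-up test records: (raw text, value we want after fixing).
CASES = [
    ("{'name': 'Peter'}", {"name": "Peter"}),              # fixable
    ("{'lastName': 'O'Toole'}", {"lastName": "O'Toole"}),  # still broken afterwards
    ('{"lastName": "O\'Toole"}', {"lastName": "O'Toole"}), # was fine, gets broken
]

print(evaluate(naive_fix, CASES))  # (1, 1)

# The 99.9% arithmetic, in integers to avoid float rounding.
total_records = 3_000_000  # assumed figure, not from the original question
missed = total_records * (1000 - 999) // 1000
print(missed)  # 3000
```

Note that the same blind substitution that rescues the first record wrecks the third, which is exactly why both counters are needed.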


Give a man a fish:  <%-{-{-{-<
