PerlMonks
string? Or binary garbage? by argv (Pilgrim)
on Dec 01, 2004 at 00:21 UTC [id://411335]
argv has asked for the wisdom of the Perl Monks concerning the following question:
I'm using Josh Carter's Image::IPTCInfo module from CPAN, which reads the IPTC header from an image (e.g., a JPEG) and fills in fields such as "author", "location", and so on. The problem is that some of the data may be corrupted, or may merely look unintelligible because it is written in a different language. Is there a way to tell which?
At first it seemed simple to just check for plain ASCII, but then it occurred to me that I want to accept certain accented characters, like the é in café. Before I go off writing a routine that checks a string for sanity, to see whether it really is English text rather than arbitrary gobbledygook, I figured someone might already have such a thing. Even looking at only the first N characters of a string would be fine. Again, the brute-force, intuitive step would be to just do something like
but this seems like a hornet's nest of little gotchas where people have learned it ain't that simple.
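For what it's worth, here is one possible heuristic, sketched under stated assumptions rather than offered as a known-good solution: decode the raw field bytes as strict UTF-8, fall back to Latin-1 (ISO-8859-1) if that fails, look only at the first N characters, reject anything containing control characters, and require that at least half of the non-whitespace characters are letters. The helper name `looks_like_text`, the 200-character default window and the 0.5 letter ratio are all hypothetical choices, not part of Image::IPTCInfo. The sketch also assumes the field is either UTF-8 or Latin-1; IPTC data in other character sets is not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Image::IPTCInfo): guess whether a raw
# field value is human-readable text or binary garbage.
# Assumption: the bytes are either UTF-8 or Latin-1 (ISO-8859-1).
sub looks_like_text {
    my ($bytes, $max_chars) = @_;
    $max_chars = 200 unless defined $max_chars;    # only inspect the first N characters
    return 0 unless defined $bytes && length $bytes;

    my $text;
    if ($bytes =~ /[^\x00-\xFF]/) {
        # Already a decoded character string; use it as-is.
        $text = $bytes;
    }
    else {
        # Strict UTF-8 first. LEAVE_SRC stops decode() from consuming $bytes,
        # so the fallback below still sees the original octets.
        $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
        # Latin-1 maps every byte to a character, so this fallback never fails.
        $text = decode('ISO-8859-1', $bytes) unless defined $text;
    }
    $text = substr($text, 0, $max_chars);

    # Any C0/C1 control character other than tab, LF or CR: treat as binary.
    return 0 if $text =~ /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/;

    # Require at least half of the non-whitespace characters to be letters.
    # \p{L} matches letters in any script, so accented letters count too.
    (my $visible = $text) =~ s/\s+//g;
    return 0 unless length $visible;
    my $letters = () = $visible =~ /\p{L}/g;
    return $letters / length($visible) >= 0.5 ? 1 : 0;
}
```

With Image::IPTCInfo you would pass it the string returned for a free-text field (something like `$info->Attribute('city')`). Note what it does not do: it cannot tell English from another language, since any script's letters pass. Purely numeric fields such as dates are rejected. Windows-1252 "smart quotes" (bytes 0x80 to 0x9F) decode as C1 controls under Latin-1 and are rejected too, so decoding the fallback as 'cp1252' may suit some files better.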