PerlMonks  

Re: Fatal code point 0xFFFFFFFFFFFFFFFF

by Corion (Pope)
on Sep 05, 2018 at 12:29 UTC (#1221754)


in reply to Fatal code point 0xFFFFFFFFFFFFFFFF

If in doubt, use diagnostics and/or see perldiag for your error message(s):

Operation "%s" returns its argument for non-Unicode code point 0x%X

(S non_unicode) You performed an operation requiring Unicode rules on a code point that is not in Unicode, so what it should do is not defined. Perl has chosen to have it do nothing, and warn you.

To me, this means that the data you are reading is not really valid UTF-16 or valid Unicode. Please show us the relevant code that reads and decodes the data, and the relevant snippet of the data. That way, we may be able to see where the problem originates and make better suggestions on how to address it.
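One way to test the "not really valid" theory is to decode the raw bytes strictly, so that malformed input raises an error instead of slipping through. The following is only a sketch: `strict_decode` is a made-up helper name, and the encoding to pass in depends on what the files really contain, which this thread does not show.

```perl
use strict;
use warnings;
use Encode ();

# strict_decode() is a hypothetical helper, not part of Encode.
# It returns (text, undef) on success or (undef, error message) on failure.
sub strict_decode {
    my ($encoding, $bytes) = @_;
    # FB_CROAK makes decode() die on malformed input instead of substituting
    # a replacement character; LEAVE_SRC keeps $bytes unmodified.
    my $text = eval {
        Encode::decode($encoding, $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $text ? ($text, undef) : (undef, $@);
}

# "\xFF" can never appear in well-formed UTF-8, so this reports a failure.
my ($text, $error) = strict_decode('UTF-8', "\xFF");
print defined $text ? "decoded fine\n" : "decode failed: $error";
```

If a file decodes cleanly this way and the warnings still appear, the invalid code points are more likely being created later in the program than read from the data.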

Re^2: Fatal code point 0xFFFFFFFFFFFFFFFF
by Anonymous Monk on Sep 05, 2018 at 12:59 UTC
    I posted the wrong pattern. I'm matching about 1000 files and get this sequence of errors 8 times. It looks like something, but what? I can't tell if it means 8 files have 1 error or 1 file has 8 errors. I suspect it may be 8 files, and might involve the Japanese language:
    
    UTF-16 surrogate U+DFA8
    non-Unicode code point 0x1C9140
    non-Unicode code point 0xE6BAAA
    code point 0xFFFFFFFFFFFFFFFF
    code point 0xFFFFFFFFFFFFFFFF
    non-Unicode code point 0xFFFFFFFFFFFFFFFF
    non-Unicode code point 0x18B0E4
    non-Unicode code point 0x18B4DC
    non-Unicode code point 0x18B0E4
    
    
    Thank you for suggesting diagnostics. It shows something slightly different: a \t in front of every code point, like \tU+DFA8. I tried s/\t/ /gs before the regex but it has no effect.

    Sorry I can't really post the code or data because it's too complicated :-/

      Without either code or data, it's really hard for us to reproduce your problem or to suggest what might be the (root) cause, other than data that decodes to invalid Unicode sequences. My random guess is that you are either fiddling with the UTF-8 flag on strings or are creating Unicode strings in another invalid way, but that's hard to tell without code or data.

      My suggestion to you is to reduce your input data to find the line(s) which are causing the warnings to be thrown. In a second step, reduce the code of your program until nothing else remains except a short sequence of statements that are causing the warnings to be thrown.
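As a first step in that reduction, the warnings can be grouped by the file being processed, which also answers the "8 files or 1 file" question. This is a minimal sketch: `warnings_by_file` is a hypothetical helper, and the code reference passed to it stands in for whatever the real program does with one file. It only sees warnings, not fatal errors.

```perl
use strict;
use warnings;

# warnings_by_file() is a hypothetical helper. $process stands in for
# whatever your program does with one file (read, decode, match).
sub warnings_by_file {
    my ($process, @files) = @_;
    my %seen;
    for my $file (@files) {
        # Collect every warning raised while this file is being processed.
        local $SIG{__WARN__} = sub { push @{ $seen{$file} }, $_[0] };
        $process->($file);
    }
    return \%seen;
}

# Demo with a stand-in callback that warns for one made-up file name.
my $seen = warnings_by_file(
    sub { my ($file) = @_; warn "demo warning\n" if $file =~ /bad/ },
    'good.txt', 'bad.txt',
);
for my $file (sort keys %$seen) {
    printf "%s: %d warning(s)\n", $file, scalar @{ $seen->{$file} };
}
```

Once the offending files are known, the same handler can be moved inside the per-line loop to narrow things down to individual lines.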

      If the solution is still not obvious to you by then, show us both the data and the short program. Maybe then we can help you better.

      Sorry I can't really post the code or data because it's too complicated

      Please see Short, Self-Contained, Correct Example. It should be possible for you to compose an SSCCE because you must have some idea of the parts of the text that are causing problems. Just the exercise of composing an SSCCE may give you insight into the root cause of the problem.
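For illustration, here is roughly how small such an example can be. It does not reproduce the original poster's problem, whose code was never shown; it only triggers the perldiag message quoted earlier in the thread on purpose, using one of the reported code points. The choice of `lc` as the operation is an assumption made for the demo.

```perl
use strict;
use warnings;

my @caught;
{
    # Capture warnings so the script can inspect them afterwards.
    local $SIG{__WARN__} = sub { push @caught, $_[0] };

    # 0x1C9140 is beyond U+10FFFF, so it is not a Unicode code point.
    my $above_unicode = chr(0x1C9140);

    # Case mapping needs Unicode rules, which are undefined for this value.
    my $lowered = lc $above_unicode;
}
print for @caught;
```

A script of this size, plus a few bytes of the real data, is the kind of thing that can be posted even when the full program cannot.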


      Give a man a fish:  <%-{-{-{-<
