
Re^2: Unicode and Regexps: convert or am I missing something?

by newrisedesigns (Curate)
on Jun 02, 2005 at 01:02 UTC (#462711)

in reply to Re: Unicode and Regexps: convert or am I missing something?
in thread Unicode and Regexps: convert or am I missing something?

Thanks for your reply.

According to the header, the returned data is UTF-16LE, which I assume stands for little-endian. I am on a Mac, so I guess I'm big-endian, which would explain why I was getting Asian glyphs instead of my AdSense results.

I tried the Encode method you suggested, and also the variation with 'LE' after the 16 (why not, I've tried everything else, it seems), but it didn't work. \p{Digit} on its own does match, but the match fails when I combine it with the date field separator (/), like so: \p{Digit}\\/.

I guess the problem comes down to the endianness of the returned data. How do I flip the byte order so that the methods available to me (Encode:: and /usr/bin/iconv) will work?

Re^3: Unicode and Regexps: convert or am I missing something?
by thundergnat (Deacon) on Jun 02, 2005 at 01:29 UTC

    UTF-16LE is supported by the Encode module, so it should work... Did you try down-converting it to Latin-1? The less commonly used encodings don't have as many aliases, so you may need to be more careful about how you specify the encoding name.

    Encode::from_to($string, 'UTF-16LE', 'utf8');

    should be ok, as should

    Encode::from_to($string, 'UTF-16LE', 'iso-8859-1');

    You only need a single backslash to escape the forward slash in the regex. (Or use alternate delimiters, in which case the slash needs no escape at all.)

    my $string = '5/18/05 184 7 3.8% 6.14 1.13';
    if ($string =~ m#(\p{Digit}+/\p{Digit}+/\p{Digit}+)#) {
        print $1;
    }
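Putting the two pieces of this reply together, here is a minimal sketch, not a tested recipe for the real report. The variable names `$line` and `$bytes` are hypothetical, and the UTF-16LE input is built locally from the sample line so that the snippet is self-contained; the real download may differ (for example, it may start with a byte order mark).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the downloaded report: build UTF-16LE bytes
# from a known ASCII line so the example is self-contained.
my $line  = '5/18/05 184 7 3.8% 6.14 1.13';
my $bytes = Encode::encode('UTF-16LE', $line);

# from_to() rewrites $bytes in place; afterwards it holds Latin-1 bytes.
Encode::from_to($bytes, 'UTF-16LE', 'iso-8859-1');

# Alternate delimiters (m#...#) mean the slash needs no escaping.
my ($date) = $bytes =~ m#(\p{Digit}+/\p{Digit}+/\p{Digit}+)#;
print "$date\n";
```

Note that from_to() changes its first argument rather than returning the converted string, so keep a copy if the original bytes are still needed.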
Re^3: Unicode and Regexps: convert or am I missing something?
by dakkar (Hermit) on Jun 02, 2005 at 10:13 UTC
    use Encode;
    my $string = Encode::decode('UTF-16LE', $data_from_google);
    $string =~ /what you want/;

    from_to is the wrong function to use. It converts between byte strings, but to work correctly with regexps you need character strings, so you need to use decode.
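To make the byte-string versus character-string difference concrete, here is a small sketch under stated assumptions: `$data_from_google` is faked locally by encoding the sample line from this thread, and `$date_re`, `$byte_len`, and `$raw_matches` are illustrative names only. It tries the same pattern on the raw UTF-16LE bytes and on the decoded string.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for $data_from_google: UTF-16LE bytes of a known line.
my $data_from_google =
    Encode::encode('UTF-16LE', '5/18/05 184 7 3.8% 6.14 1.13');
my $byte_len = length $data_from_google;

my $date_re = qr{(\p{Digit}+/\p{Digit}+/\p{Digit}+)};

# In the raw bytes every ASCII character is followed by a NUL byte,
# so a digit is never directly followed by a slash.
my $raw_matches = $data_from_google =~ $date_re ? 1 : 0;

# decode() returns a character string; the pattern now sees real characters.
my $string = Encode::decode('UTF-16LE', $data_from_google);
my ($date) = $string =~ $date_re;

printf "bytes: %d, characters: %d\n", $byte_len, length $string;
print "match on raw bytes: $raw_matches\n";
print "date after decode: $date\n";
```

The encoding name states the byte order explicitly, so the endianness of the machine running the script should not matter here.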

            dakkar - Mobilis in mobile

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)
