
Re: Switching on internal UTF-8 flag on DBI result from database

by Aristotle (Chancellor)
on Dec 14, 2003 at 17:19 UTC

in reply to Switching on internal UTF-8 flag on DBI result from database

require is a no-op if called multiple times on the same module, so you can just move it into the function. You can also shorten the code by collapsing the outer for into a map:
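sub _utf8_on_all_arrayref {
    require Encode;
    Encode::_utf8_on($_) for map @$_, @{$_[0]};
    $_[0];
}

Update: the version above sets the flag on copies rather than on the original elements (see the replies below). This version maps to references instead, so it works:

sub _utf8_on_all_arrayref {
    require Encode;
    Encode::_utf8_on($$_) for map \(@$_), @{$_[0]};
    $_[0];
}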
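
A minimal sketch of the difference between flattening to values and flattening to references. It uses uc as a visible stand-in for Encode::_utf8_on, and $rows is hypothetical sample data shaped like an array of arrays, not anything from the original code:

```perl
use strict;
use warnings;

# Hypothetical sample data: an array reference of array references.
my $rows = [ [ 'a', 'b' ], [ 'c' ] ];

# map @$_ flattens into fresh copies, so this loop changes nothing in $rows.
$_ = uc $_ for map @$_, @$rows;
my $after_copies = join ',', map @$_, @$rows;

# map \(@$_) flattens into references to the original elements,
# so assigning through $$_ reaches into $rows.
$$_ = uc $$_ for map \(@$_), @$rows;
my $after_refs = join ',', map @$_, @$rows;

print "$after_copies\n$after_refs\n";
```

The first line printed should still be lowercase and the second uppercase, which is the same effect the two subs above have on the UTF-8 flag.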

Makeshifts last the longest.

Replies are listed 'Best First'.
Re: Re: Switching on internal UTF-8 flag on DBI result from database
by liz (Monsignor) on Dec 14, 2003 at 20:46 UTC
    Well, I think if you can avoid calling require over and over again, you should. Any require is at least a lookup in %INC each time you execute it.

    use Benchmark qw(:hireswallclock timethese);
    timethese( 1000000,{
        one => sub { require Benchmark },
        two => sub { require Benchmark; require Benchmark },
    });
    __END__
    $ perl 1
    Benchmark: timing 1000000 iterations of one, two...
           one: 2.43888 wallclock secs ( 1.70 usr + 0.00 sys = 1.70 CPU) @ 588235.29/s (n=1000000)
           two: 5.11982 wallclock secs ( 2.77 usr + 0.00 sys = 2.77 CPU) @ 361010.83/s (n=1000000)
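
    The %INC bookkeeping behind this can be seen directly. A small sketch, assuming nothing else in the program has already loaded Encode; the variable names are only for illustration:

```perl
use strict;
use warnings;

# Before the first require there is no entry for the module in %INC.
my $before = exists $INC{'Encode.pm'} ? 1 : 0;

require Encode;    # first call: locates, compiles and records the file

my $after = exists $INC{'Encode.pm'} ? 1 : 0;
my $path  = $INC{'Encode.pm'};    # where the file was loaded from

require Encode;    # later calls: find the %INC entry and return early

print "before=$before after=$after\n";
```

    So a repeated require does not reload anything, but it is still a hash lookup on every call, which is the cost the benchmark above measures.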

    Also, for some reason your list flattening with map() does not work. Not sure whether this is a bug in Perl, or a conceptual problem with using map() and $_. Observe:

    use Encode qw(_utf8_on);

    $a = [['']];
    foreach (@$a) { _utf8_on($_) foreach @$_ } # my way
    print utf8::is_utf8( $a->[0][0] ),$/;

    $a = [['']];
    _utf8_on( $_ ) for map @$_, @{$a}; # Aristotle's way
    print utf8::is_utf8( $a->[0][0] ),$/;
    __END__
    1
    which should show two 1's instead of 1.

    But also from a performance point of view, the extra list flattening with map() is not very efficient:

    use Encode qw(_utf8_on);
    use Benchmark qw(:hireswallclock timethese);

    push @$a,[(0) x 10] foreach 1..10;
    timethese( 10000,{
        liz => sub {
            foreach (@{$a}) {
                _utf8_on( $_ ) foreach @{$_};
            }
        },
        Aristotle => sub {
            _utf8_on( $_ ) for map @$_, @{$a};
        },
    });
    __END__
    Benchmark: timing 10000 iterations of Aristotle, liz...
     Aristotle: 4.37344 wallclock secs ( 3.73 usr + 0.00 sys = 3.73 CPU) @ 2680.97/s (n=10000)
           liz: 4.06957 wallclock secs ( 2.73 usr + 0.00 sys = 2.73 CPU) @ 3663.00/s (n=10000)


    Update: The way Aristotle proposed doesn't work because map() returns copies of the elements; the UTF-8 flag is set on those copies, which are then discarded. See the node "$_ and list flattening with map()" for more info.

      Doh. Of course, map aliases $_ during the expression, but the expression's result is completely separate. I updated my previous reply with a version that works. Returning a new value rather than just operating on an alias is the reason for the performance hit, of course. Unfortunately, I can think of only two other remotely relevant aliasing constructs in Perl, neither of which is any help here: grep returns aliases, and the @_ in a function call contains aliases rather than copies. I can't see how to use either to achieve something less awkward than a nested for loop, though.
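
      For reference, a short sketch of the two aliasing constructs mentioned above, followed by the plain nested loop. The data and the sub name append_q are made up for illustration:

```perl
use strict;
use warnings;

my @words = ( 'x', 'y' );

# grep hands back aliases to the original elements, so this changes @words.
$_ .= '!' for grep { defined } @words;
my $after_grep = "@words";

# @_ aliases the caller's arguments, so this changes @words as well.
sub append_q { $_ .= '?' for @_ }
append_q(@words);
my $after_sub = "@words";

# Neither construct flattens nested arrays without copying, so for an
# array of arrays the nested loop remains the direct way to modify in place.
my $rows = [ [ 'a', 'b' ], [ 'c' ] ];
for my $row (@$rows) {
    $_ = uc $_ for @$row;
}
my $after_loop = join ',', map @$_, @$rows;

print "$after_grep\n$after_sub\n$after_loop\n";
```

      Both constructs alias only a flat list that already exists; getting from the outer array to the inner elements still takes either a loop or a map, and the map is what introduces the copies.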

      Makeshifts last the longest.
