PerlMonks  

Re: Switching on internal UTF-8 flag on DBI result from database

by Aristotle (Chancellor)
on Dec 14, 2003 at 17:19 UTC (node #314653)


in reply to Switching on internal UTF-8 flag on DBI result from database

Calling require again on a module that is already loaded is a no-op, so you can just move it into the function. You can also shorten the code by collapsing the outer for loop into a map:

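    # first version (broken: map returns copies of the elements)
    sub _utf8_on_all_arrayref {
        require Encode;
        Encode::_utf8_on($_) for map @$_, @{$_[0]};
        $_[0];
    }

    # corrected version: map returns references to the elements
    sub _utf8_on_all_arrayref {
        require Encode;
        Encode::_utf8_on($$_) for map \(@$_), @{$_[0]};
        $_[0];
    }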
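A quick way to check the corrected version is to run it against a small, made-up stand-in for a fetchall_arrayref result and ask utf8::is_utf8 about every element afterwards. This is only a sketch: the sample rows are hypothetical, and it assumes a Perl that ships Encode (5.8 or later).

```perl
use strict;
use warnings;

# Corrected version from above: map returns references to the elements,
# and $$_ dereferences each one so the flag lands on the original scalar.
sub _utf8_on_all_arrayref {
    require Encode;
    Encode::_utf8_on($$_) for map \(@$_), @{$_[0]};
    $_[0];
}

# Hypothetical rows shaped like a DBI fetchall_arrayref result:
# an array of rows, each row an array of column values (byte strings).
my $rows = [ [ 'caf', "\xc3\xa9" ], [ 'abc', 'xyz' ] ];

_utf8_on_all_arrayref($rows);

# Check the flag on the real elements (map's block aliases $_ to them).
my @flags;
for my $row (@$rows) {
    push @flags, map { utf8::is_utf8($_) ? 1 : 0 } @$row;
}
print "@flags\n";    # 1 1 1 1
```

Once the flag is on, the two bytes "\xc3\xa9" are treated as the single character U+00E9.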

Makeshifts last the longest.


Re: Re: Switching on internal UTF-8 flag on DBI result from database
by liz (Monsignor) on Dec 14, 2003 at 20:46 UTC
    Well, I think if you can avoid calling require over and over again, you should. Any require is at least a lookup in %INC each time you execute it.

        use Benchmark qw(:hireswallclock timethese);
        timethese( 1000000, {
            one => sub { require Benchmark },
            two => sub { require Benchmark; require Benchmark },
        });
        __END__
        $ perl 1
        Benchmark: timing 1000000 iterations of one, two...
               one: 2.43888 wallclock secs ( 1.70 usr +  0.00 sys =  1.70 CPU) @ 588235.29/s (n=1000000)
               two: 5.11982 wallclock secs ( 2.77 usr +  0.00 sys =  2.77 CPU) @ 361010.83/s (n=1000000)
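    The lookup in question is the %INC check: require records each file it loads in %INC, and a later require of the same module finds that entry and returns 1 without reading the file again. The sketch below only makes that visible; the printed path depends on the local Perl installation.

```perl
use strict;
use warnings;

# The first require compiles Encode.pm and records it in %INC,
# keyed by the relative path "Encode.pm".
require Encode;
my $path = $INC{'Encode.pm'};

# A second require finds the %INC entry and returns 1 straight away;
# this hash lookup is the cost that remains on every later call.
my $again = require Encode;

print "Encode.pm loaded from $path\n";
print "second require returned $again\n";    # second require returned 1
```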

    Also, for some reason your list flattening with map() does not work. Not sure whether this is a bug in Perl, or a conceptual problem with using map() and $_. Observe:

        use Encode qw(_utf8_on);

        $a = [['']];
        foreach (@$a) { _utf8_on($_) foreach @$_ }   # my way
        print utf8::is_utf8( $a->[0][0] ),$/;

        $a = [['']];
        _utf8_on( $_ ) for map @$_, @{$a};           # Aristotle's way
        print utf8::is_utf8( $a->[0][0] ),$/;
        __END__
        1
    which should print 1 twice, but prints it only once.

    From a performance point of view, too, the extra list flattening with map() is not very efficient:

        use Encode qw(_utf8_on);
        use Benchmark qw(:hireswallclock timethese);

        push @$a, [(0) x 10] foreach 1..10;
        timethese( 10000, {
            liz       => sub { foreach (@{$a}) { _utf8_on( $_ ) foreach @{$_}; } },
            Aristotle => sub { _utf8_on( $_ ) for map @$_, @{$a}; },
        });
        __END__
        Benchmark: timing 10000 iterations of Aristotle, liz...
         Aristotle: 4.37344 wallclock secs ( 3.73 usr +  0.00 sys =  3.73 CPU) @ 2680.97/s (n=10000)
               liz: 4.06957 wallclock secs ( 2.73 usr +  0.00 sys =  2.73 CPU) @ 3663.00/s (n=10000)
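    The benchmark above pits the nested loop against the first, copying map version. A comparison that also covers the corrected, reference-returning map could look like the following sketch. The data and the hash keys (nested_for, ref_map) are hypothetical, and it uses strings because the Encode documentation says _utf8_on returns undef for anything that is not a string. The fixed count of 2000 only keeps the run short, so no timings are claimed here.

```perl
use strict;
use warnings;
use Encode ();
use Benchmark qw(cmpthese);

# Hypothetical data: 10 rows of 10 short strings, roughly the shape
# of a fetchall_arrayref result.
my $rows = [ map { [ ('x') x 10 ] } 1 .. 10 ];

my %subs = (
    # liz's approach: alias every element through two for loops
    nested_for => sub {
        for my $row (@$rows) { Encode::_utf8_on($_) for @$row }
    },
    # corrected map: flatten into references, then dereference
    ref_map => sub {
        Encode::_utf8_on($$_) for map \(@$_), @$rows;
    },
);

# Raise the count (or pass a negative number of CPU seconds)
# before trusting the numbers.
cmpthese(2000, \%subs);
```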

    Liz

    Update:
    The way Aristotle proposed doesn't work because map() returns copies of the elements: the UTF-8 flag is set on those copies, which are then discarded. See the node "$_ and list flattening with map()" for more info.
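    The copying is easy to see without Encode at all. In this sketch (plain Perl, made-up data), a for loop over the original rows changes them, while a for loop over the list returned by map only changes throwaway copies.

```perl
use strict;
use warnings;

my @rows = ( [ 'a', 'b' ], [ 'c' ] );

# for aliases $_ to each element of @$row, so this edits @rows in place.
for my $row (@rows) {
    $_ .= '!' for @$row;
}

# map's result list holds copies of the elements, so the appended '?'
# goes to temporaries and @rows is left unchanged.
$_ .= '?' for map @$_, @rows;

my $seen = join ',', map @$_, @rows;
print "$seen\n";    # a!,b!,c!
```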

      Doh. Of course: map aliases $_ while the expression is evaluated, but the expression's result is a completely separate list. I updated my previous reply with a version that works. Returning new values rather than just operating on aliases is, of course, also the reason for the performance hit. Unfortunately, I can think of only two other remotely relevant aliasing constructs in Perl, and neither is any help here: grep returns aliases, and @_ in a function call contains aliases rather than copies. I can't see how to use either to get something less awkward than a nested for loop.
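      The two aliasing constructs mentioned above can be shown in a few lines. This sketch uses made-up data and a hypothetical helper, append_plus; it only demonstrates that the aliasing happens and, as noted, neither construct flattens the nested rows by itself.

```perl
use strict;
use warnings;

my @words = qw(a b c);

# grep returns aliases into the original list, so modifying its
# result through for changes @words itself.
$_ = uc $_ for grep { $_ ne 'b' } @words;

# @_ holds aliases to the caller's arguments, so this sub edits @words.
sub append_plus { $_ .= '+' for @_ }
append_plus(@words);

print "@words\n";    # A+ b+ C+
```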

      Makeshifts last the longest.
