
Re^5: how to unicode filenames?

by perl-diddler (Hermit)
on Jun 28, 2012 at 20:18 UTC (#978991)


in reply to Re^4: how to unicode filenames?
in thread how to unicode filenames?

Of course it does... ALL of the Linux core utils know how -- Perl is just braindead by choice of its creators. cat, chmod, chown, chroot, cp, cut, dirname (all of the file name routines work with UTF-8); tac, wc, uniq, sort, sed, awk, grep. Perl's "correctness" used to be measured by whether or not it produced the same output as the core utilities that it was based on. Perl derived from those core utils -- and their behavior set the standard for how Perl ran. Perl fails randomly and often on compatibility with the utils that it was designed to be a combination of. Simple word count program:

    #!/usr/bin/perl -w
    ## 'pwc'
    use 5.14.0;
    my ($l,$w,$c)=(0,0,0);
    while (<>) {
        ++$l;
        $c += length $_;
        while ( m{^\W*(\w+)(.*)$} ) {
            ++$w;
            $_=$2;
        }
    }
    printf "%d\t%d\t%d\n", $l, $w, $c;

a text file:

    > file /tmp/txt
    /tmp/txt: UTF-8 Unicode text
    > wc -lwm /tmp/txt
    3 5 38 /tmp/txt
    > pwc /tmp/txt
    3 24 64
(There are 5 words in /tmp/txt, but I can't post it here, as the BB software for PerlMonks, like Perl, isn't UTF-8 safe/compatible.)
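
For comparison, the wc/pwc gap above is consistent with pwc reading /tmp/txt as raw bytes rather than decoded characters. Here is a minimal sketch of that difference. The sample string is hypothetical (the real file was not posted), and unlike pwc this sketch does not `use 5.14.0`, so the byte string is matched with ASCII-only `\w` rules instead of the Latin-1 rules that `unicode_strings` would apply.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample, NOT the poster's file: the UTF-8 bytes for
# "naïve café test\n" (two words contain a non-ASCII letter).
my $bytes = "na\xc3\xafve caf\xc3\xa9 test\n";

# Count runs of \w and the string length, roughly as pwc does per line.
sub count_words_chars {
    my ($text) = @_;
    my $words = () = $text =~ /\w+/g;   # number of \w+ matches
    return ($words, length $text);
}

# Undecoded: each multi-byte character is seen as separate non-word bytes.
my ($byte_words, $byte_chars) = count_words_chars($bytes);

# Decoded: \w and length() now operate on characters.
my ($char_words, $char_chars) = count_words_chars(decode('UTF-8', $bytes));

printf "bytes:   %d words, %d units\n", $byte_words, $byte_chars;
printf "decoded: %d words, %d units\n", $char_words, $char_chars;
```

On this sample the undecoded pass reports 4 words and 18 units, while the decoded pass reports 3 words and 16 characters. For a script reading via `<>`, the same decoding can be requested with `use open qw(:std :utf8);` or by running `perl -CSD pwc /tmp/txt`; whether that reproduces wc's numbers for the original file is untested here.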
It gets closer with an autosplit version:
(from http://www.catonmat.net/download/perl1line.txt)

# Find the total number of fields (words) on all lines

    > perl -alne '$t += @F; END { print $t}' /tmp/txt
    4

(it was only off by 1)...
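
One way an off-by-one like this can arise, and this is only an assumption since the file isn't shown, is a non-ASCII whitespace character between two words: autosplit (`split ' '`) on undecoded bytes only recognizes ASCII whitespace. A small sketch with a hypothetical line containing U+3000 (IDEOGRAPHIC SPACE):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical line, NOT from /tmp/txt: "alpha" and "beta" are separated
# by U+3000 (UTF-8 bytes e3 80 80), "beta" and "gamma" by an ASCII space.
my $bytes = "alpha\xe3\x80\x80beta gamma";

# What -a does per line: split ' ' on whatever <> returned.
my @byte_fields = split ' ', $bytes;                   # raw bytes
my @char_fields = split ' ', decode('UTF-8', $bytes);  # decoded characters

printf "undecoded: %d fields\n", scalar @byte_fields;
printf "decoded:   %d fields\n", scalar @char_fields;
```

Here the undecoded split yields 2 fields and the decoded split yields 3. The one-liner equivalent would be `perl -CSD -alne '$t += @F; END { print $t }' /tmp/txt`, though I can't confirm it gives 5 for the original file.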

I could spend weeks detailing all the broken semantics, but it would be a waste of my time... you just have to learn all the bugs in perl so you can work around them. (As stated in a previous post, people told me that labeling dysfunctional behavior was the sign of a bad craftsman, i.e. one who blames their tools. That is a meaningless statement, considering it is also said that a good craftsman knows their tools, which means 'characterizing their behavior'.)

So the idea that it is "too hard" for perl to know how to correctly interpret text data is patently and easily provably false, as millions of other programs get it right. Perl's algorithms in this area are governed by ideologues who have beliefs about how the world should be run and enforce them on everyone else. There are multiple examples where they take away choices from the users because the users are presumed to be too stupid to make their own decisions (yet these same people will complain when MS does similar).

Perl could be a lot more intelligent in a lot of areas than it is -- in some cases it would involve not implementing code, but ***removing*** code that was added to deliberately limit perl's functionality or to cause erroneous behavior.

But one can spend all their time pointing out the numerous flaws of the language, or attempt to work around them and get work done. The two are not completely mutually exclusive, but to some extent they are, as they draw on the same resource: time.

Until those in charge allow change, it won't happen. And it is a matter of "allow", since one change that was asked for came down to "well, no one who is capable of making the change wants it enough to do it". The proponent of the idea asked, "If someone who was capable of making the change submitted a patch, does that imply there would be no problem adding it into the source base?"

The conversation was terminated at that point as the question was not answerable with a simple yes/no.


Re^6: how to unicode filenames?
by Anonymous Monk on Jun 29, 2012 at 03:08 UTC

    Perl's algorithms in this area are governed by ideologues ...... The conversation was terminated at that point as the question was not answerable with a simple yes/no.

    Link?

      To see evidence of ideological behavior, you can google for the perl unicode release bug and see a lot of people upset by it, but I thought this article was especially good: The unstoppable Perl Release train...

      Showing ideological behavior not just in relation to perl's unicode problems, but also release scheduling...

      There are MANY examples of ideological (irrational) behavior -- I've found the most logical group seems to be those who work with the parser -- probably, statistically, due to the type of brain that can deal with parsers...;-)

        I thought this comment on your especially good article was particularly good–

        This article isn't very good, and I'm disappointed at LWN for publishing it. –autarch
