PerlMonks  

Re^3: Does String::LCSS work?

by repellent (Priest)
on Jan 27, 2010 at 20:00 UTC (id://820013)


in reply to Re^2: Does String::LCSS work?
in thread Does String::LCSS work?

OK, I'll take your word that the performance isn't great for practical use. The speed does look as advertised, though: 100 strings of 1000 chars took about 4 minutes, i.e. on the order of minutes.
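For a sense of where those minutes go: the textbook dynamic-programming approach to longest common substring fills one table cell per pair of characters, so two 1000-char strings cost about a million cell updates. The sketch below is that generic O(n*m) algorithm in Python, not String::LCSS's actual implementation; the function name `longest_common_substring` is hypothetical.

```python
def longest_common_substring(a, b):
    """Generic O(len(a) * len(b)) DP; keeps only the previous row of the table."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match found so far in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1] / b[j-1] by one character.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return a[best_end - best_len:best_end]


print(longest_common_substring("xabcdey", "zzabcdq"))  # abcd
```

The inner loop body runs len(a) * len(b) times regardless of how similar the strings are, which is why string length dominates the timing.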

From a quick glance at the code, it seems not to suffer from the limitations of String::LCSS_XS that ikegami pointed out?

Re^4: Does String::LCSS work?
by BrowserUk (Patriarch) on Jan 28, 2010 at 01:59 UTC

    I guess the hardware has moved on a bit in the last 5 years. I also used a file of 100 strings of 1000 chars (bytes) each.

    For my purposes, Unicode isn't a concern. Maybe 10 years from now, once we stop penny-pinching over memory with variable-length character encodings and start using straight 32-bit characters universally, it'll be possible to write efficient text-munging code again. Till then, I'll stick with ASCII/ISO-whatever unless I'm forced to deal with Unicode.

    IMO. The guys that came up with the variable-length encoding should be tried for treason to humanity :)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      IMO. The guys that came up with the variable-length encoding should be tried for treason to humanity :)

      I agree with the proposed action, but I don't think the problem is variable length so much as the incredible amount of code that merely "appears to work" as a result of pretending to be backwards compatible. Things would work much better if they failed more noticeably when done wrong.

      For example, UTF-16 is also a variable-width encoding, yet when someone fails to handle it, they don't move on until it's fixed. ("How do I get rid of the space...")
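A minimal Python illustration of the "appears to work" point, assuming the reading code wrongly treats the bytes as Latin-1: pure-ASCII text encoded as UTF-8 survives the mistake unchanged, while the same text encoded as UTF-16 comes out with a NUL after every letter, which typically shows up as stray "spaces" between the letters. The helper `misread_as_latin1` is hypothetical.

```python
def misread_as_latin1(text, encoding):
    """Encode text correctly, then (wrongly) decode the bytes as Latin-1."""
    return text.encode(encoding).decode("latin-1")


for enc in ("utf-8", "utf-16-le"):
    # repr() makes the NUL characters visible as \x00
    print(enc, repr(misread_as_latin1("Hello", enc)))
# utf-8 'Hello'
# utf-16-le 'H\x00e\x00l\x00l\x00o\x00'
```

The UTF-8 mistake only surfaces once non-ASCII characters turn up, whereas the UTF-16 one is visible on the very first test.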

        yet when someone fails to handle it, they don't move on until it's fixed.

        I'm not sure I understand that statement. Do you mean that when programmers encounter a problem handling UTF-16, they neither try to correct it nor fail loudly?

        From my perspective, the first problem is that when you read a file, there is simply no way to know what it contains. Unless you know up front what is in the file, there is no sensible way to decide how you should decode (or encode?) its contents. Guess wrong and you produce nonsense. Know wrong and you produce nonsense. Download text from the web, and if the webmaster has (through laziness, incompetence or malice) mislabelled the MIME type, you're stuffed.
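To make "guess wrong and you produce nonsense" concrete: nothing in a byte string records which encoding produced it, and the same bytes often decode without any error under several encodings, just to different text. A small Python sketch (the `decode_all` helper is hypothetical) that tries a few candidate encodings on one byte string:

```python
def decode_all(data, encodings):
    """Try each candidate encoding; None marks an outright decode failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


result = decode_all(b"caf\xc3\xa9", ["utf-8", "latin-1", "cp1252", "ascii"])
# {'utf-8': 'café', 'latin-1': 'cafÃ©', 'cp1252': 'cafÃ©', 'ascii': None}
```

Only the ASCII attempt fails loudly; the Latin-1 and CP1252 readings "succeed" and silently yield mojibake, so a decoder has no error signal to tell it the guess was wrong.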

        The second problem is, as I said, variable-length encoding. With a fixed-length encoding, if you want to fetch the 3-millionth character of your data, you add 3e6 (times the character width) to your base address and fetch it. With a variable-length encoding, you have to inspect every single one of the intervening 6 million to 12 million bytes.
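The indexing difference can be sketched in Python, assuming the data is held as raw bytes: with UTF-32 (4 bytes per character) the n-th character sits at byte offset n * 4, while with UTF-8 you have to walk from the start, skipping continuation bytes (those of the form 0b10xxxxxx), until you have counted n characters. Both function names are hypothetical, and the UTF-8 version assumes well-formed input.

```python
def utf32_char_at(data, n):
    """Fixed width: character n starts at byte n * 4, so no scanning is needed."""
    start = n * 4
    if n < 0 or start + 4 > len(data):
        raise IndexError("character index out of range")
    return data[start:start + 4].decode("utf-32-le")


def utf8_char_at(data, n):
    """Variable width: scan from the start, counting lead bytes, until character n."""
    count = -1
    for i, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a character.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                # The character runs until the next lead byte (or end of data).
                end = i + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[i:end].decode("utf-8")
    raise IndexError("character index out of range")


text = "naïve café ☕"
assert utf8_char_at(text.encode("utf-8"), 11) == utf32_char_at(text.encode("utf-32-le"), 11)
```

The first function does the same small amount of work whatever n is; the second touches every byte before the target, which is exactly the cost being complained about.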

        Imagine if the postman had to walk the length of Yonge Street inspecting each house name in order to deliver a letter :) (Or IP traffic if...)
