PerlMonks  

Re^3: Analysing a (binary) string.

by hdb (Monsignor)
on Jul 03, 2013 at 08:57 UTC ( [id://1042183] )


in reply to Re^2: Analysing a (binary) string.
in thread Analysing a (binary) string. (Solved)

Even though you declared this problem solved, I have been thinking about the skip-ahead technique. It could still be used if

  1. you know a minimum length of the pattern, say at least 100,
  2. and you know that the number of errors is small compared to the length of the pattern, say at most 5.
You could then look for a repeat of the first 100 chars somewhere later in the string using some tolerant matching technique. If the repeat turns up at, say, position 5000, you would lengthen the candidate pattern to the first 5000 characters. This will not be as efficient as the original skip-ahead but might speed things up somewhat. I am not sure how it compares to the frequency analysis.
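A minimal sketch of that idea, written in Python for brevity rather than Perl. It assumes "tolerant matching" means a plain mismatch count (Hamming distance) over equal-length windows, which is only one possible choice. The names `hamming_within` and `guess_period` and the default thresholds are hypothetical, taken from the example numbers above.

```python
def hamming_within(a, b, limit):
    """Return True if equal-length strings a and b differ in at most `limit` positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                return False  # bail out early once the error budget is exceeded
    return True


def guess_period(data, min_len=100, max_errors=5):
    """Return the first offset >= min_len at which the leading min_len
    characters of data recur with at most max_errors mismatches, else None.

    Assumption: errors are substitutions only (no inserted or dropped chars).
    """
    probe = data[:min_len]
    # Start at min_len because the pattern is known to be at least that long.
    for offset in range(min_len, len(data) - min_len + 1):
        if hamming_within(probe, data[offset:offset + min_len], max_errors):
            return offset
    return None
```

The returned offset is only a candidate pattern length: a short probe with a generous error budget can match by accident, so the candidate would still need checking against the rest of the string. The scan is also linear in the offset with a window comparison at each step, so it is slower than an exact skip-ahead.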

Re^4: Analysing a (binary) string.
by BrowserUk (Patriarch) on Jul 03, 2013 at 09:17 UTC
    "Not sure how it compares to the frequency analysis."

    In the general case, that might be worth pursuing, though my (fairly extensive) experience of fuzzy matching techniques is that they are always slow.

    For the specific case of my data, the frequency analysis proved so simple and efficient that it wasn't even worth timing it. My first attempt at the code worked first time and found all the reps that were there to be found in just a few seconds. There was no reason (for me) to pursue this further.

    If my dataset had not proven so amenable to the frequency analysis method -- a near-perfect inverse log distribution -- I might still be looking for another method, but I have plenty of other nuts to crack with this particular dataset :)


