Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^4: Alternations and anchors (trie optimization)

by cavac (Prior)
on Apr 03, 2025 at 11:44 UTC ( [id://11164561]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Alternations and anchors (trie optimization)
in thread Alternations and anchors

Usage would show if it's of wider interest or just pleasing you and a handful of others.

Without knowing more about the OPs specific use case, i certainly don't know if this attempt at optimization is actually required or if we're dealing with an X/Y problem here.

For example, if the input dataset is refered often, but hardly ever changes, caching the parsed results might decrease wait time orders of magnitude more in the long run than trying to optimize the parsing time. If new or changed data is also not used immediately, throw in a bit of pre-computing, and we're basically down to the time it requires to read a file from disk (and even then there are ways to optimize it, for example by pre-caching in RAM).

PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
Also check out my sisters artwork and my weekly webcomics
  • Comment on Re^4: Alternations and anchors (trie optimization)

Replies are listed 'Best First'.
Re^5: Alternations and anchors (trie optimization)
by LanX (Saint) on Apr 03, 2025 at 12:09 UTC
    You are essentially saying that improving the regex engine is not necessary because you can preprocess any input in a database.

    It's like saying improving Perl's speed is nonsense, you can always use C or Assembler instead.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      You are essentially saying that improving the regex engine is not necessary

      That's not what I read. More like changing the regex engine may not be necessary for the OP's real world problem (whatever that might be). If someone were to "improve" the regex engine to make this particular operation faster to run or easier to code, what would be the penalties for every other use of the engine? A workaround might well be the better option.

      Generally, if an improvement can be made which neither breaks backwards compatibilty nor slows down any aspect of RE use nor introduces a forwards maintenance problem then I say go for it. Any other scenario should remain up for debate and decided on the relative merits/demerits.


      🦛

        In that case he should have replied to the OP directly. Don't you think?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

      No, not what i'm saying.

      What i'm saying, in case of question from the OP, is, that given their (assumed) limited time to come up with a workable solution, it is always helpful to know the full story. This way, we can see the bigger picture and maybe come up with a solution that gives an even better boost in performance.

      Optimizing the regex engine in general and the finding a better solution for the given code example is always a good thing. What i meant by my post is that regex matching might not have to be in the performance critical part of the overall system at all.

      To give you an example from my own software: I'm using IO::Compress::Gzip for HTTP compression ("Content-Encodig"). I have to use that for dynamic content on the fly, so getting the best performance is important. But by pre-compressing static assets during startup, i can reduce workload and wait time during page loads a lot.

      Another example: On some tables in my database, i can do searches over all columns, some of which have to be converted from one type to another or have some calculations done to them. No matter how you optimize your SQL statements, it's gonna be slow. By pre-calculating a single text-type search column on inserts, INSERT is gonna be a bit slower (in a background process), but i can use a full text search module on a single, indexed column (trading space for time, essentially) to speed up searches. It's still important to make the pre-calculation fast and efficient¹, but moving the main processing out of the time critical user interaction path reduces the time the user has to wait for results.²


      ¹ Processor cycles cost power, which costs money.

      ² Peoples time costs more money than a harddisk upgrade every couple of years that has to be done anyway.

        Context matters, this sub thread (and the title) is about Regexes.

        If you think the OP is asking an XY question you should reply to him directly, IMHO.

        Generally speaking: Yes, of course you can always replace time complexity with space complexity (preprocessing input data).

        But it doesn't add much information to a Regex discussion.

        I'm wondering who in this thread even bothered to understand what trie optimization does. It's not that complicated...

        Edit

        And it's actually very close to the way data is preprocessed.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11164561]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2025-06-19 12:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.