
Re: regex for identifying encrypted text

by sundialsvc4 (Abbot)
on May 16, 2018 at 15:21 UTC (#1214660)

in reply to regex for identifying encrypted text

Very fortunately, you don’t need to do this, and therefore ought not attempt it. Cipher material of this kind begins and ends with very distinctive marker strings, such as -----BEGIN ENCRYPTED PRIVATE KEY----- and -----END ENCRYPTED PRIVATE KEY-----, which always occur on their own line with nothing else on that line, specifically to facilitate either parsing-out or excluding that material. After a begin-marker, the next marker will be the corresponding end-marker, also by itself on its own line.

Encrypted content will also always be similarly bracketed.   You do not have to guess that it is “encrypted text.”   The markers will tell you.

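All of this is formally defined: RFC 5958 (Asymmetric Key Packages) covers the key structures, and RFC 7468 (Textual Encodings of PKIX, PKCS, and CMS Structures) covers the BEGIN/END textual encoding.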

A regex such as /\-\-\-\-\-BEGIN ENCRYPTED PRIVATE KEY\-\-\-\-\-(.*?)\-\-\-\-\-END ENCRYPTED PRIVATE KEY\-\-\-\-\-/s will reliably match and capture the cipher material that may be found within a very large string.   The only gotcha in this case is that the pattern must be non-greedy (as shown), so that in a string containing many such blocks it will stop at the next occurrence of the end-marker instead of consuming everything up to the very last occurrence of that marker.   There is no need to look for the material itself, e.g. to somehow recognize Base64 encoding.   If a begin-marker occurs, you can rely on the fact that an end-marker will always be present and that the two will correspond.   Everything in-between the two markers will be nothing but cipher material of the specified kind, and you don’t have to consider how it might be encoded or structured.
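To make the greedy versus non-greedy point concrete, here is a minimal sketch. The sample text and its placeholder Base64 bodies are invented for illustration; only the pattern itself comes from the paragraph above (minus the backslashes, which are not required in front of a hyphen outside a character class).

```perl
use strict;
use warnings;

# Hypothetical sample text holding two key blocks; the bodies are placeholders.
my $text = join "\n",
    'some log output',
    '-----BEGIN ENCRYPTED PRIVATE KEY-----',
    'AAAA',
    'BBBB',
    '-----END ENCRYPTED PRIVATE KEY-----',
    'more output',
    '-----BEGIN ENCRYPTED PRIVATE KEY-----',
    'CCCC',
    '-----END ENCRYPTED PRIVATE KEY-----',
    '';

# Non-greedy (.*?) stops at the first end-marker; /s lets . match newlines.
my $block_re = qr/-----BEGIN ENCRYPTED PRIVATE KEY-----(.*?)-----END ENCRYPTED PRIVATE KEY-----/s;

# In list context, /g returns the capture from every match.
my @bodies = $text =~ /$block_re/g;

# Greedy (.*) for comparison: it runs on to the very last end-marker.
my @greedy = $text =~ /-----BEGIN ENCRYPTED PRIVATE KEY-----(.*)-----END ENCRYPTED PRIVATE KEY-----/sg;

printf "non-greedy: %d block(s), greedy: %d block(s)\n",
    scalar(@bodies), scalar(@greedy);
```

The non-greedy pattern finds each block separately, while the greedy one swallows both blocks and the text between them as a single match.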

(Depending on your exact regular expression, you may need to be sure that you correctly account for which “newline” sequence is being used in the data. See perldoc perlrebackslash.)
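One way to stay agnostic about the newline sequence is the \R escape described in perlrebackslash (Perl 5.10 or later). The following is only a sketch under that assumption; the two sample strings are invented, and anchoring the markers with ^ under /m is my own addition rather than something the post requires.

```perl
use strict;
use warnings;

# Hypothetical inputs: the same block with LF and with CRLF line endings.
my $lf = "-----BEGIN ENCRYPTED PRIVATE KEY-----\nAAAA\n-----END ENCRYPTED PRIVATE KEY-----\n";
(my $crlf = $lf) =~ s/\n/\r\n/g;

# /m lets ^ match at the start of every line, \R matches LF or CRLF,
# and /s lets . cross line boundaries.
my $re = qr/^-----BEGIN ENCRYPTED PRIVATE KEY-----\R(.*?)^-----END ENCRYPTED PRIVATE KEY-----\R?/ms;

# A match in list context returns the captured body.
my ($lf_body)   = $lf   =~ $re;
my ($crlf_body) = $crlf =~ $re;

printf "LF body length: %d, CRLF body length: %d\n",
    length($lf_body), length($crlf_body);
```

The captured body keeps whatever line endings the data used, so normalize them afterwards if that matters to you.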

If you need to process, as a single string, text that might contain cipher blocks that you don’t want, just use an s/// substitution to change each entire block, markers and all, to an empty string. (Do it “globally” with s///g if you want to get them all in one swell foop.)
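A minimal sketch of that substitution, assuming LF line endings and an invented sample string; the optional \n after the end-marker is my addition so that no blank line is left behind.

```perl
use strict;
use warnings;

# Hypothetical input: ordinary lines interleaved with two key blocks.
my $text = "before\n"
    . "-----BEGIN ENCRYPTED PRIVATE KEY-----\nAAAA\n-----END ENCRYPTED PRIVATE KEY-----\n"
    . "middle\n"
    . "-----BEGIN ENCRYPTED PRIVATE KEY-----\nBBBB\n-----END ENCRYPTED PRIVATE KEY-----\n"
    . "after\n";

# Work on a copy so the original string stays available.
my $clean = $text;

# /s lets . cross newlines, /g removes every block, and the non-greedy .*?
# keeps each match from running past its own end-marker.
$clean =~ s/-----BEGIN ENCRYPTED PRIVATE KEY-----.*?-----END ENCRYPTED PRIVATE KEY-----\n?//sg;

print $clean;
```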

If you might be processing a file line-by-line but want to exclude cipher blocks, a simple “fetch the next line of data” subroutine could be devised which will detect key-block markers and loop over (and discard) the markers and data, returning the line (if any) which follows them.   You can use a simple eq here:   the marker will always begin in column #1, will always be uppercase-only ASCII, and will end with a newline immediately following the last dash.
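Such a subroutine might look roughly like this. The name next_wanted_line, the marker variables, and the in-memory sample data are all hypothetical, and the sketch assumes LF line endings so that a plain eq comparison against the marker plus "\n" works.

```perl
use strict;
use warnings;

# Markers as they appear in the data, including the trailing newline.
my $begin_marker = "-----BEGIN ENCRYPTED PRIVATE KEY-----\n";
my $end_marker   = "-----END ENCRYPTED PRIVATE KEY-----\n";

# Hypothetical helper: returns the next line that is not part of a key block,
# or undef at end of input.
sub next_wanted_line {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        return $line unless $line eq $begin_marker;
        # Inside a block: discard lines up to and including the end-marker.
        while (defined(my $skipped = <$fh>)) {
            last if $skipped eq $end_marker;
        }
    }
    return;
}

# Invented sample data, read through an in-memory filehandle.
my $data = "first\n${begin_marker}AAAA\nBBBB\n${end_marker}second\n"
         . "${begin_marker}CCCC\n${end_marker}third\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my @kept;
while (defined(my $line = next_wanted_line($fh))) {
    push @kept, $line;
}
close $fh;

print @kept;
```

The outer loop also handles two blocks in a row, because after a block is discarded the next line read is tested against the begin-marker again.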

If you need instead to collect the cipher material into a string, you should collect both markers as well as the lines in-between them, with \n newline characters in-between each as well as at the end.
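As a sketch of that collection step: the input below is invented and deliberately uses CRLF line endings, so the code strips whatever line ending it finds and rejoins with plain \n. Stopping after the first block is an assumption made to keep the example short.

```perl
use strict;
use warnings;

my $begin = '-----BEGIN ENCRYPTED PRIVATE KEY-----';
my $end   = '-----END ENCRYPTED PRIVATE KEY-----';

# Hypothetical input with CRLF line endings and text around the block.
my $input = "header\r\n$begin\r\nAAAA\r\nBBBB\r\n$end\r\ntrailer\r\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

my @collected;
my $in_block = 0;
while (defined(my $line = <$fh>)) {
    $line =~ s/\R\z//;                  # drop the line ending, LF or CRLF
    $in_block = 1 if $line eq $begin;   # the begin-marker opens the block
    next unless $in_block;
    push @collected, $line;             # keep both markers and the body lines
    last if $line eq $end;              # the end-marker closes it
}
close $fh;

# Rejoin with \n between the lines and one at the end.
my $pem = join '', map { "$_\n" } @collected;
print $pem;
```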


Replies are listed 'Best First'.
Re^2: regex for identifying encrypted text
by Eily (Prior) on May 16, 2018 at 16:25 UTC

    /\-\-\-\-\-BEGIN ENCRYPTED PRIVATE KEY\-\-\-\-\-(.*?)\-\-\-\-\-END ENCRYPTED PRIVATE KEY\-\-\-\-\-/ will reliably match and capture
    at least it will reliably compile. But . doesn't match \n unless you have the /s option for your regex. You were just one character away from code that actually works.

      “Well, foo!”   Edited the post.   Is that right now?   Or, if you prefer, post a correction.   Thanks.

      (Anyhow, mostly I was driving for the general idea:   that you can and should rely on the brackets and use them for this intended purpose.)

        That's better, thanks. We all told skendric to use the delimiters one way or another, with a useful addition from hippo: this only works if you do it before the diff tool removes the delimiting lines.

        brackets? what brackets? do you know what a bracket is?
