Re: regex for identifying encrypted text by sundialsvc4 (Abbot)
on May 16, 2018 at 15:21 UTC
Fortunately, you don’t need to do this, and therefore should not attempt it.
Cipher material always begins and ends with very distinctive marker lines such as:

    -----BEGIN ENCRYPTED PRIVATE KEY-----
    -----END ENCRYPTED PRIVATE KEY-----
Other PEM-style encrypted content is bracketed in the same way, with its own BEGIN and END labels. You do not have to guess that it is “encrypted text”; the markers tell you.
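The EncryptedPrivateKeyInfo structure inside such a block is formally defined in RFC 5958 (Asymmetric Key Packages), and the BEGIN/END textual encoding itself is specified in RFC 7468 (Textual Encodings of PKIX, PKCS, and CMS Structures).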
A regex such as /-----BEGIN ENCRYPTED PRIVATE KEY-----(.*?)-----END ENCRYPTED PRIVATE KEY-----/s will reliably match and capture the cipher material found within a very large string. (The dashes need no backslashes outside a character class, although escaping them does no harm.) The /s modifier lets . match newlines, which the block body will contain. The only gotcha is that the pattern must be non-greedy, as shown, so that in a string containing many such blocks it stops at the next end-marker instead of consuming everything up to the very last one.

There is no need to look for the material itself, e.g. to somehow recognize Base64 encoding. In well-formed input, if a begin-marker occurs, you can rely on an end-marker being present and on the two corresponding. Everything between the two markers will be nothing but cipher material of the specified kind, and you don’t have to consider how it might be encoded or structured.
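A minimal Perl sketch of the extraction, run against made-up sample text (the AAAA/BBBB/CCCC bodies are placeholders, not real key material). Using /g in list context returns every captured body; the greedy variant is included only to show the gotcha described above.

```perl
use strict;
use warnings;

# Made-up sample input: two key blocks with placeholder bodies, plus other text.
my $text = <<'END_TEXT';
some preamble
-----BEGIN ENCRYPTED PRIVATE KEY-----
AAAA
BBBB
-----END ENCRYPTED PRIVATE KEY-----
middle text
-----BEGIN ENCRYPTED PRIVATE KEY-----
CCCC
-----END ENCRYPTED PRIVATE KEY-----
trailer
END_TEXT

# Non-greedy: (.*?) stops at the nearest end-marker. /s lets . match newlines,
# and /g in list context returns the capture from every match.
my @bodies = $text =~
    /-----BEGIN ENCRYPTED PRIVATE KEY-----(.*?)-----END ENCRYPTED PRIVATE KEY-----/sg;

# Greedy (for contrast): (.*) runs on to the LAST end-marker, so both blocks,
# and the text between them, come back as a single capture.
my @greedy = $text =~
    /-----BEGIN ENCRYPTED PRIVATE KEY-----(.*)-----END ENCRYPTED PRIVATE KEY-----/sg;

printf "non-greedy: %d capture(s), greedy: %d capture(s)\n",
    scalar @bodies, scalar @greedy;    # non-greedy: 2, greedy: 1
```

Note that each captured body still carries the newline after the begin-marker and before the end-marker.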
(Depending on your exact regular expression, you may need to make sure you correctly handle whichever newline sequence the data uses, e.g. \n versus \r\n. See perldoc perlrebackslash, which documents \R for matching any linebreak sequence.)
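One way to deal with mixed line endings, sketched under the assumption that you are free to normalize the data before matching: \R (perlrebackslash, Perl 5.10 and later) matches any linebreak sequence and treats \r\n as a single unit, so every line ending can be rewritten to \n first. The sample input is made up.

```perl
use strict;
use warnings;

# Made-up input with Windows-style CRLF line endings.
my $text = "-----BEGIN ENCRYPTED PRIVATE KEY-----\r\nAAAA\r\n"
         . "-----END ENCRYPTED PRIVATE KEY-----\r\n";

# Rewrite every linebreak sequence (including \r\n as one unit) to "\n".
(my $normalized = $text) =~ s/\R/\n/g;

# A line-anchored pattern (/m) now works regardless of the original endings.
my ($body) = $normalized =~
    /^-----BEGIN ENCRYPTED PRIVATE KEY-----\n(.*?)^-----END ENCRYPTED PRIVATE KEY-----$/ms;

print $body;    # prints "AAAA" followed by a newline
```

Alternatively, \R could be used directly inside the pattern instead of normalizing the whole string first.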
If you need to process, as a single string, data that might contain cipher blocks you don’t want, just use an s/// substitution to replace each entire block, markers and all, with an empty string. (Do it globally, with s///g, if you want to get them all in one swell foop.)
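A sketch of the global substitution on made-up input. The optional \n? after the end-marker is not part of the pattern described above; it is an assumption about the output you want, removing the line break that followed each block so that no blank lines are left behind.

```perl
use strict;
use warnings;

# Made-up input: two key blocks with placeholder bodies among ordinary lines.
my $text = "keep this\n"
    . "-----BEGIN ENCRYPTED PRIVATE KEY-----\nAAAA\n-----END ENCRYPTED PRIVATE KEY-----\n"
    . "and this\n"
    . "-----BEGIN ENCRYPTED PRIVATE KEY-----\nBBBB\n-----END ENCRYPTED PRIVATE KEY-----\n"
    . "and this too\n";

# Delete every block, markers included: /s so . crosses newlines, /g for all
# blocks, non-greedy .*? so each deletion stops at its own end-marker.
(my $cleaned = $text) =~
    s/-----BEGIN ENCRYPTED PRIVATE KEY-----.*?-----END ENCRYPTED PRIVATE KEY-----\n?//sg;

print $cleaned;    # "keep this", "and this", "and this too", one per line
```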
If you are processing a file line by line but want to exclude cipher blocks, you could write a simple “fetch the next line of data” subroutine that detects key-block markers, loops over (and discards) the markers and the data between them, and returns the line (if any) that follows. A simple eq comparison suffices here: the marker will always begin in column 1, will always be uppercase-only ASCII, and will end with a newline immediately following the last dash.
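A minimal sketch of such a subroutine. The name next_data_line and the marker variables are hypothetical, the eq comparisons assume plain \n line endings, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Exact marker lines, newline included (assumes "\n" line endings).
my $BEGIN_MARKER = "-----BEGIN ENCRYPTED PRIVATE KEY-----\n";
my $END_MARKER   = "-----END ENCRYPTED PRIVATE KEY-----\n";

# Hypothetical helper: return the next line from $fh that is not inside a key
# block, or undef at end of input.
sub next_data_line {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        if ($line eq $BEGIN_MARKER) {
            # Discard the block body up to and including the end-marker.
            while (defined(my $skip = <$fh>)) {
                last if $skip eq $END_MARKER;
            }
            next;
        }
        return $line;
    }
    return;    # undef: no more data
}

# Usage: an in-memory filehandle stands in for a real file.
my $data = "one\n${BEGIN_MARKER}AAAA\nBBBB\n${END_MARKER}two\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @kept;
while (defined(my $line = next_data_line($fh))) {
    push @kept, $line;
}
close $fh;
print @kept;    # prints "one" and "two", each on its own line
```

Because the caller only ever sees non-key lines, the rest of a line-by-line loop does not need to know that key blocks exist.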
If you instead need to collect the cipher material into a string, collect both markers as well as the lines between them, with a \n newline character after each line, including the last.
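A sketch of collecting one block into a string, markers included and every line terminated by \n. collect_key_block is a hypothetical name, and stripping each input line's ending with \R before comparing is an assumption meant to tolerate \r\n input as well; the sample data is made up.

```perl
use strict;
use warnings;

my $BEGIN_MARKER = '-----BEGIN ENCRYPTED PRIVATE KEY-----';
my $END_MARKER   = '-----END ENCRYPTED PRIVATE KEY-----';

# Hypothetical helper: read lines from $fh and return the first complete key
# block (markers included, every line ending in "\n"), or undef if none.
sub collect_key_block {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\R\z//;    # drop whatever line ending the input used
        if (!@lines) {
            # Not inside a block yet: wait for the begin-marker.
            push @lines, $line if $line eq $BEGIN_MARKER;
            next;
        }
        push @lines, $line;
        # At the end-marker, join with "\n" after every line, the last included.
        return join('', map { "$_\n" } @lines) if $line eq $END_MARKER;
    }
    return;    # no complete block found
}

# Usage with CRLF input in an in-memory filehandle.
my $data = "junk\r\n$BEGIN_MARKER\r\nAAAA\r\nBBBB\r\n$END_MARKER\r\nmore junk\r\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $block = collect_key_block($fh);
close $fh;
print $block;
```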