PerlMonks  

Reliable way to detect base64 encoded strings

by Rodster001 (Pilgrim)
on Jun 29, 2009 at 21:09 UTC ( [id://775820]=perlquestion )

Rodster001 has asked for the wisdom of the Perl Monks concerning the following question:

Title pretty much says it all. I have a string of text which is occasionally base64-encoded. The way I am currently detecting whether the string is encoded seems a bit of a hack, and I was wondering if there is a more reliable (proper) approach. Right now, I check whether it is a very "long" continuous string of characters whose length is a multiple of 4. If so, I send it through MIME::Base64's decode_base64(). Any suggestions?

Thanks!

Replies are listed 'Best First'.
Re: Reliable way to detect base64 encoded strings
by ikegami (Patriarch) on Jun 29, 2009 at 21:35 UTC
    Once whitespace is removed from the input, the following regex pattern will tell you whether the input is a valid base64-encoded string.
    m{ ^ (?: [A-Za-z0-9+/]{4} )* (?: [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] = | [A-Za-z0-9+/] [AQgw] == )? \z }x
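As a minimal sketch of how the pattern above might be used, the following wraps it in a small helper that strips whitespace first. The name `looks_like_base64` and the sample strings are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the input, once whitespace is removed,
# is a syntactically valid base64 string, and 0 otherwise.
sub looks_like_base64 {
    my ($input) = @_;
    return 0 unless defined $input;

    (my $stripped = $input) =~ s/\s+//g;    # whitespace must go first

    return $stripped =~ m{
        ^
        (?: [A-Za-z0-9+/]{4} )*                      # complete 4-character groups
        (?:
            [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] =    # final group with one "="
          | [A-Za-z0-9+/]    [AQgw]             ==   # final group with two "="
        )?
        \z
    }x ? 1 : 0;
}

for my $sample ("SGVsbG8gd29ybGQ=\n", "Hello, world", "password") {
    (my $shown = $sample) =~ s/\n/\\n/g;
    printf "%-20s %s\n", $shown, looks_like_base64($sample) ? "valid" : "invalid";
}
```

Note that "password" passes: it is eight characters from the base64 alphabet, so the pattern can only say that a string is well-formed, not that it was actually produced by an encoder.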

    As to whether the input was the result of base64-encoding or not, one can't tell. This sentence would be found valid up to the final period.

    $ perl -MMIME::Base64 -le'print decode_base64("This sentence would be found valid up to the final period")' | od -t x1
    0000000 4e 18 ac b1 e9 ed 7a 77 1e c2 8b a5 75 b7 9f a2
    0000020 e9 dd bd a9 62 76 ea 6d a2 d8 5e 7e 29 da 96 97
    0000040 ab 8a 87 0a
    0000044
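The same experiment can be sketched as a standalone script. decode_base64() silently skips characters outside the base64 alphabet (here, the spaces), so it returns bytes for ordinary English text without complaint, which is why decoding alone is not a reliable test. The variable names are illustrative only.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

my $text    = "This sentence would be found valid up to the final period";
my $decoded = decode_base64($text);

# Render the decoded bytes as hex so they can be compared with the od dump.
# The dump shows one extra byte (0a), the newline added by the -l switch.
my $hex = join ' ', unpack '(H2)*', $decoded;

print length($decoded), " bytes decoded\n";
print "$hex\n";
```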

    Usually, one refers to the header associated with the content.

    Update: Replaced /(?:|...|...)/ with /(?:...|...)?/
    Update: Changed // to m{} to fix an unescaped / (as brought up in a reply).

      This works nicely. I don't have a header to work with, so this is good. There are a few things I don't quite understand in that regex, though. Would you mind commenting each line so I can get my head around it? Thanks a lot!
        It's actually really straightforward.
        • Start of input
        • Followed by any number of groups of 4 characters from [A-Za-z0-9+/],
        • Followed by one of the following:
          • [always matches]
          • Four characters where
            • The first and second match /[A-Za-z0-9+/]/
            • The third matches /[AEIMQUYcgkosw048]/
            • The fourth is a "="
          • Four characters where
            • The first matches /[A-Za-z0-9+/]/
            • The second matches /[AQgw]/
            • The third and fourth are both a "="
        • Followed by the end of input

        It's probably a bit simpler after the update I just did for you:

        • Start of input
        • Followed by any number of groups of 4 characters from [A-Za-z0-9+/],
        • Followed by zero or one of the following:
          • Four characters where
            • The first and second match /[A-Za-z0-9+/]/
            • The third matches /[AEIMQUYcgkosw048]/
            • The fourth is a "="
          • Four characters where
            • The first matches /[A-Za-z0-9+/]/
            • The second matches /[AQgw]/
            • The third and fourth are both a "="
        • Followed by the end of input
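The two short character classes in the breakdown above may look arbitrary. A final group with one "=" carries 16 bits of data in three characters (18 bits), so the low 2 bits of the third character must be zero; a final group with "==" carries 8 bits in two characters (12 bits), so the low 4 bits of the second character must be zero. The sketch below rebuilds both classes from the standard base64 alphabet under that reasoning; the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# The standard base64 alphabet, indexed 0..63.
my @alphabet = ('A'..'Z', 'a'..'z', '0'..'9', '+', '/');

# One "=": the third character's index must be a multiple of 4.
my $one_pad = join '', map { $alphabet[$_] } grep { $_ % 4 == 0 } 0 .. 63;

# Two "=": the second character's index must be a multiple of 16.
my $two_pad = join '', map { $alphabet[$_] } grep { $_ % 16 == 0 } 0 .. 63;

print "[$one_pad]\n";    # compare with [AEIMQUYcgkosw048]
print "[$two_pad]\n";    # compare with [AQgw]
```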
Re: Reliable way to detect base64 encoded strings
by Anonymous Monk on Jun 29, 2009 at 21:21 UTC
    What is reliable? :)

    base64 regex

    new Regex(@"[0-9a-zA-Z\+/=]{20,}");

    '(^Content-Transfer-Encoding: base64$)(^$)(^(([A-Za-z0-9+/=]){4}){1,19}$)*(^$)'

    ^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

    my $sanitized_str = join q{}, grep {!/[^A-Za-z0-9+\/=]/} split /\n/, $str;
Re: Reliable way to detect base64 encoded strings
by bcarroll (Pilgrim) on Jun 17, 2014 at 01:21 UTC
    I have not done very extensive testing, but this seems to determine whether a string is Base64 encoded or not.

    use warnings;
    use strict;
    use MIME::Base64;

    my $base64text    = encode_base64('This is some Base64 encoded text');
    my $notbase64text = 'This is text is not Base64 encoded';

    print is_base64($base64text)    || "\nNot Base64\n";
    print is_base64($notbase64text) || "\nNot Base64\n";

    sub is_base64 {
        # Returns the decoded data on success, undef on failure
        my $data = shift;

        # encode_base64() ends its output with a newline (and wraps long
        # output), so remove line breaks before checking the length;
        # otherwise valid input fails the multiple-of-4 test below
        $data =~ tr/\r\n//d;

        return(undef) unless ($data =~ /^[A-Za-z0-9+\/=]+$/); # test for valid Base64 string

        if (length($data) % 4 == 0) {
            print "Valid Base64\n";
            my $decoded = decode_base64($data);
            return($decoded);
        } else {
            print "Invalid Base64\n";
            return(undef);
        }
    }

Node Type: perlquestion [id://775820]
Approved by zwon