is it base64?

 

Introduction

For a long time I had been trying to find out how to tell whether a string is base64 encoded or not in a shell script.

Looking all over the Internet, I found that this problem has no real solution in many programming languages. After a lot of work and a lot of collaboration (@homembit, @ysidorito, @smailli, @kretcheu, @pauloeferreira, @rgou, @natanschultz and probably someone else I forgot) I figured it out.

I’ll try to explain how.

The problem

There are two conditions that make a string a base64 encoded string:

  1. Its length must be perfectly divisible by 4;

  2. It must match [A-Za-z0-9+/]+[=]{0,2}, that is, only base64 characters followed by at most two = padding characters;

But satisfying only these two conditions does not guarantee that a string is base64 encoded. To show that there are exceptions, let us take a look at the strings below:

  • S1="FarmaciaCido"

  • S2="BoniattiCorbelia"

  • S3="PrefCorbelia"

Each of these strings satisfies both conditions, yet none of them is a base64 encoded string. Well, in a strict sense they are, but decoding them yields non-ASCII bytes, so the result means nothing. That is why identifying whether a string is base64 encoded or not is so hard.
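The two conditions can be tried out on the three example strings with a small sketch. The helper name passes_basic_checks is made up for this illustration and is not part of the final function; all three strings pass, which is exactly the problem.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the string satisfies both conditions.
passes_basic_checks () {
    str=$1
    # Condition 1: length divisible by 4
    [ $(( ${#str} % 4 )) -eq 0 ] || return 1
    # Condition 2: only base64 characters, with at most two trailing '='
    printf '%s\n' "$str" | grep -Eq '^[A-Za-z0-9+/]+[=]{0,2}$'
}

for s in FarmaciaCido BoniattiCorbelia PrefCorbelia; do
    if passes_basic_checks "$s"; then
        echo "$s: passes both checks"
    else
        echo "$s: fails"
    fi
done
```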

The solution

The most common solution I have seen out there is to run some decode function and check its return code. If it is OK, then the string must be base64 encoded; otherwise it is not. But as explained before, there are many possible false positives, which makes this an unreliable solution.
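A minimal sketch of that return-code approach shows the false positive. The name naive_is_base64 is hypothetical, and the sketch assumes a base64 command that accepts -d, as the one in GNU coreutils does.

```shell
#!/bin/sh
# Naive check: trust the exit status of the decoder alone.
# Assumes a base64 command that accepts -d (GNU coreutils does).
naive_is_base64 () {
    printf '%s' "$1" | base64 -d >/dev/null 2>&1
}

# An ordinary word that was never encoded is still accepted.
if naive_is_base64 "FarmaciaCido"; then
    echo "FarmaciaCido accepted (false positive)"
fi
```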

In most cases, what we need to know is whether the base64 decoded string is ASCII, so that we get our regular string back. So the solution is to test the decoded output to find out whether it contains non-ASCII characters. If it does, then it is not what we want.
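One way to sketch the ASCII test is to delete every ASCII byte from the decoded output and see whether anything is left. The helper name decodes_to_ascii is an assumption for this example, and using tr here is just one possible choice. On its own the helper also accepts input that decodes to nothing, so it is meant to be combined with the two conditions from the previous section.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the decoded bytes are all ASCII (0-127).
decodes_to_ascii () {
    # Delete every ASCII byte; whatever remains is non-ASCII.
    leftover=$(printf '%s' "$1" | base64 -d 2>/dev/null | LC_ALL=C tr -d '\000-\177')
    [ -z "$leftover" ]
}

# "SGVsbG8gd29ybGQ=" is the base64 encoding of "Hello world".
decodes_to_ascii "SGVsbG8gd29ybGQ=" && echo "SGVsbG8gd29ybGQ=: decodes to ASCII"
decodes_to_ascii "FarmaciaCido" || echo "FarmaciaCido: decodes to non-ASCII bytes"
```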

The code

This is my is_base64() shell script function:

With comments:

# Expects the string to test in $BASE_STRING
is_base64 () {
    # Get string length
    STR_LENGTH=${#BASE_STRING}
    # Test if it is perfectly divisible by 4
    TEST1=$((STR_LENGTH % 4))
    if [ "$TEST1" = "0" ] ; then
        # Test if it has only the allowed chars
        TEST2=$(printf '%s\n' "$BASE_STRING" | grep -E '^[A-Za-z0-9+/]+[=]{0,2}$')
        if [ "$TEST2" != "" ] ; then
            # Decode, delete every ASCII byte (0-127) and keep what is left
            TEST3=$(printf '%s' "$BASE_STRING" | base64 -d 2>/dev/null | LC_ALL=C tr -d '\000-\177')
            # If nothing is left, print 0 (true), otherwise print 1 (false)
            if [ "$TEST3" = "" ] ; then
                echo 0
            else
                echo 1
            fi
        else
            echo 1
        fi
    else
        echo 1
    fi
}

Without comments:

# Expects the string to test in $BASE_STRING
is_base64 () {
    STR_LENGTH=${#BASE_STRING}
    TEST1=$((STR_LENGTH % 4))
    if [ "$TEST1" = "0" ] ; then
        TEST2=$(printf '%s\n' "$BASE_STRING" | grep -E '^[A-Za-z0-9+/]+[=]{0,2}$')
        if [ "$TEST2" != "" ] ; then
            TEST3=$(printf '%s' "$BASE_STRING" | base64 -d 2>/dev/null | LC_ALL=C tr -d '\000-\177')
            if [ "$TEST3" = "" ] ; then
                echo 0
            else
                echo 1
            fi
        else
            echo 1
        fi
    else
        echo 1
    fi
}
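The function above reads the global $BASE_STRING and prints 0 or 1. If you prefer to call it directly inside an if, a possible variant takes the string as an argument and reports the result through its exit status. This is only a sketch: the name is_base64_arg is made up here, and it assumes a base64 command that accepts -d.

```shell
#!/bin/sh
# Hypothetical variant: string passed as $1, result in the exit status.
is_base64_arg () {
    str=$1
    # Length must be divisible by 4
    [ $(( ${#str} % 4 )) -eq 0 ] || return 1
    # Only base64 characters, with at most two trailing '='
    printf '%s\n' "$str" | grep -Eq '^[A-Za-z0-9+/]+[=]{0,2}$' || return 1
    # Decoded output must contain no byte outside the ASCII range
    leftover=$(printf '%s' "$str" | base64 -d 2>/dev/null | LC_ALL=C tr -d '\000-\177')
    [ -z "$leftover" ]
}

for s in "SGVsbG8gd29ybGQ=" "FarmaciaCido" "abc"; do
    if is_base64_arg "$s"; then
        echo "$s: looks like base64 encoded ASCII"
    else
        echo "$s: not base64 encoded ASCII"
    fi
done
```

Returning a status instead of printing avoids a command substitution at every call site, at the cost of changing the calling convention of the original function.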