Understanding Base64: Encoding is Not Encryption

Sarah Lin • Senior Cryptography Engineer • FindDevTools Security Lab

A staggering conceptual error persists among junior and even mid-level developers: confusing encoding with encryption. This article breaks down the mathematical mechanics of Base64 encoding defined in RFC 4648, explaining why it exists and why using it to hide secrets is a catastrophic security vulnerability.

The Purpose of Encoding

Encoding translates data from one format to another for the sake of system compatibility. Base64 was designed to solve a specific problem: transmitting arbitrary binary data (such as an image or a compiled executable) over protocols designed for ASCII text (such as SMTP email or HTTP headers).

Because these legacy systems might misinterpret binary bytes (especially control characters natively used to demarcate the end of files or headers), binary data needed a safe, text-based representation. Base64 achieves this by taking three 8-bit bytes (24 bits total) and breaking them into four 6-bit pieces.

The Mechanics of Base64

Each 6-bit piece can represent 64 possible values (2^6 = 64). These 64 values are mapped to an alphabet consisting of A-Z, a-z, 0-9, and two additional characters (usually `+` and `/`). If the data length isn't perfectly divisible by 3 bytes, padding characters (`=`) are appended to the end.
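The grouping and padding steps described above can be sketched in a few lines of Python. This is an illustrative hand-rolled encoder, not a replacement for the standard library; the names `ALPHABET` and `encode_base64` are made up for this example.

```python
import base64

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_base64(data: bytes) -> str:
    """Encode bytes as Base64 by hand, three bytes (24 bits) at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Pack the chunk into a 24-bit integer, zero-filling on the right.
        group = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit indexes, most significant first.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters made up entirely of fill bits are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

# The hand-rolled encoder agrees with the standard library.
for sample in (b"Man", b"Ma", b"M"):
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", encode_base64(sample))
```

Three input bytes produce four characters with no padding, two bytes produce three characters plus one `=`, and one byte produces two characters plus two `=`.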

Crucially, the algorithm and the alphabet are public knowledge and standardized. There is no key, no cryptographic operation, and no computational difficulty involved in reversing the transformation.

Why Using Base64 for Security is Dangerous

Encryption, by contrast, is designed to conceal data using a cipher (such as AES-256) and a secret key. Without the key, recovering the original data is computationally infeasible.

When developers "hide" API keys or database passwords in source code by merely Base64-encoding them, they are effectively locking a door with a transparent glass lock and taping the key to the outside. Any script kiddie or automated scraper scanning GitHub can instantly decode the string by applying the standard reverse algorithm.
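To make the point concrete, here is a minimal sketch of what "recovering" a Base64-hidden credential takes. The credential value is hypothetical and exists only for this example.

```python
import base64

# Hypothetical credential, "obfuscated" the way a developer might do it.
secret = "db-password-example"
hidden = base64.b64encode(secret.encode("utf-8")).decode("ascii")

# What anyone who finds the string can do: no key, one standard-library call.
recovered = base64.b64decode(hidden).decode("utf-8")

print(hidden)
print(recovered)
```

The decoding side needs nothing that the attacker does not already have, which is exactly the difference from encryption.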

Base64 is a data transport mechanism, not a vault. As the developers behind the FindDevTools Base64 Encoder/Decoder, we provide these utilities strictly for handling protocol transport, never for obfuscating sensitive materials.


Technical Deep Dive & Specification Reference

RFC 4648 requires conforming encoders to set the unused pad bits in the final encoded character to zero. If this property does not hold, there is no canonical representation of base-encoded data, and multiple base-encoded strings can be decoded to the same binary data. If this property (and others discussed in the RFC) holds, a canonical encoding is guaranteed. In some environments, the alteration is critical and therefore decoders MAY choose to reject an encoding if the pad bits have not been set to zero. The specification referring to this may mandate a specific behaviour.
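The pad-bit issue can be observed with Python's standard `base64` module, whose default decoder is lenient about leftover bits. The sketch below assumes that default (non-strict) behaviour; `is_canonical` is a hypothetical helper that rejects such input by re-encoding and comparing.

```python
import base64

# 'Q' is value 16 (010000) and 'R' is value 17 (010001).
# Both strings carry the byte 01000001 ("A"); they differ only in the
# four unused pad bits of the second character (0000 versus 0001).
canonical = "QQ=="
non_canonical = "QR=="

def is_canonical(encoded: str) -> bool:
    """Strict check: decode, re-encode, and compare with the input."""
    decoded = base64.b64decode(encoded)
    return base64.b64encode(decoded).decode("ascii") == encoded

print(base64.b64decode(canonical), base64.b64decode(non_canonical))
print(is_canonical(canonical), is_canonical(non_canonical))
```

A system that compares encoded strings (for example as cache keys or signatures over the text form) may want the strict check, since two different strings can otherwise stand for the same bytes.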

Base 64 Encoding

The following description of base 64 is derived from [3], [4], [5], and [6]. This encoding may be referred to as "base64". The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper- and lowercase letters but that need not be human readable.

Josefsson Standards Track [Page 5] RFC 4648 Base-N Encodings October 2006

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.) The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters.

Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet. Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string.

Table 1: The Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 +
    12 M            29 d            46 u            63 /
    13 N            30 e            47 v
    14 O            31 f            48 w         (pad) =
    15 P            32 g            49 x
    16 Q            33 h            50 y

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded.

A full encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the '=' character. Since all base 64 input is an integral number of octets, only the following cases can arise:

(1) The final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding.

(2) The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters.

(3) The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

Base 64 Encoding with URL and Filename Safe Alphabet

The Base 64 encoding with a URL and filename safe alphabet has been used in [12]. An alternative alphabet has been suggested that would use "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead.

The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well. The pad character "=" is typically percent-encoded when used in a URI [9], but if the data length is known implicitly, this can be avoided by skipping the padding; see Section 3.2 of RFC 4648. This encoding may be referred to as "base64url". This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64". Unless clarified otherwise, "base64" refers to the base 64 in the previous section.
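The difference between the two alphabets, and the option of skipping padding, can be illustrated with Python's standard `base64` module. The helper names `encode_unpadded` and `decode_unpadded` are hypothetical and exist only for this sketch.

```python
import base64

# Three bytes chosen so that the four 6-bit groups are 62, 63, 62, 63,
# i.e. exactly the two positions where the alphabets differ.
data = bytes([0xFB, 0xFF, 0xBF])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
print(standard, url_safe)

def encode_unpadded(raw: bytes) -> str:
    """base64url without '=' padding, for use inside URLs."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_unpadded(text: str) -> bytes:
    """Restore the padding implied by the length, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

print(encode_unpadded(b"M"), decode_unpadded(encode_unpadded(b"M")))
```

Dropping the padding is only safe when the receiver can recover the length from context, as the length of the encoded string itself does here.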

This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

Table 2: The "URL and Filename safe" Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 - (minus)
    12 M            29 d            46 u            63 _ (underline)
    13 N            30 e            47 v
    14 O            31 f            48 w
    15 P            32 g            49 x
    16 Q            33 h            50 y         (pad) =

Base 32 Encoding

The following description of base 32 is derived from [11] (with corrections). This encoding may be referred to as "base32". The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but that need not be human readable.

A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.) The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8-bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single character in the base 32 alphabet. When a bit stream is encoded via the base 32 encoding, the bit stream must be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3, below, are selected from US-ASCII digits and uppercase letters.

Table 3: The Base 32 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A             9 J            18 S            27 3
     1 B            10 K            19 T            28 4
     2 C            11 L            20 U            29 5
     3 D            12 M            21 V            30 6
     4 E            13 N            22 W            31 7
     5 F            14 O            23 X
     6 G            15 P            24 Y         (pad) =
     7 H            16 Q            25 Z
     8 I            17 R            26 2

Special processing is performed if fewer than 40 bits are available at the end of the data being encoded.

A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

(1) The final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding.

(2) The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters.

(3) The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters.

(4) The final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters.

(5) The final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

Base 32 Encoding with Extended Hex Alphabet

The following description of base 32 is derived from [7].

This encoding may be referred to as "base32hex". This encoding should not be regarded as the same as the "base32" encoding and should not be referred to as only "base32". This encoding is used by, e.g., NextSECure3 (NSEC3) [10]. One property with this alphabet, which the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise. This encoding is identical to the previous one, except for the alphabet.
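The sort-order property can be checked with a short experiment. The sketch below derives base32hex from the standard library's base32 output by swapping alphabets, which mirrors the statement that the two encodings differ only in the alphabet; `b32hex_encode` is a hypothetical helper, and the test inputs are all the same length (5 bytes, so no padding is involved).

```python
import base64

STD_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
HEX_ALPHABET = b"0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX = bytes.maketrans(STD_ALPHABET, HEX_ALPHABET)

def b32hex_encode(data: bytes) -> bytes:
    """Encode as base32, then map each character to the extended hex alphabet."""
    return base64.b32encode(data).translate(TO_HEX)

# 256 equal-length inputs, one per possible leading byte value.
samples = [bytes([value]) * 5 for value in range(256)]

# Sorting by the base32hex text gives the same order as sorting the raw bytes.
hex_preserves_order = sorted(samples, key=b32hex_encode) == sorted(samples)
# Sorting by the standard base32 text does not ('2'-'7' sort before 'A'-'Z').
std_preserves_order = sorted(samples, key=base64.b32encode) == sorted(samples)

print(hex_preserves_order, std_preserves_order)
```

The check is deliberately limited to equal-length inputs; with mixed lengths the "=" padding character would also take part in the comparison.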

The new alphabet is found in Table 4.

Table 4: The "Extended Hex" Base 32 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 0             9 9            18 I            27 R
     1 1            10 A            19 J            28 S
     2 2            11 B            20 K            29 T
     3 3            12 C            21 L            30 U
     4 4            13 D            22 M            31 V
     5 5            14 E            23 N
     6 6            15 F            24 O         (pad) =
     7 7            16 G            25 P
     8 8            17 H            26 Q

Base 16 Encoding

The following description is original but analogous to previous descriptions. Essentially, Base 16 encoding is the standard case-insensitive hex encoding and may be referred to as "base16" or "hex". A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters. Proceeding from left to right, an 8-bit input is taken from the input data. These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single character in the base 16 alphabet. Each 4-bit group is used as an index into an array of 16 printable characters. The character referenced by the index is placed in the output string.
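As a rough illustration of the per-octet process, the following sketch splits each byte into its high and low 4-bit groups and looks each one up. `HEX_DIGITS` and `encode_base16` are names invented for this example; the result is compared against the standard library's `base64.b16encode`.

```python
import base64

HEX_DIGITS = "0123456789ABCDEF"

def encode_base16(data: bytes) -> str:
    """Map each octet to two characters: high nibble first, then low nibble."""
    return "".join(HEX_DIGITS[byte >> 4] + HEX_DIGITS[byte & 0x0F] for byte in data)

sample = b"foobar"
print(encode_base16(sample))
assert encode_base16(sample) == base64.b16encode(sample).decode("ascii")
```

Because every octet maps to exactly two characters, the output is always twice the input length and never needs padding.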

Table 5: The Base 16 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 0             4 4             8 8            12 C
     1 1             5 5             9 9            13 D
     2 2             6 6            10 A            14 E
     3 3             7 7            11 B            15 F

Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

Illustrations and Examples

To translate between binary and a base encoding, the input is stored in a structure, and the output is extracted. The case for base 64 is displayed in the following figure, borrowed from [5].

 +--first octet--+-second octet--+--third octet--+
 |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
 +-----------+---+-------+-------+---+-----------+
 |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|

Furthermore, when dealing with modern authentication models such as JSON Web Tokens (JWT), developers must understand that the payload of a standard signed JWT is merely base64url-encoded. The token signature verifies authenticity, but the payload itself is completely readable by anyone who intercepts the string. Storing PII in such a token exposes it to every party that handles the token and is a severe breach of zero-trust policies.
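A short sketch shows how little is needed to read such a payload. It builds a simplified HS256-style token with the standard library only (not a complete JWT implementation), then reads the claims back without ever touching the signing key. All function names, the claims, and the key are hypothetical.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """base64url without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def make_token(claims: dict, key: bytes) -> str:
    """Build header.payload.signature; only the signature depends on the key."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = b64url(json.dumps(claims).encode("utf-8"))
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

def read_payload_without_key(token: str) -> dict:
    """What an eavesdropper can do: split on '.', decode the middle segment."""
    payload_segment = token.split(".")[1]
    return json.loads(b64url_decode(payload_segment))

token = make_token(
    {"sub": "user-123", "email": "alice@example.com"},
    key=b"server-side-secret",
)
claims = read_payload_without_key(token)
print(claims)
```

The key protects the token against tampering, not against reading. If claims must stay confidential, they need to be encrypted or kept out of the token altogether.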