BaseX
A Base8/Base32/Base64 encoder/decoder implemented in C for efficient file encoding and decoding.
Status
Completed
Started
2023
Last Updated
2024
Primary Language
C
Overview
BaseX is a command-line tool for encoding and decoding files using the Base8, Base32, and Base64 formats. It converts file contents into any of these formats and decodes encoded files back to their original bytes.
Problem Statement
Converting binary data to and from text representations is a common task in many applications, but existing libraries often favor flexibility over performance or pull in dependencies that a small tool does not need.
Solution
BaseX provides a lightweight, dependency-free implementation built around lookup tables and bit shifts. The encoding and decoding routines are designed to be easy to integrate into other projects while remaining efficient and accurate.
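Only the Base64 routines are reproduced in this write-up. As an illustration of how the same table-and-shift approach carries over to Base32, here is a minimal sketch that assumes the RFC 4648 alphabet and padding rules. The function name `base32_encode_sketch` is hypothetical and is not taken from BaseX, whose actual Base32 code may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not BaseX's API: encodes len bytes as Base32 using
   the RFC 4648 alphabet. out must hold 8 * ((len + 4) / 5) + 1 bytes.
   Returns the number of characters written, excluding the terminating NUL. */
size_t base32_encode_sketch(const unsigned char *in, size_t len, char *out) {
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    uint32_t buffer = 0; /* bit accumulator; only the low `bits` bits matter */
    int bits = 0;        /* number of bits in buffer not yet emitted */
    size_t j = 0;

    for (size_t i = 0; i < len; i++) {
        buffer = (buffer << 8) | in[i];
        bits += 8;
        /* Emit one character for every complete 5-bit group */
        while (bits >= 5) {
            out[j++] = alphabet[(buffer >> (bits - 5)) & 0x1F];
            bits -= 5;
        }
    }

    /* Left-align any leftover bits in a final 5-bit group */
    if (bits > 0)
        out[j++] = alphabet[(buffer << (5 - bits)) & 0x1F];

    /* Pad with '=' up to a multiple of eight characters */
    while (j % 8 != 0)
        out[j++] = '=';

    out[j] = '\0';
    return j;
}
```

Under these assumptions, the single byte "f" encodes to "MY======" and "foobar" encodes to "MZXW6YTBOI======", matching the RFC 4648 test vectors.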
Key Implementation
Base64 Encoding Function
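This function encodes binary data to Base64. The caller supplies an output buffer of at least 4 * ((input_length + 2) / 3) bytes; the function does not append a terminating null character.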
#include <stddef.h>
#include <stdint.h>

void base64_encode(const unsigned char *data, size_t input_length, char *encoded_data) {
    static const char encoding_table[] = {
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
        'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
        'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
        'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
        'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
        'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
        'w', 'x', 'y', 'z', '0', '1', '2', '3',
        '4', '5', '6', '7', '8', '9', '+', '/'
    };
    // Number of '=' padding characters, indexed by input_length % 3
    static const size_t mod_table[] = {0, 2, 1};
    size_t output_length = 4 * ((input_length + 2) / 3);

    for (size_t i = 0, j = 0; i < input_length;) {
        // Read up to three bytes, substituting zero past the end of the input
        uint32_t octet_a = i < input_length ? data[i++] : 0;
        uint32_t octet_b = i < input_length ? data[i++] : 0;
        uint32_t octet_c = i < input_length ? data[i++] : 0;
        // Pack them into 24 bits and emit four 6-bit indices
        uint32_t triple = (octet_a << 16) + (octet_b << 8) + octet_c;
        encoded_data[j++] = encoding_table[(triple >> 18) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 12) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 6) & 0x3F];
        encoded_data[j++] = encoding_table[triple & 0x3F];
    }

    // Overwrite the trailing characters with '=' padding if necessary
    for (size_t i = 0; i < mod_table[input_length % 3]; i++)
        encoded_data[output_length - 1 - i] = '=';
}
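To make the bit packing concrete, the following sketch encodes a single group of one to three bytes, which is the unit the loop above works on. The helper name `encode_group` is hypothetical and exists only for illustration. Unlike the function above, it writes the padding directly and appends a terminating NUL so the result can be printed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration: encodes one group of n bytes
   (1 <= n <= 3) into four Base64 characters plus a terminating NUL. */
void encode_group(const unsigned char *in, size_t n, char out[5]) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /* Pack the available bytes into 24 bits, zero-filling missing bytes */
    uint32_t triple = 0;
    for (size_t k = 0; k < 3; k++)
        triple = (triple << 8) | (k < n ? in[k] : 0u);

    out[0] = alphabet[(triple >> 18) & 0x3F];
    out[1] = alphabet[(triple >> 12) & 0x3F];
    /* n input bytes yield n + 1 data characters; the remainder is '=' */
    out[2] = n > 1 ? alphabet[(triple >> 6) & 0x3F] : '=';
    out[3] = n > 2 ? alphabet[triple & 0x3F] : '=';
    out[4] = '\0';
}
```

For example, the three bytes "Man" form the bit string 010011 010110 000101 101110, which maps to indices 19, 22, 5, and 46 and therefore to "TWFu". The shorter inputs "Ma" and "M" encode to "TWE=" and "TQ==".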
Base64 Decoding Function
This function decodes Base64 data back to binary. It returns the number of decoded bytes, or -1 if the input length is not a multiple of four.
#include <stddef.h>
#include <stdint.h>

// decoded_data must have room for at least input_length / 4 * 3 bytes
int base64_decode(const char *data, size_t input_length, unsigned char *decoded_data) {
    // Local copy of the alphabet so the function is self-contained
    static const char encoding_table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Not const: the reverse lookup table is filled in on the first call
    static unsigned char decoding_table[256];
    static int table_initialized = 0;
    if (!table_initialized) {
        for (int i = 0; i < 64; i++)
            decoding_table[(unsigned char) encoding_table[i]] = (unsigned char) i;
        table_initialized = 1;
    }

    if (input_length % 4 != 0) return -1;
    if (input_length == 0) return 0;  // nothing to decode; avoids reading data[-1]

    size_t output_length = input_length / 4 * 3;
    if (data[input_length - 1] == '=') output_length--;
    if (data[input_length - 2] == '=') output_length--;

    for (size_t i = 0, j = 0; i < input_length; i += 4) {
        // Rebuild 24 bits from four sextets; '=' padding contributes zero bits.
        // Characters outside the alphabet are not rejected and also map to zero.
        uint32_t triple = 0;
        for (size_t k = 0; k < 4; k++) {
            unsigned char c = (unsigned char) data[i + k];
            triple = (triple << 6) | (c == '=' ? 0u : decoding_table[c]);
        }
        if (j < output_length) decoded_data[j++] = (triple >> 16) & 0xFF;
        if (j < output_length) decoded_data[j++] = (triple >> 8) & 0xFF;
        if (j < output_length) decoded_data[j++] = triple & 0xFF;
    }
    return (int) output_length;
}
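The decoder above accepts any byte as input and treats characters outside the alphabet as zero. Where malformed input should be rejected instead, one possible approach is to fill the lookup table with a sentinel value first. The sketch below is a hypothetical variant, not part of BaseX; the names `base64_decode_strict` and `B64_INVALID` are assumptions made for illustration. It does not check that unused trailing bits are zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define B64_INVALID 0xFF /* hypothetical sentinel for bytes outside the alphabet */

/* Hypothetical strict variant: returns the number of decoded bytes, or -1 if
   the length is not a multiple of four, a character is outside the alphabet,
   or '=' appears anywhere other than the last one or two positions. */
long base64_decode_strict(const char *in, size_t len, unsigned char *out) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned char table[256];

    /* Rebuilt on every call: simple and free of shared mutable state */
    memset(table, B64_INVALID, sizeof table);
    for (int i = 0; i < 64; i++)
        table[(unsigned char) alphabet[i]] = (unsigned char) i;

    if (len % 4 != 0) return -1;

    /* Count trailing padding characters (0, 1, or 2) */
    size_t pad = 0;
    if (len > 0 && in[len - 1] == '=') pad++;
    if (pad == 1 && in[len - 2] == '=') pad++;

    size_t j = 0;
    for (size_t i = 0; i < len; i += 4) {
        uint32_t triple = 0;
        for (size_t k = 0; k < 4; k++) {
            int is_pad = (i + k >= len - pad);
            uint32_t v = is_pad ? 0u : table[(unsigned char) in[i + k]];
            if (v == B64_INVALID) return -1; /* also catches a misplaced '=' */
            triple = (triple << 6) | v;
        }
        /* The final group yields fewer bytes when padding is present */
        size_t bytes = (i + 4 == len) ? 3 - pad : 3;
        out[j++] = (unsigned char) (triple >> 16);
        if (bytes > 1) out[j++] = (unsigned char) (triple >> 8);
        if (bytes > 2) out[j++] = (unsigned char) triple;
    }
    return (long) j;
}
```

With this variant, "TWFu" decodes to the three bytes "Man", "TQ==" decodes to the single byte "M", and an input such as "TW*u" is rejected with -1.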
Challenges & Learnings
The main challenge was implementing the encoding and decoding algorithms efficiently. Another challenge was ensuring the code remained memory-efficient while handling variable-length input data.
Future Improvements
- Add support for more encoding schemes like Base85
- Implement streaming encode/decode for large data
- Create bindings for other programming languages
- Add comprehensive test suite for edge cases
- Optimize further for specific CPU architectures
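As a rough idea of what the streaming item above could look like, the sketch below encodes a `FILE *` in fixed-size chunks. It is an assumption about one possible design, not a planned BaseX interface; `encode_chunk` and `base64_encode_stream` are hypothetical names and the buffer size is arbitrary. The key point is that the read buffer is a multiple of three bytes, so padding can only appear in the final chunk.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: encodes n bytes into out (which must hold
   4 * ((n + 2) / 3) characters) and returns the number of characters written. */
static size_t encode_chunk(const unsigned char *in, size_t n, char *out) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t j = 0;
    for (size_t i = 0; i < n; i += 3) {
        size_t left = n - i; /* bytes remaining, at least 1 */
        uint32_t triple = (uint32_t) in[i] << 16;
        if (left > 1) triple |= (uint32_t) in[i + 1] << 8;
        if (left > 2) triple |= in[i + 2];
        out[j++] = alphabet[(triple >> 18) & 0x3F];
        out[j++] = alphabet[(triple >> 12) & 0x3F];
        out[j++] = left > 1 ? alphabet[(triple >> 6) & 0x3F] : '=';
        out[j++] = left > 2 ? alphabet[triple & 0x3F] : '=';
    }
    return j;
}

/* Hypothetical streaming encoder: reads from in until end-of-file and writes
   Base64 text to out. Returns 0 on success, -1 on a read or write error. */
int base64_encode_stream(FILE *in, FILE *out) {
    unsigned char inbuf[3 * 1024]; /* multiple of 3: no padding mid-stream */
    char outbuf[4 * 1024];
    size_t n;

    /* fread returns a short count only at end-of-file or on error, so only
       the last chunk can have a length that is not a multiple of three */
    while ((n = fread(inbuf, 1, sizeof inbuf, in)) > 0) {
        size_t m = encode_chunk(inbuf, n, outbuf);
        if (fwrite(outbuf, 1, m, out) != m) return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

A caller could pass `stdin` and `stdout`, or two files opened in binary mode. Memory use stays constant regardless of input size because only one chunk is held at a time.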