BaseX

A Base8/Base32/Base64 encoder and decoder for files, written in C with a focus on efficiency.

Tags: C, Encoding, Decoding, Base64, Base32, Base8

Status: Completed
Started: 2023
Last Updated: 2024
Primary Language: C

Overview

BaseX is a command-line tool for encoding and decoding files using the Base8, Base32, and Base64 formats. It provides a straightforward interface for encoding file contents into any of these formats and decoding the result back to the original bytes.
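The three formats differ mainly in how many bits each output character carries: 6 for Base64, 5 for Base32, and 3 for Base8. The code later on this page covers only Base64, so the following is a hedged sketch of a hypothetical generic encoder (`encode_bits` is not part of BaseX) that makes the relationship concrete. It assumes the RFC 4648 alphabets for Base64 and Base32. It also assumes that "Base8" means the octal digits '0' to '7' read from a 3-bit stream, because the page does not specify BaseX's actual Base8 alphabet or padding. No '=' padding is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Example alphabets. Base64 and Base32 follow RFC 4648; the Base8
   alphabet (octal digits) is an assumption, not taken from BaseX. */
static const char BASE64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char BASE32_ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
static const char BASE8_ALPHABET[] = "01234567";

/* Hypothetical generic encoder: treats data as one big-endian bit stream
   and emits one alphabet symbol per `bits` bits. A final partial group is
   filled with zero bits on the right; no '=' padding is added.
   out must hold (n * 8 + bits - 1) / bits + 1 bytes. Returns the number
   of symbols written, or 0 if bits and the alphabet size do not match. */
size_t encode_bits(const unsigned char *data, size_t n,
                   const char *alphabet, unsigned bits, char *out) {
    if (bits == 0 || bits > 8 || strlen(alphabet) != ((size_t) 1 << bits)) {
        out[0] = '\0';
        return 0;
    }

    uint32_t buffer = 0;      /* unread bits live in the low `pending` bits */
    unsigned pending = 0;
    uint32_t mask = (1u << bits) - 1;
    size_t j = 0;

    for (size_t i = 0; i < n; i++) {
        buffer = (buffer << 8) | data[i];   /* append the next byte */
        pending += 8;
        while (pending >= bits) {           /* emit every complete group */
            pending -= bits;
            out[j++] = alphabet[(buffer >> pending) & mask];
        }
    }
    if (pending > 0)                        /* zero-fill the last group */
        out[j++] = alphabet[(buffer << (bits - pending)) & mask];

    out[j] = '\0';
    return j;
}
```

With `bits = 6` and `BASE64_ALPHABET`, the input "Man" becomes "TWFu". With `bits = 5` and `BASE32_ALPHABET`, "foobar" becomes "MZXW6YTBOI", which RFC 4648 then pads with '=' to "MZXW6YTBOI======".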

Problem Statement

Converting binary data to and from text representations is a common task in many applications, but existing libraries often prioritize flexibility over performance or pull in unnecessary dependencies.

Solution

BaseX is a lightweight, dependency-free implementation. The Base64 code shown below needs only the C standard library, and it converts data with table lookups and bit shifts, processing the input three bytes at a time. The functions are written so they can be integrated into other projects as well as used by the command-line tool.

Key Implementation

Base64 Encoding Function

This function encodes binary data to Base64 format.

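```c
#include <stddef.h>
#include <stdint.h>

/* Base64 alphabet from RFC 4648. Kept at file scope so base64_decode
   can build its reverse lookup table from it. */
static const char encoding_table[] = {
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
    'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
    'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
    'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
    'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
    'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
    'w', 'x', 'y', 'z', '0', '1', '2', '3',
    '4', '5', '6', '7', '8', '9', '+', '/'
};

/* Number of '=' characters to append for input_length % 3 == 0, 1, 2. */
static const size_t mod_table[] = {0, 2, 1};

/* Encodes input_length bytes of data as Base64 text.
   encoded_data must hold at least 4 * ((input_length + 2) / 3) + 1 bytes;
   the result is NUL-terminated. */
void base64_encode(const unsigned char *data, size_t input_length, char *encoded_data) {
    size_t output_length = 4 * ((input_length + 2) / 3);

    for (size_t i = 0, j = 0; i < input_length;) {
        /* Read up to three bytes; bytes past the end are treated as zero. */
        uint32_t octet_a = data[i++];
        uint32_t octet_b = i < input_length ? data[i++] : 0;
        uint32_t octet_c = i < input_length ? data[i++] : 0;

        /* Pack the 24 bits and emit them as four 6-bit alphabet indices. */
        uint32_t triple = (octet_a << 16) | (octet_b << 8) | octet_c;

        encoded_data[j++] = encoding_table[(triple >> 18) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 12) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 6) & 0x3F];
        encoded_data[j++] = encoding_table[triple & 0x3F];
    }

    /* Replace the characters produced from zero-filled bytes with '='. */
    for (size_t i = 0; i < mod_table[input_length % 3]; i++)
        encoded_data[output_length - 1 - i] = '=';

    encoded_data[output_length] = '\0';
}
```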
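The core of the encoder is the step that packs three bytes into 24 bits and splits them into four 6-bit indices. The following small helper isolates that step so the arithmetic can be checked on its own. `split_triple` is a hypothetical name for illustration and is not part of BaseX.

```c
#include <stdint.h>

/* Hypothetical helper isolating the encoder's core step: pack three bytes
   big-endian into a 24-bit value, then cut it into four 6-bit indices
   into the Base64 alphabet. */
void split_triple(const unsigned char in[3], unsigned char idx[4]) {
    uint32_t triple = ((uint32_t) in[0] << 16) | ((uint32_t) in[1] << 8) | in[2];
    idx[0] = (triple >> 18) & 0x3F;   /* bits 23..18 */
    idx[1] = (triple >> 12) & 0x3F;   /* bits 17..12 */
    idx[2] = (triple >> 6) & 0x3F;    /* bits 11..6 */
    idx[3] = triple & 0x3F;           /* bits 5..0 */
}
```

For "Man" (bytes 0x4D, 0x61, 0x6E), the packed value is 0x4D616E and the indices are 19, 22, 5, and 46, which map to "TWFu". For the one-byte input "M", base64_encode zero-fills the missing bytes, which gives indices 19, 16, 0, and 0. The last two characters are then overwritten with '=', producing "TQ==".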

Base64 Decoding Function

This function decodes Base64 data back to binary format.

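```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse lookup: byte value -> 6-bit value, or 0xFF for bytes outside
   the Base64 alphabet. Built on first use from encoding_table (the
   alphabet used by base64_encode, which must be visible at file scope).
   This lazy initialization is not thread-safe. */
static unsigned char decoding_table[256];
static int table_initialized = 0;

/* Decodes Base64 text into decoded_data, which must hold at least
   input_length / 4 * 3 bytes. Returns the number of bytes written, or -1
   if the length is not a multiple of 4, the input contains a character
   outside the alphabet, or '=' appears anywhere but the final padding. */
int base64_decode(const char *data, size_t input_length, unsigned char *decoded_data) {
    if (!table_initialized) {
        memset(decoding_table, 0xFF, sizeof decoding_table);
        for (int i = 0; i < 64; i++)
            decoding_table[(unsigned char) encoding_table[i]] = (unsigned char) i;
        table_initialized = 1;
    }

    if (input_length % 4 != 0) return -1;
    if (input_length == 0) return 0;   /* avoids reading data[-1] below */

    /* Count trailing padding; each '=' removes one byte from the output. */
    size_t padding = 0;
    if (data[input_length - 1] == '=')
        padding = data[input_length - 2] == '=' ? 2 : 1;
    size_t output_length = input_length / 4 * 3 - padding;

    for (size_t i = 0, j = 0; i < input_length; i += 4) {
        uint32_t triple = 0;
        for (size_t k = 0; k < 4; k++) {
            /* Cast so bytes >= 0x80 index the table instead of going negative. */
            unsigned char c = (unsigned char) data[i + k];
            uint32_t sextet;
            if (c == '=' && i + k >= input_length - padding) {
                sextet = 0;                     /* padding contributes zero bits */
            } else {
                sextet = decoding_table[c];
                if (sextet == 0xFF) return -1;  /* invalid character */
            }
            triple = (triple << 6) | sextet;
        }

        if (j < output_length) decoded_data[j++] = (triple >> 16) & 0xFF;
        if (j < output_length) decoded_data[j++] = (triple >> 8) & 0xFF;
        if (j < output_length) decoded_data[j++] = triple & 0xFF;
    }

    return (int) output_length;
}
```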
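base64_decode expects the caller to supply a decoded_data buffer that is large enough. If the caller wants to allocate exactly the right amount, the padding rule can be applied before decoding. The following is a minimal sketch of that approach. `base64_decoded_length` is a hypothetical helper and is not part of the project.

```c
#include <stddef.h>

/* Hypothetical helper applying the decoder's padding rule ahead of time:
   each 4-character group yields 3 bytes, minus one byte per trailing '='.
   Returns 1 and stores the length in *out, or 0 if input_length is not
   a multiple of 4. */
int base64_decoded_length(const char *data, size_t input_length, size_t *out) {
    if (input_length % 4 != 0) return 0;
    size_t length = input_length / 4 * 3;
    if (input_length > 0 && data[input_length - 1] == '=') {
        length--;
        if (data[input_length - 2] == '=') length--;
    }
    *out = length;
    return 1;
}
```

For example, "TWFu" needs 3 bytes, "TWE=" needs 2, and "TQ==" needs 1.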

Challenges & Learnings

The main challenge was implementing the encoding and decoding algorithms efficiently. Another was keeping memory use low for variable-length input: the Base64 functions write into caller-provided buffers whose size follows directly from the input length.
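Handling variable-length input starts with sizing the output buffer. The expression 4 * ((input_length + 2) / 3) in base64_encode can wrap around for inputs close to SIZE_MAX. The following is a sketch of two hypothetical helpers, `base64_encoded_size` and `base64_alloc_output`, which do not come from the project. They compute the size without overflow and allocate exactly that much, including one byte for a NUL terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores in *size_out the buffer size needed for
   Base64 output (4 * ceil(n / 3) characters plus a NUL terminator).
   Returns 0 instead of a wrapped-around size when that would overflow. */
int base64_encoded_size(size_t n, size_t *size_out) {
    size_t groups = n / 3 + (n % 3 != 0);        /* ceil(n / 3), no overflow */
    if (groups > (SIZE_MAX - 1) / 4) return 0;   /* 4 * groups + 1 too large */
    *size_out = 4 * groups + 1;
    return 1;
}

/* Hypothetical helper: allocates an output buffer of exactly the required
   size. Returns NULL on overflow or allocation failure. */
char *base64_alloc_output(size_t n) {
    size_t size;
    if (!base64_encoded_size(n, &size)) return NULL;
    return malloc(size);
}
```

For 4 input bytes, the helper reports 9 (eight characters plus the terminator). For SIZE_MAX, it reports failure rather than a wrapped-around size.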

Future Improvements

  • Add support for more encoding schemes like Base85
  • Implement streaming encode/decode for large data
  • Create bindings for other programming languages
  • Add a comprehensive test suite for edge cases
  • Optimize further for specific CPU architectures
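Streaming encoding, listed above as a future improvement, mostly comes down to carrying leftover bytes between calls so that '=' padding appears only at the very end. The following is a minimal sketch, assuming a caller that reads a file in arbitrary-sized chunks. The names `b64_stream`, `b64_stream_update`, and `b64_stream_final` are hypothetical and are not BaseX's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical streaming state: input bytes that have not yet formed a
   complete 3-byte group. Zero-initialize before first use. */
typedef struct {
    unsigned char pending[3];
    size_t pending_len;
} b64_stream;

/* Encodes one complete 3-byte group as four characters. */
static void encode_group(const unsigned char in[3], char out[4]) {
    uint32_t triple = ((uint32_t) in[0] << 16) | ((uint32_t) in[1] << 8) | in[2];
    out[0] = B64_ALPHABET[(triple >> 18) & 0x3F];
    out[1] = B64_ALPHABET[(triple >> 12) & 0x3F];
    out[2] = B64_ALPHABET[(triple >> 6) & 0x3F];
    out[3] = B64_ALPHABET[triple & 0x3F];
}

/* Feeds one chunk of input. Only complete groups are written, so chunk
   boundaries never introduce padding. out needs room for
   4 * ((len + 2) / 3) characters; returns the number written (no NUL). */
size_t b64_stream_update(b64_stream *s, const unsigned char *data, size_t len, char *out) {
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        s->pending[s->pending_len++] = data[i];
        if (s->pending_len == 3) {
            encode_group(s->pending, out + j);
            j += 4;
            s->pending_len = 0;
        }
    }
    return j;
}

/* Flushes the last one or two bytes with '=' padding, NUL-terminates
   out (which needs 5 bytes), and returns the number of characters
   written before the NUL. */
size_t b64_stream_final(b64_stream *s, char *out) {
    size_t j = 0;
    if (s->pending_len > 0) {
        unsigned char group[3] = {0, 0, 0};
        memcpy(group, s->pending, s->pending_len);
        encode_group(group, out);
        out[3] = '=';
        if (s->pending_len == 1) out[2] = '=';
        j = 4;
        s->pending_len = 0;
    }
    out[j] = '\0';
    return j;
}
```

If "fo" and then "ob" are fed to the stream, the first call writes nothing, the second writes "Zm9v", and the final call writes "Yg==". The combined result is "Zm9vYg==", the same as encoding "foob" in a single call. Only the state struct persists between calls, so memory use stays constant regardless of file size.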