Development of Multilingual LeetSpeak Encoder/Decoder App

Post Statistics

  • This post has 2935 words.
  • Estimated read time is 13.98 minute(s).

Introduction

As you may know, I have been absent for several weeks due to illness. Since I am still recovering, I decided to start with something simple: a fun project for my readers. I settled on a multilingual LeetSpeak encoder/decoder. In this post we will build the app and play around with it.

About 1337 Encoding (LeetSpeak)

1337 Speak, also known as LeetSpeak or 1337, is a form of symbolic writing in which letters are replaced with a combination of numbers, symbols, and other characters. The term "leet" is derived from the word "elite," reflecting the language's origins in online communities and hacker culture.

Basic LeetSpeak Conversions

Here are some common LeetSpeak substitutions:

  • A -> 4
  • B -> 8
  • E -> 3
  • G -> 9
  • H -> |-|
  • I -> 1
  • L -> |
  • O -> 0
  • S -> 5
  • T -> 7
  • U -> |_|

Replacing the matching letters of a word with these substitutions produces a stylized, playful form of text. For example, LEET becomes |337.
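As a quick illustration of the table above, here is a minimal sketch that applies those substitutions with Python's built-in str.maketrans and str.translate. This is not how the app itself does it (encode_leet, shown later, loops over the characters), and the names LEET_TABLE, FULL_TABLE and to_leet are made up for this example.

```python
# The substitution table from the list above, keyed by uppercase letter.
LEET_TABLE = {
    'A': '4', 'B': '8', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1',
    'L': '|', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|',
}

# Add lowercase keys too, so 'e' and 'E' are both replaced while characters
# outside the table (and their case) are left untouched.
FULL_TABLE = {**LEET_TABLE, **{k.lower(): v for k, v in LEET_TABLE.items()}}

# str.maketrans accepts a dict mapping single characters to replacement
# strings of any length, so multi-character substitutions like '|-|' work.
TRANSLATION = str.maketrans(FULL_TABLE)


def to_leet(text: str) -> str:
    """Replace every letter that has an entry in the table."""
    return text.translate(TRANSLATION)


print(to_leet("leet"))         # |337
print(to_leet("Hello World"))  # |-|3||0 W0r|d
```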

Purpose and Usage

LeetSpeak is primarily used for fun, as a way to obfuscate text, or as a form of identity within certain online communities. It has gained popularity in gaming, programming, and internet subcultures.

The multilingual LeetSpeak encoder/decoder project presented here allows users to apply and reverse LeetSpeak transformations in various languages, adding an extra layer of creativity and customization to text manipulation.

Multilingual Character Replacement Module

multilang.py Overview

The multilang.py file is a module that provides character replacement dictionaries for various languages, along with the functions that use them to encode and decode text. For each language it also records which language dictionary module should be used for decoding. Let's break down the code step by step.

Language Data Dictionary

language_data = {
    'ar': (
        # Each letter appears once: a repeated key in a dict literal silently overrides the earlier entry.
        {'ا': '4', 'ب': '8', 'ت': 't', 'ث': '6', 'ج': 'j', 'ح': 'h', 'خ': 'x', 'د': 'd', 'ذ': '0', 'ر': 'r',
         'ز': '2', 'س': 's', 'ش': '$', 'ص': '|_', 'ض': '|_', 'ط': 't', 'ظ': 'z', 'ع': 'e', 'غ': '9', 'ف': 'f',
         'ق': 'q', 'ك': 'k', 'ل': 'l', 'م': 'm', 'ن': 'n', 'ه': 'h', 'و': 'w', 'ي': 'y', 'ؤ': '|_|',
         'ة': 'a', 'أ': 'a', 'ء': '`', 'ئ': '}', 'ـ': ''},
        None),
    'da': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'de': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'el': (
        {'Α': '4', 'Β': '8', 'Λ': '|', 'Ε': '3', 'Γ': '9', 'Η': '|-|', 'Ι': '1', 'Ο': '0', 'Σ': '5', 'Τ': '7', 'Υ': '|_|'},
        None),
    'en': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'es': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'fi': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'fr': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'fr_ca': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'he': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'hi': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'haw': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'id': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'iu': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'it': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'ja': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'ko': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'ms': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'nl': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'no': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'pl': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'pt_br': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
    'ro': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'ru': (
        {'А': '4', 'Б': '6', 'В': 'B', 'Г': 'r', 'Д': 'g', 'Е': 'e', 'Є': '3', 'Ж': 'ж', 'З': '3', 'И': 'u', 'І': 'i',
         'Ї': 'i', 'Й': 'й', 'К': 'k', 'Л': 'l', 'М': 'M', 'Н': 'H', 'О': '0', 'П': 'n', 'Р': 'p', 'С': 'c', 'Т': 'T',
         'У': 'y', 'Ф': 'ф', 'Х': 'x', 'Ц': 'u', 'Ч': '4', 'Ш': 'ш', 'Щ': 'щ', 'Ь': 'b', 'Ю': '10', 'Я': '9'}, 'nltk'),
    'sv': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'th': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'tr': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'uk': (
        {'А': '4', 'Б': '6', 'В': 'B', 'Г': 'r', 'Д': 'g', 'Е': 'e', 'Є': '3', 'Ж': 'ж', 'З': '3', 'И': 'u', 'І': 'i',
         'Ї': 'i', 'Й': 'й', 'К': 'k', 'Л': 'l', 'М': 'M', 'Н': 'H', 'О': '0', 'П': 'n', 'Р': 'p', 'С': 'c', 'Т': 'T',
         'У': 'y', 'Ф': 'ф', 'Х': 'x', 'Ц': 'u', 'Ч': '4', 'Ш': 'ш', 'Щ': 'щ', 'Ь': 'b', 'Ю': '10', 'Я': '9'}, 'nltk'),
    'vi': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        None),
    'zh': (
        {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|', 'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'},
        'nltk'),
}

The language_data dictionary maps each language identifier to a tuple of two items: a character replacement dictionary, and a language dictionary module indicator for decoding (None, 'nltk', or the name of another dictionary module). If None is specified in place of a module name, NLTK is used by default. Only Arabic, Greek, Russian and Ukrainian have script-specific tables; every other entry reuses the basic Latin table shown earlier, so characters in a non-Latin script such as Hebrew, Hindi, Japanese or Chinese pass through the encoder unchanged. At the moment the language dictionary module is not used, but it will be used in a future version to help decode encoded text.
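Since most entries repeat the same Latin table, one way to cut that repetition is to define the table once and build the entries in a loop. The sketch below covers only the Latin-table languages; LATIN_LEET, NLTK_LANGS and PLAIN_LANGS are hypothetical names, and the two language lists are copied from the entries above.

```python
# One shared copy of the basic Latin table used by most entries.
LATIN_LEET = {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
              'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}

# Latin-table languages grouped by their module indicator in language_data.
NLTK_LANGS = ['de', 'en', 'es', 'fr', 'fr_ca', 'it', 'ja', 'nl', 'pl', 'pt_br', 'zh']
PLAIN_LANGS = ['da', 'fi', 'he', 'hi', 'haw', 'id', 'iu', 'ko', 'ms', 'no',
               'ro', 'sv', 'th', 'tr', 'vi']

# Every entry references the same dict object, so treat it as read-only.
language_data = {lang: (LATIN_LEET, 'nltk') for lang in NLTK_LANGS}
language_data.update({lang: (LATIN_LEET, None) for lang in PLAIN_LANGS})

# The script-specific tables (ar, el, ru, uk) would still be written out literally.
print(len(language_data))  # 26
```

Because these entries share one dictionary object, changing it for one language would change it for all of them; giving each entry dict(LATIN_LEET) instead avoids that at the cost of a little memory.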

get_character_replacement_dict Function

def get_character_replacement_dict(lang_id: str) -> dict:
    """
    Get the character replacement dictionary for a given language.

    Args:
        lang_id (str): Language identifier.

    Returns:
        dict: Character replacement dictionary for the specified language.

    Raises:
        ValueError: If the language ID is not supported.
    """
    if lang_id in language_data:
        char_replacement_dict, _ = language_data[lang_id]
        return char_replacement_dict
    else:
        raise ValueError(f"Unsupported language ID: '{lang_id}'.")

This function retrieves the character replacement dictionary for a given language identifier. If the language is not supported, it raises a ValueError.
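A caller that would rather fall back to a default language than stop on an unsupported ID can catch that ValueError. The sketch below includes a trimmed two-entry stand-in for language_data so that it runs on its own; the helper replacement_dict_or_default is hypothetical and not part of multilang.py.

```python
# Trimmed stand-in for multilang.language_data (assumption: two short entries).
language_data = {
    'en': ({'A': '4', 'E': '3', 'T': '7'}, 'nltk'),
    'da': ({'A': '4', 'E': '3', 'T': '7'}, None),
}


def get_character_replacement_dict(lang_id: str) -> dict:
    """Same lookup logic as the function above."""
    if lang_id in language_data:
        char_replacement_dict, _ = language_data[lang_id]
        return char_replacement_dict
    raise ValueError(f"Unsupported language ID: '{lang_id}'.")


def replacement_dict_or_default(lang_id: str, default: str = 'en') -> dict:
    """Hypothetical helper: warn and use the default language's table instead of failing."""
    try:
        return get_character_replacement_dict(lang_id)
    except ValueError as err:
        print(f"{err} Falling back to '{default}'.")
        return get_character_replacement_dict(default)


table = replacement_dict_or_default('xx')
# prints: Unsupported language ID: 'xx'. Falling back to 'en'.
```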

create_reverse_dict Function

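def create_reverse_dict(original_dict: dict) -> dict:
    """Map each replacement string back to the letter it replaces."""
    # When several letters share one replacement, the letter that comes last
    # in original_dict overwrites the others.
    return {v: k for k, v in original_dict.items()}

This function creates a reverse dictionary, mapping values to keys, which the decoder uses to turn LeetSpeak back into letters. The reversal is lossy when letters share a replacement: in the Russian table, for example, both 'Є' and 'З' map to '3', so '3' can only ever decode to 'З', the later of the two entries.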
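To see which replacements are ambiguous before relying on the reverse dictionary, the letters can be grouped by their replacement. The sketch below is a hypothetical helper, find_collisions, run on a small excerpt of the Russian table from language_data.

```python
from collections import defaultdict


def find_collisions(table: dict) -> dict:
    """Return {replacement: [letters]} for every replacement shared by 2+ letters."""
    groups = defaultdict(list)
    for letter, leet in table.items():
        groups[leet].append(letter)
    return {leet: letters for leet, letters in groups.items() if len(letters) > 1}


# Excerpt of the Russian table in language_data.
ru_excerpt = {'А': '4', 'Є': '3', 'З': '3', 'И': 'u', 'Ц': 'u', 'Ч': '4'}

print(find_collisions(ru_excerpt))
# {'4': ['А', 'Ч'], '3': ['Є', 'З'], 'u': ['И', 'Ц']}

# A plain reversal keeps only the last letter for each shared replacement.
print({v: k for k, v in ru_excerpt.items()})
# {'4': 'Ч', '3': 'З', 'u': 'Ц'}
```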

get_language_dictionary_module Function

def get_language_dictionary_module(lang_id: str) -> str:
    """
    Get the module to import for language dictionary based on the language identifier.

    Args:
        lang_id (str): Language identifier.

    Returns:
        str: Module to import for language dictionary.

    Raises:
        ValueError: If the language ID is not supported.
    """
    if lang_id not in language_data:
        raise ValueError(f"Unsupported language ID: '{lang_id}'.")

    _, module_to_import = language_data[lang_id]

    if module_to_import is None:
        # No module specified for this language: default to NLTK.
        module_to_import = 'nltk'
    elif module_to_import == 'custom':
        print(f"Warning: The language '{lang_id}' requires a custom language dictionary for decoding.")
        print("Defaulting to language dictionary NLTK for decoding.")
        module_to_import = 'nltk'

    return module_to_import

This function returns the name of the module to import for the language dictionary. If an entry specifies None, it returns 'nltk' as the default; if it specifies 'custom', it prints a warning and also falls back to NLTK.
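The module name is not used yet. When it is, one plausible way to load it, sketched here as an assumption rather than the author's plan, is importlib.import_module, returning None when the module (for example NLTK) is not installed. load_dictionary_module is a hypothetical name.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_dictionary_module(module_name: str) -> Optional[ModuleType]:
    """Import a module by name, or return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None


nltk = load_dictionary_module('nltk')
if nltk is None:
    print("NLTK is not installed.")
```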

decode_leet Function

def decode_leet(cypher_text: str, lang_id: str = 'en') -> str:
    """
    Decode leet speak text into its original alphabetical form.

    Args:
        cypher_text (str): Leet speak text to decode.
        lang_id (str): ISO language indicator.

    Returns:
        str: Decoded text.
    """
    dictionary = get_character_replacement_dict(lang_id)
    reverse_lookup_dict = create_reverse_dict(dictionary)

    # Sort once, longest first, so that with the Latin table '|-|' (H) is tried
    # before '|' (L). Empty keys are skipped: an empty string matches at every
    # position and would stop the loop from advancing.
    leet_sequences = sorted((key for key in reverse_lookup_dict if key), key=len, reverse=True)
    decoded_text = ''

    i = 0
    while i < len(cypher_text):
        found_multi_char = False
        for multi_char in leet_sequences:
            if cypher_text[i:i + len(multi_char)].lower() == multi_char:
                decoded_text += reverse_lookup_dict[multi_char].lower()
                i += len(multi_char)  # jump past the whole matched sequence
                found_multi_char = True
                break

        if not found_multi_char:
            # No replacement starts here, so keep the character unchanged.
            decoded_text += cypher_text[i].lower()
            i += 1

    decoded_text = decoded_text.capitalize()
    return decoded_text

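This function decodes LeetSpeak text using the reverse of the specified language's character replacement dictionary. At each position it tries the longest replacement strings first, so that with the Latin table '|-|' is read as H rather than as L-L. Because the result is lowercased and then capitalized, the original capitalization is not preserved.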
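The longest-first ordering matters because some replacements are prefixes of others. The standalone sketch below (greedy_decode is a hypothetical name, and the reverse table is a hand-picked excerpt of the Latin one) decodes the same string with both orderings to show the difference.

```python
def greedy_decode(text: str, reverse: dict, longest_first: bool = True) -> str:
    """Scan left to right, replacing the first key that matches at each position."""
    # Skip empty keys: an empty string matches everywhere and would stall the scan.
    keys = sorted((k for k in reverse if k), key=len, reverse=longest_first)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(reverse[key])
                i += len(key)
                break
        else:  # no key matched: keep the character as-is
            out.append(text[i])
            i += 1
    return ''.join(out)


reverse = {'|-|': 'H', '|': 'L', '3': 'E', '0': 'O'}
print(greedy_decode('|-|3||0', reverse))                       # HELLO
print(greedy_decode('|-|3||0', reverse, longest_first=False))  # L-LELLO
```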

encode_leet Function

def encode_leet(input_text: str, lang_id: str='en'):
    """
    Encode the input text using Leet (1337) speak.

    Parameters:
    - input_text (str): The input text to be encoded.
    - lang_id (str): The language ID.

    Returns:
    - output_text (str): The Leet encoded text.
    """
    dictionary = get_character_replacement_dict(lang_id)
    output_text = ""

    for char in input_text:
        if char.upper() in dictionary:
            output_text += dictionary[char.upper()]
        else:
            output_text += char

    return output_text

This function encodes the input text into LeetSpeak using the specified language's character replacement dictionary.
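For comparison, the same per-character lookup can be written with dict.get and str.join, which avoids building the result through repeated string concatenation. This is an equivalent alternative sketch, not the code in multilang.py; encode_leet_join is a hypothetical name.

```python
# Copy of the basic Latin table from language_data.
LATIN_LEET = {'A': '4', 'B': '8', 'L': '|', 'E': '3', 'G': '9', 'H': '|-|',
              'I': '1', 'O': '0', 'S': '5', 'T': '7', 'U': '|_|'}


def encode_leet_join(input_text: str, dictionary: dict) -> str:
    """Replace each character whose uppercase form is in the table; keep the rest."""
    return ''.join(dictionary.get(char.upper(), char) for char in input_text)


print(encode_leet_join('Hello World', LATIN_LEET))  # |-|3||0 W0r|d
```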

Example Usage

The file also includes an example, inside an if __name__ == '__main__': guard, that demonstrates how to obtain the character replacement dictionaries and language dictionary module names.

Leet Speak Encoder/Decoder Script

leetspeak.py Overview

The leetspeak.py script is the main application for LeetSpeak encoding and decoding. It imports the encode_leet and decode_leet functions from the multilang module.

display_info Function

import argparse

from multilang import decode_leet, encode_leet


def display_info(text):
    """
    Display information about the text, including length in words and characters.

    Args:
        text (str): Text to analyze.
    """
    word_count = len(text.split())
    char_count = len(text)
    print(f'Text Length: {char_count} characters, {word_count} words.')

This function displays information about the text, including word count and character count.
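Because display_info uses len(), it counts Unicode code points, which is not always what a reader perceives as characters in multilingual text. An accented letter such as é can be stored as one precomposed code point or as e followed by a combining accent. The sketch below (char_counts is a hypothetical helper) uses the standard unicodedata module to show the difference.

```python
import unicodedata


def char_counts(text: str) -> dict:
    """Code-point counts of the same text in composed (NFC) and decomposed (NFD) form."""
    return {
        'NFC': len(unicodedata.normalize('NFC', text)),
        'NFD': len(unicodedata.normalize('NFD', text)),
    }


print(char_counts('café'))  # {'NFC': 4, 'NFD': 5}
```

If consistent counts matter, display_info could normalize the text to NFC before calling len().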

Main Function

def main():
    """
    Main function to handle command-line arguments and execute leet speak encoding/decoding.
    """
    parser = argparse.ArgumentParser(description='Leet Speak Encoder/Decoder')
    parser.add_argument('-e', '--encode', action='store_true', help='Encode clear text into leet speak')
    parser.add_argument('-d', '--decode', action='store_true', help='Decode leet speak text into clear text')
    parser.add_argument('-lang', '--language', default='en', help='ISO language indicator (default: en)')
    parser.add_argument('-m', '--message', help='Text string to encode or decode')
    parser.add_argument('-i', '--input', help='File name to read encoded/decoded text')
    parser.add_argument('-o', '--output', help='File name to save encoded/decoded text')
    parser.add_argument('-v', '--verbose', action='store_true', help='Display additional information about the text')
    args = parser.parse_args()

    language_id = args.language.lower()

    # Take the text from the command line (-m) or read it from a file (-i).
    if args.message:
        input_text = args.message
    elif args.input:
        with open(args.input, 'r', encoding='utf-8') as file:
            input_text = file.read()
    else:
        # parser.error() prints the usage message and exits with status 2.
        parser.error("Please provide either a text string (-m) or an input file (-i).")

    if args.encode:
        output_text = encode_leet(input_text, language_id)

        if args.verbose:
            display_info(output_text)

        # Save to a file if -o was given; otherwise print the result once.
        if args.output:
            with open(args.output, 'w', encoding='utf-8') as file:
                file.write(output_text)
        else:
            print(f"Encoded Text: {output_text}")

    elif args.decode:
        decoded_text = decode_leet(input_text, language_id)

        if args.verbose:
            display_info(decoded_text)

        if args.output:
            with open(args.output, 'w', encoding='utf-8') as file:
                file.write(decoded_text)
        else:
            print(f"Decoded Text: {decoded_text}")

    else:
        parser.error("Please choose an operation: -e (encode) or -d (decode).")


if __name__ == '__main__':
    main()

This script uses the argparse module to handle command-line arguments. It allows encoding or decoding LeetSpeak based on user input. The language, input text, and output options are customizable.
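One possible refinement, sketched here rather than taken from the repository, is to let argparse enforce the valid combinations itself: exactly one of -e/-d and exactly one of -m/-i, using mutually exclusive groups with required=True. build_parser is a hypothetical name; the option names match the script above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='Leet Speak Encoder/Decoder')

    # Exactly one operation must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument('-e', '--encode', action='store_true', help='Encode clear text into leet speak')
    mode.add_argument('-d', '--decode', action='store_true', help='Decode leet speak text into clear text')

    # Exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument('-m', '--message', help='Text string to encode or decode')
    source.add_argument('-i', '--input', help='File name to read encoded/decoded text')

    parser.add_argument('-lang', '--language', default='en', help='ISO language indicator (default: en)')
    parser.add_argument('-o', '--output', help='File name to save encoded/decoded text')
    parser.add_argument('-v', '--verbose', action='store_true', help='Display additional information about the text')
    return parser


args = build_parser().parse_args(['-e', '-m', 'Hello World'])
print(args.encode, args.message, args.language)  # True Hello World en
```

With these groups in place, argparse rejects a missing or conflicting option with a usage message before any encoding or decoding runs.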

Conclusion

The multilingual LeetSpeak encoder/decoder is a small command-line tool for encoding and decoding text using LeetSpeak conventions. Because each language is a single entry in language_data, adding support for a new language means adding one replacement dictionary, and the script exposes all of its options through command-line arguments.

Building it involved creating a flexible character replacement module, supporting multiple languages, and designing a straightforward script for encoding and decoding LeetSpeak. The result is an extensible starting point for language enthusiasts and anyone interested in text manipulation.

Resources

Repository Link

The complete source code for the multilingual LeetSpeak encoder/decoder project can be found on the GitHub repository:

GitHub Repository

Feel free to explore the code, contribute, or open issues if you have suggestions or encounter any problems.

