Logo

0x3d.Site

is designed for aggregating information.
Welcome
check repository here

Whatlang - rust library for natural language detection

Whatlang

Natural language detection for Rust with focus on simplicity and performance.

Try online demo.

Build Status License Documentation

Stand With Ukraine

Content

Features

  • Supports 69 languages
  • 100% written in Rust
  • Lightweight, fast and simple
  • Recognizes not only a language, but also a script (Latin, Cyrillic, etc)
  • Provides reliability information

Get started

Example:

use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}

For more details (e.g. how to blacklist some languages) please check the documentation.

Who uses Whatlang?

Whatlang is used within the following big projects as direct or indirect dependency for language recognition. You're gonna be in a great company using Whatlang:

  • Sonic - fast, lightweight and schema-less search backend in Rust.
  • Meilisearch - an open-source, easy-to-use, blazingly fast, and hyper-relevant search engine built in Rust.

Feature toggles

FeatureDescription
enum-mapLang and Script implement Enum trait from enum-map
arbitrarySupport Arbitrary
serdeImplements Serialize and Deserialize for Lang and Script
devEnables whatlang::dev module which provides some internal API.
It exists for profiling purposes and normal users are discouraged to to rely on this API.

How does it work?

How does the language recognition work?

The algorithm is based on the trigram language models, which is a particular case of n-grams. To understand the idea, please check the original whitepaper Cavnar and Trenkle '94: N-Gram-Based Text Categorization'.

How is is_reliable calculated?

It is based on the following factors:

  • How many unique trigrams are in the given text
  • How big is the difference between the first and the second(not returned) detected languages? This metric is called rate in the code base.

Therefore, it can be presented as 2d space with threshold functions, that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola and it looks like the following one:

Language recognition whatlang rust

For more details, please check a blog article Introduction to Rust Whatlang Library and Natural Language Identification Algorithms.

Make tasks

  • make bench - run performance benchmarks
  • make doc - generate and open doc
  • make test - run tests
  • make watch - watch changes and run tests

Comparison with alternatives

WhatlangCLD2CLD3
Implementation languageRustC++C++
Languages6883107
Algorithmtrigramsquadgramsneural network
Supported EncodingUTF-8UTF-8?
HTML supportnoyes?

Ports and clones

Donations

You can support the project by donating NEAR tokens.

Our NEAR wallet address is whatlang.near

Derivation

Whatlang is a derivative work from Franc (JavaScript, MIT) by Titus Wormer.

License

MIT © Sergey Potapov

Contributors

Rust
Rust
Rust is a modern programming language focused on safety, speed, and concurrency. It prevents common bugs like null pointer dereferencing and data races, making it ideal for system programming and high-performance applications.
Create business apps like assembling blocks | ILLA Cloud
Create business apps like assembling blocks | ILLA Cloud
GitHub - 0x59616e/SteinsOS: An operating system written in Rust
GitHub - 0x59616e/SteinsOS: An operating system written in Rust
GitHub - j0ru/kickoff: Minimalistic program launcher
GitHub - j0ru/kickoff: Minimalistic program launcher
Workflow runs · rust-lang/rustup
Workflow runs · rust-lang/rustup
GitHub - paradigmxyz/artemis: A simple, modular, and fast framework for writing MEV bots in Rust.
GitHub - paradigmxyz/artemis: A simple, modular, and fast framework for writing MEV bots in Rust.
Workflow runs · sigp/lighthouse
Workflow runs · sigp/lighthouse
GitHub - flox/flox: Developer environments you can take with you
GitHub - flox/flox: Developer environments you can take with you
Unified Architecture - OPC Foundation
Unified Architecture - OPC Foundation
MaidSafe
MaidSafe
ttyperacer / terminal-typeracer · GitLab
ttyperacer / terminal-typeracer · GitLab
GitHub - cloudhead/rx: 👾 Modern and minimalist pixel editor
GitHub - cloudhead/rx: 👾 Modern and minimalist pixel editor
GitHub - rust-ethereum/ethabi: Encode and decode smart contract invocations
GitHub - rust-ethereum/ethabi: Encode and decode smart contract invocations
GitHub - watchexec/watchexec: Executes commands in response to file modifications
GitHub - watchexec/watchexec: Executes commands in response to file modifications
GitHub - osa1/tiny: A terminal IRC client
GitHub - osa1/tiny: A terminal IRC client
GitHub - sergree/whatbpm: 💓 Today's Trending Values for EDM Production
GitHub - sergree/whatbpm: 💓 Today's Trending Values for EDM Production
GitHub - rsaarelm/magog: A roguelike game in Rust
GitHub - rsaarelm/magog: A roguelike game in Rust
GitHub - autonomys/subspace: Subspace Network reference implementation
GitHub - autonomys/subspace: Subspace Network reference implementation
GitHub - nicohman/eidolon: Provides a single TUI-based registry for drm-free, wine and steam games on linux, accessed through a rofi launch menu.
GitHub - nicohman/eidolon: Provides a single TUI-based registry for drm-free, wine and steam games on linux, accessed through a rofi launch menu.
GitHub - shshemi/tabiew: A lightweight, terminal-based application to view and query delimiter separated value formatted documents, such as CSV or TSV files.
GitHub - shshemi/tabiew: A lightweight, terminal-based application to view and query delimiter separated value formatted documents, such as CSV or TSV files.
GitHub - joamag/boytacean: A GB emulator that is written in Rust 🦀!
GitHub - joamag/boytacean: A GB emulator that is written in Rust 🦀!
GitHub - chaosprint/glicol: Graph-oriented live coding language and music/audio DSP library written in Rust
GitHub - chaosprint/glicol: Graph-oriented live coding language and music/audio DSP library written in Rust
GitHub - wasmerio/winterjs: Winter is coming... ❄️
GitHub - wasmerio/winterjs: Winter is coming... ❄️
GitHub - withoutboats/notty: A new kind of terminal
GitHub - withoutboats/notty: A new kind of terminal
GitHub - cfal/shoes: A multi-protocol proxy server written in Rust (HTTP, HTTPS, SOCKS5, Vmess, Vless, Shadowsocks, Trojan, Snell)
GitHub - cfal/shoes: A multi-protocol proxy server written in Rust (HTTP, HTTPS, SOCKS5, Vmess, Vless, Shadowsocks, Trojan, Snell)
GitHub - eigerco/beerus: A stateless trustless Starknet light client in Rust 🦀
GitHub - eigerco/beerus: A stateless trustless Starknet light client in Rust 🦀
GitHub - quilt/etk: evm toolkit
GitHub - quilt/etk: evm toolkit
GitHub - pop-os/system76-power: System76 Power Management
GitHub - pop-os/system76-power: System76 Power Management
GitHub - Limeth/ethaddrgen: Custom Ethereum vanity address generator made in Rust
GitHub - Limeth/ethaddrgen: Custom Ethereum vanity address generator made in Rust
The Tor Project / Core / Arti · GitLab
The Tor Project / Core / Arti · GitLab
Production
Production
Rust
More on Rust

Programming Tips & Tricks

Code smarter, not harder—insider tips and tricks for developers.

Error Solutions

Turn frustration into progress—fix errors faster than ever.

Shortcuts

The art of speed—shortcuts to supercharge your workflow.
  1. Collections 😎
  2. Frequently Asked Question's 🤯

Tools

available to use.

Made with ❤️

to provide resources in various ares.