AzNEOBERT — Azerbaijani BERT from Scratch on 12B Tokens · 1 January 2026 · Python PyTorch DeepSpeed HuggingFace Accelerate Flash Attention SLURM Azerbaijani Pretraining NLP
Azerbaijani Tokenizer — Three Algorithms, 64k Vocab, 1.727 Fertility · 1 December 2025 · Python HuggingFace Tokenizers SentencePiece MongoDB NLP Azerbaijani Pretraining
Hybrid Retrieval: Combining BM25 and Dense Vectors for Production Search · 15 January 2025 · 3 mins · NLP Information Retrieval Retrieval SBERT BM25 RAG