Projects

Selected work across NLP infrastructure, search, and Azerbaijani-language modeling. The current focus is the AzBERT pipeline (a 64k tokenizer plus a NeoBERT-style encoder trained from scratch for Azerbaijani), alongside the production legal-search system at NAIC.

2026

AzNEOBERT — Azerbaijani BERT from Scratch on 12B Tokens

1 January 2026

Python PyTorch DeepSpeed HuggingFace Accelerate Flash Attention SLURM Azerbaijani Pretraining NLP

2025

Azerbaijani Tokenizer — Three Algorithms, 64k Vocab, 1.727 Fertility

1 December 2025

Python HuggingFace Tokenizers SentencePiece MongoDB NLP Azerbaijani Pretraining

e-qanun.ai - AI-Powered Legal Search for Azerbaijani Law

25 September 2025

Python PyTorch ONNX Elasticsearch Vespa SBERT DistilBERT FastAPI FAISS