Re-ranking LLMs in Production: Benchmarking Latency vs. Precision · 5 March 2025 · 3 mins · Re-Ranking Cross-Encoder LLM Benchmarking RAG
Shrinking Transformers for Production: ONNX Export + Dynamic Quantization · 10 February 2025 · 3 mins · ONNX Quantization Model Optimization DistilBERT Inference
Hybrid Retrieval: Combining BM25 and Dense Vectors for Production Search · 15 January 2025 · 2 mins · NLP Information Retrieval SBERT BM25 RAG