<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Retrieval on Rauf Ibishov</title><link>http://raufibishov.com/tags/retrieval/</link><description>Recent content in Retrieval on Rauf Ibishov</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Rauf Ibishov</copyright><lastBuildDate>Wed, 05 Mar 2025 10:00:00 +0000</lastBuildDate><atom:link href="http://raufibishov.com/tags/retrieval/index.xml" rel="self" type="application/rss+xml"/><item><title>Re-ranking LLMs in Production: Benchmarking Latency vs. Precision</title><link>http://raufibishov.com/posts/reranking-benchmarks/</link><pubDate>Wed, 05 Mar 2025 10:00:00 +0000</pubDate><guid>http://raufibishov.com/posts/reranking-benchmarks/</guid><description>&lt;p&gt;&lt;em&gt;Key question: does the re-ranking precision gain justify the latency cost at each pipeline stage? On a 100ms end-to-end SLA the answer is yes — but only with a quantized in-domain re-ranker, not a stock cross-encoder.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id="context-why-re-rank-at-all"&gt;Context: Why Re-rank at All?&lt;/h2&gt;
&lt;p&gt;First-stage retrieval (BM25 or dense vectors) is built for speed: you need top-1000 candidates
fast. Precision at the very top (positions 1–5) is a secondary concern.&lt;/p&gt;</description></item><item><title>Hybrid Retrieval: Combining BM25 and Dense Vectors for Production Search</title><link>http://raufibishov.com/posts/hybrid-retrieval/</link><pubDate>Wed, 15 Jan 2025 10:00:00 +0000</pubDate><guid>http://raufibishov.com/posts/hybrid-retrieval/</guid><description>&lt;p&gt;&lt;em&gt;Key insight: pure BM25 misses semantic matches; pure dense vectors miss exact keywords. Hybrid wins on both MRR@10 and NDCG@5 — and the score-fusion choice (RRF over linear interpolation) matters as much as the model choice.&lt;/em&gt;&lt;/p&gt;
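&lt;p&gt;As a minimal sketch of that fusion step, Reciprocal Rank Fusion can be written in a few lines of Python. All names and document ids below are illustrative assumptions, not the post&amp;rsquo;s actual code:&lt;/p&gt;

```python
# Reciprocal Rank Fusion (RRF) sketch. Each input is a ranked list of
# doc ids, best first (e.g. one from BM25, one from a dense retriever).
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1 / (k + rank); docs ranked highly
            # by several retrievers accumulate the largest fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Return doc ids sorted by fused score, descending.
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["d1", "d2", "d3"]
dense_top = ["d2", "d4", "d1"]
fused = rrf_fuse([bm25_top, dense_top])  # "d2" wins: ranked in both lists
```

&lt;p&gt;The &lt;code&gt;k=60&lt;/code&gt; constant is the common RRF default; because RRF uses only ranks, not raw scores, it needs no per-retriever score normalization, which is exactly why it is often preferred over linear interpolation.&lt;/p&gt;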

&lt;h2 id="why-pure-bm25-or-pure-dense-isnt-enough"&gt;Why Pure BM25 (or Pure Dense) Isn&amp;rsquo;t Enough&lt;/h2&gt;
&lt;p&gt;BM25 is fast, interpretable, and great at exact keyword matching. Dense retrieval (SBERT, DPR) captures
semantic similarity but misses precise term overlap. In practice, neither alone covers the full
distribution of user queries — especially in a domain-specific corpus.&lt;/p&gt;</description></item></channel></rss>