Toolchain for exporting DistilBERT and SBERT to ONNX, then applying dynamic quantization and mixed-precision techniques.
Achieved a 39% model-size reduction with measurable inference speedup and minimal accuracy degradation on downstream NLP benchmarks.