Toolchain for exporting DistilBERT and SBERT to ONNX, then applying dynamic quantization and mixed-precision techniques.
Achieved a 39% model-size reduction with measurable inference speedup and minimal accuracy degradation on downstream NLP benchmarks.