Train & Fine-Tune Models

9x Model Compression for Edge

INT8 quantization-aware training compresses MobileNetV2 from 23.5MB to 2.6MB with only a 3.8% accuracy drop.

The problem

Models that perform well in the lab are too large to run on edge and mobile hardware without aggressive compression.

The model that works great on your laptop won't fit on the target device at all.
Naive post-training quantization tanks accuracy and nobody can say why.
Every "make it smaller" pass risks breaking the exact behavior users rely on.

What NEO built

NEO ran INT8 quantization-aware training on MobileNetV2 using TensorFlow Lite, with a calibration dataset and augmentation to preserve accuracy through compression.

TensorFlow Lite · INT8 quantization · MobileNetV2
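Quantization-aware training generally works by simulating INT8 rounding during the forward pass ("fake quantization"), so the network learns weights that tolerate it. The sketch below illustrates that quantize-then-dequantize step in plain Python. It is not NEO's code and not the TensorFlow Lite API: the function names, the [-1.0, 1.0] range, and the sample weights are all assumptions chosen for illustration.

```python
# Illustrative only: affine INT8 fake quantization in plain Python.
# All names here are hypothetical; real QAT toolkits do this inside the graph.

def quant_params(lo, hi, qmin=-128, qmax=127):
    """Return (scale, zero_point) mapping the real range [lo, hi] onto INT8."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, 0
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to an INT8 code, clamping values outside the range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an INT8 code back to the real value it represents."""
    return (q - zero_point) * scale


def fake_quantize(values, lo, hi):
    """Quantize then dequantize: the rounding error a QAT forward pass sees."""
    scale, zp = quant_params(lo, hi)
    return [dequantize(quantize(v, scale, zp), scale, zp) for v in values]


if __name__ == "__main__":
    weights = [-0.9, -0.25, 0.0, 0.31, 1.1]  # assumed sample values
    for w, fq in zip(weights, fake_quantize(weights, -1.0, 1.0)):
        print(f"{w:+.4f} -> {fq:+.4f} (error {abs(w - fq):.4f})")
```

In-range values come back within half a quantization step of the original, while 1.1 falls outside the assumed range and is clamped. Training with this error in the loop is what lets a quantization-aware model recover accuracy that naive post-training quantization can lose.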

The result

9x model compression

Compressed the model 9x (23.5MB → 2.6MB) with only a 3.8% accuracy drop, ready for mobile and IoT deployment.
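As a quick sanity check on the headline number, the ratio and the percentage size reduction follow directly from the two file sizes quoted above. The helper name below is hypothetical; only the 23.5MB and 2.6MB figures come from the result.

```python
def compression_summary(original_mb, compressed_mb):
    """Return (compression ratio, percentage size reduction)."""
    ratio = original_mb / compressed_mb
    reduction_pct = (1 - compressed_mb / original_mb) * 100
    return ratio, reduction_pct


ratio, reduction = compression_summary(23.5, 2.6)
print(f"{ratio:.1f}x smaller, {reduction:.1f}% size reduction")
# prints "9.0x smaller, 88.9% size reduction"
```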

From the blog · 8 min

9x Model Compression with Quantization-Aware Training for Edge Deployment

How we compressed MobileNetV2 from 23.5MB to 2.6MB using INT8 quantization-aware training, achieving a 9x size reduction with only a 3.8% accuracy drop, ready for deployment on Android, iOS, and Raspberry Pi.

Try this in your workspace

Paste this into NEO chat to kick off the same workflow on your own data.

NEO chat

Run quantization-aware training on my model to compress it for edge deployment, and tell me how much accuracy I'm actually trading away.

Paste it in · review the plan · get the diff

More Train & Fine-Tune Models use cases

Medical Reasoning via LoRA Fine-Tuning

Carbon-Aware Training Scheduler