Five Small Models Support Tool Calling

Why this is here: Google’s Gemma-4-E2B-it model can run in under 1.5 GB of memory while still producing valuable outputs, thanks to its Per-Layer Embeddings architecture.

Matthew Mayo of KDnuggets highlights five small language models that support agentic tool calling: SmolLM3-3B, Qwen3-4B-Instruct-2507, Phi-3-mini-4k-instruct, Gemma-4-E2B-it, and Mistral-7B-Instruct-v0.3. All five offer structured tool calling without requiring extensive hardware.
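For readers unfamiliar with the term, structured tool calling generally means the model emits a machine-readable request (commonly JSON) naming a tool and its arguments, which the host application then validates and runs. The sketch below illustrates that loop in plain Python under stated assumptions: the `get_weather` tool, the `TOOLS` registry, the schema layout, and the `dispatch_tool_call` helper are all hypothetical, and the model output is simulated rather than produced by any of the five models. Each model wraps tool calls in its own chat-template markers, which this sketch does not reproduce.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical stand-in tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Hypothetical registry pairing each tool with a JSON-Schema-style description.
# The description is what would be shown to the model; the layout is an assumption.
TOOLS = {
    "get_weather": {
        "function": get_weather,
        "schema": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
}


def dispatch_tool_call(raw_output: str) -> str:
    """Parse a JSON tool call, check its required arguments, and run the tool."""
    call = json.loads(raw_output)
    name = call["name"]
    arguments = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    required = TOOLS[name]["schema"]["parameters"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"Missing arguments: {missing}")
    return TOOLS[name]["function"](**arguments)


# Simulated model output (an assumption, not real output from any listed model).
simulated_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch_tool_call(simulated_output)
print(result)
```

In a real agent loop, the tool's return value would be appended to the conversation so the model can use it in its next response.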

SmolLM3, developed by Hugging Face, supports dual-mode reasoning and multiple languages. Alibaba’s Qwen3-4B-Instruct-2507 prioritizes low latency for applications like chatbots.

Microsoft’s Phi-3-mini-4k-instruct runs on-device, rivals larger models on some benchmarks, and is released under the permissive MIT license. Google’s Gemma-4-E2B-it uses a hybrid attention mechanism for efficient processing on edge devices. Mistral-7B-Instruct-v0.3, the largest of the five at 7B parameters, offers strong general instruction following.

The author notes that the list reflects models he has used directly. Other capable models exist beyond this selection, and the field continues to evolve rapidly.