ServiceNow AI Research Lab has introduced Apriel-1.5-15B-Thinker, a 15-billion-parameter open-weights multimodal reasoning model that sets a new bar for cost-efficient reasoning performance. Trained with a data-centric mid-training recipe, the model achieves an Artificial Analysis Intelligence Index (AAI) score of 52, matching DeepSeek-R1-0528 while being a small fraction of its size. The checkpoint is available under an MIT license on Hugging Face.
Frontier-Level Performance at a Fraction of the Cost
Apriel-1.5-15B-Thinker’s AAI score of 52 places it in frontier territory. The AAI metric aggregates results from 10 third-party evaluations, including MMLU-Pro, GPQA Diamond, and Humanity’s Last Exam, providing a broad measure of a model’s capabilities. The score is particularly notable given the model’s modest size and the fact that no reinforcement learning was used in its training.
The model’s strength is not confined to a single domain: it performs well across math, coding, science, and tool use. For instance, it scores roughly 87.5–88% on the American Invitational Mathematics Examination 2025 (AIME 2025) and is competitive on GPQA Diamond, IFBench, and LiveCodeBench.
Single-GPU Deployability: A Game Changer for Enterprises
One of the most significant advantages of Apriel-1.5-15B-Thinker is its single-GPU deployability. Unlike many large reasoning models that require multi-GPU serving, its 15B parameters occupy roughly 30 GB in bfloat16, fitting comfortably on a single modern accelerator. This targets on-premises and air-gapped deployments, making it an attractive option for enterprises with fixed memory and latency budgets.
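For teams that want to sanity-check this themselves, the sketch below loads the checkpoint on one device with Hugging Face transformers. The repo id and the processor/model classes are assumptions drawn from how Pixtral-derived checkpoints are typically served; confirm both against the model card.

```python
# Minimal single-GPU loading sketch. The repo id and model classes are
# assumptions; check the Hugging Face model card for the exact identifiers.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "ServiceNow-AI/Apriel-1.5-15b-Thinker"  # assumed repo id

# 15B parameters in bfloat16 is ~30 GB of weights, leaving headroom for
# the KV cache on a single 80 GB A100/H100.
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # pin the whole model to one GPU
)
```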
Open Weights and Reproducible Pipeline: Transparency and Trust
ServiceNow AI has made the model’s weights, training recipe, and evaluation protocol publicly available. This transparency allows independent verification of the reported results and lowers the barrier for further research, collaboration, and fine-tuning by the community.
Training Mechanism: Base Model Upscaling, Continual Pretraining, and Supervised Fine-Tuning
Apriel-1.5-15B-Thinker’s training process begins with Mistral’s Pixtral-12B-Base-2409 multimodal decoder-vision stack. The research team then applies depth upscaling, increasing decoder layers from 40 to 48, and realigns the vision encoder with the enlarged decoder. This approach avoids the need for pretraining from scratch while preserving the model’s single-GPU deployability.
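A minimal sketch of what depth upscaling can look like in PyTorch, assuming the new layers are created by duplicating a contiguous slice of existing decoder blocks (the report does not specify which layers are copied):

```python
import copy
import torch.nn as nn

def depth_upscale(layers: nn.ModuleList, target: int = 48) -> nn.ModuleList:
    """Deepen a decoder by duplicating existing blocks.

    Pixtral's decoder has 40 layers; duplicating 8 brings it to 48. Which
    blocks get copied is an assumption here (a contiguous middle slice);
    continual pretraining then realigns the duplicated weights.
    """
    n = len(layers)                     # 40
    extra = target - n                  # 8 new blocks
    start = (n - extra) // 2            # assumed: duplicate layers 16..23
    dup = [copy.deepcopy(layers[i]) for i in range(start, start + extra)]
    grown = list(layers[: start + extra]) + dup + list(layers[start + extra :])
    return nn.ModuleList(grown)
```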
The model undergoes two stages of continual pretraining (CPT). The first stage involves mixed text and image data to build foundational reasoning and document/diagram understanding. The second stage focuses on targeted synthetic visual tasks to sharpen spatial and compositional reasoning. Sequence lengths extend to 32k and 16k tokens, respectively, with selective loss placement on response tokens for instruction-formatted samples.
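Selective loss placement is the standard label-masking technique: prompt tokens receive an ignore index so gradients flow only from response tokens. A minimal sketch, with the sample-delimiting logic assumed:

```python
import torch

IGNORE_INDEX = -100  # cross_entropy skips positions with this label

def response_only_labels(input_ids: torch.Tensor, response_start: int) -> torch.Tensor:
    """Labels for one instruction-formatted sample: loss only on the response.

    `response_start` is the index of the first response token; how samples
    are delimited in Apriel's pipeline is an assumption.
    """
    labels = input_ids.clone()
    labels[:response_start] = IGNORE_INDEX  # prompt tokens contribute no loss
    return labels

# Then, as in standard causal-LM training with shifted targets:
# loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
#                        labels[:, 1:].reshape(-1),
#                        ignore_index=IGNORE_INDEX)
```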
Following CPT, the model undergoes supervised fine-tuning (SFT) using high-quality, reasoning-trace instruction data. This process involves two additional SFT runs, with the final checkpoint being a weight merge of these runs. Notably, the training process does not involve reinforcement learning (RL) or reinforcement learning from AI feedback (RLAIF).
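A weight merge of this kind is commonly implemented as a parameter-wise average of the checkpoints; the report does not state which merging scheme was used, so the uniform averaging below is an assumption:

```python
import torch

def merge_checkpoints(paths: list[str], weights: list[float] | None = None) -> dict:
    """Parameter-wise average of several SFT checkpoints.

    Uniform averaging is assumed; the report only says the final checkpoint
    is a weight merge of the SFT runs, not which scheme was used.
    """
    weights = weights or [1.0 / len(paths)] * len(paths)
    states = [torch.load(p, map_location="cpu") for p in paths]
    return {
        key: sum(w * s[key].float() for w, s in zip(weights, states))
        for key in states[0]
    }
```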
Data Note
Approximately 25% of the data used in the depth-upscaling text mix comes from NVIDIA’s Nemotron collection.
Apriel-1.5-15B-Thinker: A Practical Baseline for Enterprises
Apriel-1.5-15B-Thinker’s performance and cost-efficiency make it a practical baseline for enterprises evaluating open-weights reasoners, and a useful point of comparison before committing to larger closed systems. Its open weights, reproducible recipe, and single-GPU deployment profile, together with the detailed model card and evaluation protocol on Hugging Face, make it straightforward to assess against specific enterprise needs.
In conclusion, ServiceNow AI’s Apriel-1.5-15B-Thinker is more than a new multimodal reasoning model; it demonstrates how a careful mid-training recipe can deliver frontier-level performance at a fraction of the usual compute cost. Its open weights, single-GPU deployability, and strong results across math, coding, science, and tool use make it a compelling choice for enterprises and researchers alike.