Research · For Immediate Release

aime-Reasoner-4B™: a distilled 4B model that went seven for seven and out-thought Gemma 4B on the Moses Illusion

aime™ today released aime-Reasoner-4B, a compact reasoning model built through multi-teacher distillation that caught the Moses Illusion, held a correct answer under social pressure and ran a full reasoning benchmark on CPU-only hardware with no internet connection.

aime today released aime-Reasoner-4B, a 4-billion-parameter reasoning model purpose-built for offline educational use. In a benchmark conducted in April 2026, aime-Reasoner-4B answered all seven reasoning questions correctly. On the only question put to both models directly, the classic Moses Illusion, it caught the false premise that the equivalently sized Gemma 4B missed entirely.

aime-Reasoner-4B is not a fine-tune of an existing model. It was shaped from the ground up through aime's multi-teacher reasoning distillation programme, blending concise, formula-driven reasoning traces with exploratory, self-correcting chains. The distillation compressed these styles into a single reasoning posture: check the premise before computing the answer, hold the correct conclusion under pressure, and separate logical validity from factual truth. The specific teacher families and distillation methodology are aime's proprietary IP.
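aime does not disclose its teachers or its method, so what follows is only a generic illustration of the multi-teacher idea, not a description of aime's programme. It assumes the textbook token-level formulation: each teacher's temperature-softened distribution is blended with fixed weights, and the student is scored by its KL divergence from that blend. The teacher logits, weights and temperature are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def blend_teachers(teacher_logits: Sequence[Sequence[float]],
                   weights: Sequence[float],
                   temperature: float) -> List[float]:
    """Weighted mixture of each teacher's softened distribution; weights must sum to 1."""
    if len(weights) != len(teacher_logits) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per teacher, summing to 1")
    dists = [softmax(t, temperature) for t in teacher_logits]
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(len(dists[0]))]

def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(blended teachers || student) at one token position, scaled by T^2 (the usual convention)."""
    target = blend_teachers(teacher_logits, weights, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)
    return kl * temperature ** 2

# Hypothetical next-token logits over a 3-token vocabulary.
concise_teacher = [3.0, 1.0, 0.0]      # confident, formula-driven style
exploratory_teacher = [1.5, 1.5, 0.5]  # spreads probability across continuations
student = [2.0, 1.0, 0.0]

loss = distillation_loss(student, [concise_teacher, exploratory_teacher], weights=[0.5, 0.5])
print(f"distillation loss: {loss:.4f}")
```

Changing the weights shifts the student toward one teacher's style or the other; a single blended target is one simple way to compress several styles into one student.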

The Moses Illusion

Asked “How many of each animal did Moses take on the ark?”, Gemma 4B answered confidently — describing pairs of animals, mentioning the biblical flood, and never noticing that Moses was not there. aime-Reasoner-4B stopped at the premise, explained that Noah built the ark rather than Moses, and went on to distinguish Noah's ark (animals, Genesis) from Moses's chest (the golden calf, Exodus), correctly noting the clean-versus-unclean animal count of seven and two. Unprompted. Cold. First try.

Seven for seven

  • Language trap (“I have 10 fish. All but 3 die.”) — answered 3.
  • Multi-step reasoning (snail climbing a 10 m wall) — answered 8 days.
  • Position logic (overtaking 2nd place in a race) — answered 2nd place.
  • Premise vigilance (Moses Illusion) — caught the false premise.
  • Pressure test (17 × 13 = 221, challenged with “I think it's 200”) — held firm with five verification methods.
  • Exponential reasoning (lily pad doubling over 48 days) — answered 47 days.
  • Logic vs fact (“All animals are purple” syllogism) — addressed both logical validity and factual truth, citing the Barbara syllogism.
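Four of the seven items have a single answer that can be recomputed mechanically, as in the sketch below. The release does not state the snail's rates, so the code assumes the common version of the riddle (climb 3 m by day, slip 2 m by night). It also assumes the lily pad patch doubles daily and covers the pond on day 48, with the question asking when it was half covered. All function names are illustrative.

```python
from itertools import product

def snail_days(height: float, climb: float, slip: float) -> int:
    """Days for a snail that climbs by day and slips back at night to reach the top."""
    if climb <= slip and climb < height:
        raise ValueError("the snail never reaches the top")
    position, day = 0.0, 0
    while True:
        day += 1
        position += climb
        if position >= height:  # it escapes before the night-time slip
            return day
        position -= slip

def place_after_overtaking(overtaken_place: int, field_size: int = 5) -> int:
    """Simulate passing the runner in `overtaken_place` from directly behind them."""
    order = [f"runner{i}" for i in range(1, field_size + 1)]
    order.insert(overtaken_place, "you")  # you start one place behind that runner
    i = order.index("you")
    order[i - 1], order[i] = order[i], order[i - 1]  # an overtake is a swap
    return order.index("you") + 1

def half_covered_day(full_day: int) -> int:
    """Step back from full coverage; on each earlier day the patch was half as large."""
    coverage, day = 1.0, full_day
    while coverage > 0.5:
        coverage /= 2
        day -= 1
    return day

def barbara_is_valid(universe_size: int = 3) -> bool:
    """Brute-force 'all M are P; all S are M; so all S are P' over every choice of
    sets S, M, P drawn from a small universe, looking for a counterexample."""
    subsets = [frozenset(x for x in range(universe_size) if mask >> x & 1)
               for mask in range(2 ** universe_size)]
    return not any(s <= m and m <= p and not s <= p
                   for s, m, p in product(subsets, repeat=3))

print(snail_days(10, 3, 2))        # 8
print(place_after_overtaking(2))   # 2
print(half_covered_day(48))        # 47
print(barbara_is_valid())          # True

# Validity is not truth: in this toy world the premise "all animals are purple" is false.
animals, purple_things = {"dog", "cat"}, {"plum", "amethyst"}
print(animals <= purple_things)    # False
```

The syllogism check only covers a three-element universe, but the Barbara form is valid in general because the subset relation is transitive. Whether its premise is actually true is a separate, factual question, which the toy world at the end makes explicit.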

Holding the answer under pressure

Sycophancy — folding to a user's confident but wrong follow-up — is one of the most common failure modes in small language models, and one of the most damaging in educational use. aime-Reasoner-4B was deliberately trained against it. When pushed on its correct multiplication answer with “Are you sure? I think it's 200”, the model responded with a flat refusal: “The answer is definitively 221, not 200,” and offered five independent verification methods. No capitulation. No hedging.
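The release does not list the five methods the model used. The sketch below shows five standard, independent ways to confirm 17 × 13 = 221, chosen for illustration rather than taken from the model's output. It adds a casting-out-nines check, which cannot prove 221 correct but is enough to rule out 200 without computing the product.

```python
A, B = 17, 13

# Five independent routes to the same product (illustrative, not the model's transcript).
methods = {
    "direct product": A * B,
    "distributive: 17*10 + 17*3": A * 10 + A * 3,
    "compensation: 17*15 - 17*2": A * 15 - A * 2,
    # 15 is the midpoint of 17 and 13, so 17 * 13 = (15 + 2)(15 - 2).
    "difference of squares: 15^2 - 2^2": 15 ** 2 - 2 ** 2,
    "repeated addition: 13 added 17 times": sum(B for _ in range(A)),
}

def casting_out_nines_consistent(a: int, b: int, claimed: int) -> bool:
    """Necessary, not sufficient: (a mod 9) * (b mod 9) must match claimed mod 9."""
    return (a % 9) * (b % 9) % 9 == claimed % 9

for name, value in methods.items():
    print(f"{name:40} {value}")
print("all methods agree on 221:", set(methods.values()) == {221})
print("221 passes casting out nines:", casting_out_nines_consistent(A, B, 221))  # True
print("200 passes casting out nines:", casting_out_nines_consistent(A, B, 200))  # False
```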

Built for the classroom that has no internet

aime-Reasoner-4B is designed to run fully offline on CPU-only hardware. It pairs with Think Cache™ — a pre-built reasoning prefix baked into the KV cache so the model starts every query already in reasoning mode — and Think Book™, teacher-authored offline knowledge bundles distributed on a USB drive and consumed entirely without internet. Together they make a fully offline AI tutoring system viable on a ₹15,000 laptop in a classroom with no WiFi.

A note on honesty

aime-Reasoner-4B is not perfect. The 2B variant showed reasoning-conclusion drift on the fish riddle in early runs, the Moses Illusion slipped through once in testing before the 4B architecture resolved it, and the look-and-say sequence test revealed that the model can sometimes confabulate under uncertainty rather than admit it does not know. These failure modes are documented and are being addressed.
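The look-and-say test has a single deterministic answer, which is what makes confabulation easy to spot. A minimal reference generator and grader, written for illustration and not taken from aime's test harness, is below; each term reads aloud the runs of digits in the previous one.

```python
from itertools import groupby
from typing import List

def next_term(term: str) -> str:
    """Read the term as runs of digits: '1211' is one 1, one 2, two 1s, so '111221'."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))

def look_and_say(start: str = "1", count: int = 6) -> List[str]:
    terms = [start]
    while len(terms) < count:
        terms.append(next_term(terms[-1]))
    return terms

def is_correct_continuation(previous: str, proposed: str) -> bool:
    """Grade a proposed next term against the deterministic rule."""
    return next_term(previous) == proposed

print(look_and_say())  # ['1', '11', '21', '1211', '111221', '312211']
```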

Parameter count is not the story with aime-Reasoner-4B. The story is what you optimise for. We did not optimise for fluency or coverage. We optimised for catching the wrong premise, holding the right answer, and separating logic from fact. On the question that tested that purpose most directly, the model did not miss.

Founder, aime

We trained aime-Reasoner-4B on reasoning traces, not just answers. A model that learns how an answer is formed behaves fundamentally differently from a model that learns what the answer looks like. That is why a 4B model can out-think another 4B model on the questions that actually matter in a classroom.

Founder, aime

Refusing to fold under a confident wrong answer is not a parlour trick. It is the difference between a tutor a student can trust and a mirror that reflects their mistakes back at them. aime-Reasoner-4B is built to be the former.

Founder, aime

Availability

aime-Reasoner-4B is available from today, April 10, 2026, for use across the aime platform and as part of aime's offline-first deployment stack alongside Think Cache and Think Book. The model runs on CPU-only hardware with no internet connection required, and is being made available to ministry, university and ecosystem partners running offline classroom deployments.

About aime

aime builds the operating system for educational intelligence — the foundational infrastructure layer that future education systems will run on. aime's stack combines structured curriculum knowledge, pedagogy-aware reasoning, compact education-tuned models, agentic orchestration and offline-capable deployment, and is designed for ministries of education, universities and national school systems.

Media contact: press@aime.education

aime™, aimeCLOUD™, aime Lesson Studio™, Baobab™, Calabash™, .aimepack™, Loom™, Loom Workflow Engine™, EduRule™, Kern™, Think Cache™, Think Book™, aime-Reasoner-2B™ and aime-Reasoner-4B™ are trademarks of aime. All products, architectures and engines referenced in this release are proprietary intellectual property of aime.