🚀 Attention, AI enthusiasts! Andrej Karpathy has just dropped a game-changer: ‘nanochat’, a compact, easy-to-use pipeline that lets you train your own ChatGPT-style model in just 4 hours for around $100! 🤯

💻 What’s under the hood?
– A single script, ‘speedrun.sh’, that guides you through the entire process: tokenization, base pretraining, mid-training on chat/multiple-choice/tool-use data, Supervised Fine-Tuning (SFT), optional RL on GSM8K, evaluation, and serving (CLI + ChatGPT-like web UI).
– The recommended setup is a single 8×H100 node at roughly $24/hour, so the 4-hour speedrun lands near $100.
– A post-run report.md that summarizes metrics on CORE, ARC-E/C, MMLU, GSM8K, HumanEval, and ChatCORE.
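The cost claim is easy to sanity-check from the figures quoted above (the ~$24/hour node rate and ~4-hour runtime are the article's numbers, not official pricing):

```python
# Back-of-the-envelope cost for the speedrun tier.
node_rate_usd_per_hour = 24   # quoted rate for one 8xH100 node
speedrun_hours = 4            # approximate end-to-end wall-clock time

cost = node_rate_usd_per_hour * speedrun_hours
print(f"Speedrun cost: ~${cost}")  # ~$96, i.e. "around $100"
```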

🌟 Key Features:
– Tokenizer: Custom Rust BPE with a 65,536-token vocab.
– Model: A depth-20 Transformer with ≈560M params, trained on ~11.2B tokens (≈20 tokens per parameter, Chinchilla-style).
– Mid-training & SFT: Adapts the base model to conversations, teaches multiple-choice behavior, and improves with higher-quality conversations.
– Tool use: End-to-end wired with a simple Python interpreter sandbox.
– Optional RL on GSM8K: A simplified GRPO (Group Relative Policy Optimization) loop.
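The core idea behind GRPO is group-relative advantages: sample several completions per prompt, score each, and use the group's own mean (and std) as the baseline instead of a learned value function. A minimal illustrative sketch, not nanochat's actual implementation:

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled answers to one GSM8K problem, reward 1.0 if the
# final numeric answer matched the reference, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # correct samples ~+1.0, incorrect ~-1.0
```

Correct samples get a positive advantage and incorrect ones a negative advantage, which is what pushes the policy toward answers that verify.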

💥 Scaling Up:
– Karpathy sketches two larger targets: a ~$300 tier (d=26, ~12 hours) that slightly outperforms GPT-2 CORE, and a ~$1,000 tier (~41.6 hours) for materially better coherence and reasoning.
– Previous experimental runs showed a d=30 model reaching impressive scores on MMLU, ARC-Easy, and GSM8K.
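Since cost scales roughly linearly with wall-clock time at a fixed node rate, the larger tiers are easy to price out (hours are the article's figures; the ~$24/hour rate is the one quoted above):

```python
rate = 24  # USD per hour for one 8xH100 node (quoted figure)
tiers = [("speedrun (d=20)", 4), ("~$300 tier (d=26)", 12), ("~$1,000 tier", 41.6)]
for name, hours in tiers.items() if False else tiers:
    print(f"{name}: {hours} h -> ~${rate * hours:.0f}")
# speedrun ~ $96, d=26 ~ $288, ~$1,000 tier ~ $998
```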

📈 Example Metrics (speedrun tier):
– CORE: 0.2219 (base model)
– ARC-E: 0.3561 → 0.3876
– ARC-C: 0.2875 → 0.2807
– MMLU: 0.3111 → 0.3151
– GSM8K: 0.0250 → 0.0455
– HumanEval: 0.0671 → 0.0854
– ChatCORE: 0.0730 → 0.0884
– Wall-clock time: 3h51m
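Computing relative changes from the numbers above makes the pattern clearer: GSM8K and HumanEval improve the most after mid-training/SFT, while ARC-C regresses slightly.

```python
# Relative change (%) from the before/after pairs listed above.
metrics = {
    "ARC-E": (0.3561, 0.3876),
    "ARC-C": (0.2875, 0.2807),
    "MMLU": (0.3111, 0.3151),
    "GSM8K": (0.0250, 0.0455),
    "HumanEval": (0.0671, 0.0854),
    "ChatCORE": (0.0730, 0.0884),
}
for name, (before, after) in metrics.items():
    rel = (after - before) / before * 100
    print(f"{name}: {rel:+.1f}%")  # e.g. GSM8K: +82.0%, ARC-C: -2.4%
```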

💬 Karpathy’s Words:
“nanochat is a minimal, end-to-end ChatGPT-style stack (~8K LOC) that runs via a single speedrun.sh on one 8×H100 node (~4h ≈ $100). It’s among the most unhinged repos I’ve written.” – Andrej Karpathy

🎉 Get Started!
– Check out the [technical details and code](https://github.com/karpathy/nanochat).
– Follow Karpathy on [Twitter](https://twitter.com/karpathy) and join our [100k+ ML SubReddit](https://www.reddit.com/r/MachineLearning/).
– Subscribe to our Newsletter and join us on [Telegram](https://t.me/MarkTechPost).

Don’t miss out on this incredible opportunity to train your own ChatGPT-style model! 🤖💻🚀
