
AI’s New Trick: Creating & Vetting Coding Challenges Like Humans!

Ever wondered if your AI’s coding skills are as good as they seem? A team of researchers from top universities and companies, including OpenAI and MIT, has come together to create AutoCode, an AI framework that lets Large Language Models (LLMs) create and verify competitive programming problems, just like human problem setters!

🤖 Why Problem Setting Matters
Current code benchmarks often rely on under-specified tests that let wrong or shortcut solutions pass, inflating scores and rewarding fragile tactics. AutoCode aims to fix this by minimizing both the false positive rate (FPR) and the false negative rate (FNR).
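In metric terms, a minimal sketch (names illustrative, not AutoCode’s actual harness): FPR is the share of wrong solutions a test suite accepts, and FNR is the share of correct solutions it rejects.

```python
def evaluate_suite(run_tests, labeled_solutions):
    """Measure a test suite against solutions with known verdicts.

    run_tests(solution) -> True if the suite accepts the solution.
    labeled_solutions: list of (solution, is_actually_correct) pairs.
    Returns (false_positive_rate, false_negative_rate).
    """
    fp = fn = wrong = right = 0
    for solution, is_correct in labeled_solutions:
        accepted = run_tests(solution)
        if is_correct:
            right += 1
            if not accepted:
                fn += 1  # correct solution rejected by the suite
        else:
            wrong += 1
            if accepted:
                fp += 1  # wrong solution slips through the suite
    fpr = fp / wrong if wrong else 0.0
    fnr = fn / right if right else 0.0
    return fpr, fnr
```

A stronger test suite drives both rates toward zero; an under-specified one shows up as a high FPR.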

🔄 The Core Loop: Validator → Generator → Checker
AutoCode runs a closed loop that mirrors human contest workflows, with each step selected from LLM-generated candidates using targeted tests.

1. Validator: The system first prompts an LLM to synthesize evaluation inputs along with candidate validator programs, then selects the candidate that most accurately classifies valid and near-valid illegal inputs, preventing “correct” solutions from crashing on malformed data.

2. Generator: Three strategies produce test cases, including small-data exhaustion, random + extreme cases, and TLE-inducing structures. Invalid cases are filtered by the selected validator, and cases are deduplicated and bucket-balanced before sampling.

3. Checker: The checker compares contestant outputs with the reference solution under complex rules. AutoCode generates checker scenarios and selects the best checker by accuracy against labeled scenarios.

4. Interactor (for interactive problems): AutoCode introduces a mutant-based interactor that makes small logical edits to the reference solution, selecting interactors that accept the true solution but reject the mutants, maximizing discrimination.
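The selection pattern shared by all four stages can be sketched as follows; `select_best` and its inputs are illustrative stand-ins, not AutoCode’s API:

```python
def select_best(candidates, labeled_cases):
    """Pick the candidate program that classifies the most cases correctly.

    candidates: list of callables, each mapping a case to a verdict.
    labeled_cases: list of (case, expected_verdict) pairs, i.e. the
    targeted tests each stage uses to rank LLM-generated candidates.
    """
    def accuracy(candidate):
        hits = sum(candidate(case) == expected
                   for case, expected in labeled_cases)
        return hits / len(labeled_cases)

    return max(candidates, key=accuracy)
```

Each stage (validator, generator filter, checker, interactor) reuses this generate-then-rank idea with its own notion of “case” and “verdict.”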

🌟 Dual Verification Enables New Problems
AutoCode can generate novel problem variants starting from a random “seed” Codeforces problem. It drafts a new statement and two solutions, accepting a problem only if the reference output matches brute force across the generated test suite. This dual-verification protocol filters out error-prone items, lifting reference-solution correctness from 86% to 94% before human review.
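The dual-verification check itself reduces to a simple agreement test. The sketch below assumes callable solutions and is not the paper’s implementation:

```python
def dual_verify(reference, brute_force, test_inputs):
    """Accept a generated problem only if the intended reference solution
    and an independent brute-force solution agree on every generated
    test input. Any disagreement flags an error-prone problem."""
    return all(reference(x) == brute_force(x) for x in test_inputs)
```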

📈 Understanding the Results
On a 7,538-problem benchmark, AutoCode achieved 91.1% consistency with official judgments (FPR 3.7%, FNR 14.1%), outperforming prior generators like CodeContests and TACO. On a separate, harder set of 720 recent Codeforces problems (including interactive tasks), the full framework reported 98.7% consistency, 1.3% FPR, and 1.2% FNR.

🎯 Key Takeaways
– AutoCode builds contest-grade test suites and new problems using a Validator-Generator-Checker (+Interactor) loop with dual verification.
– On held-out problems, AutoCode’s test suites reach ~99% consistency with official judges, surpassing prior generators.
– The mutant-based interactor improves evaluation for interactive problems.
– Human experts rate a sizable fraction of AutoCode-generated items as training-usable and a non-trivial share as contest-quality.

AutoCode is a practical fix for current code benchmarks, centering problem setting and using a closed-loop pipeline to reduce false positives/negatives and yield judge-aligned consistency. Check out the paper and project, and follow the team on Twitter, Reddit, and Telegram for more updates!

🚨 Tech’s Biggest Week Ever! Apple’s M5 Explosion, Windows 10’s Farewell, & More! 🚨

This week was a tech tornado! Apple unleashed a storm of M5-powered devices, Samsung set a date for its XR headset, and Windows 10 bid us adieu. Let’s dive into the seven biggest stories and bring you up to speed.

7. Windows 10’s Final Curtain Call
(Image credit: Anna Kucherova / Shutterstock / Microsoft)
Windows 10’s support officially ended on October 14. No more updates, bug fixes, or security patches. But don’t panic just yet! There are still ways to extend its life, like signing up for free extended updates, turning your laptop into a Chromebook, or trying a stripped-down Windows 11. Read more: [Windows 10 End of Life live: everything you need to know](https://www.example.com/windows-10-eol)

6. Windows 11’s AI Power-Up
(Image credit: Microsoft)
Microsoft rolled out new AI features for Windows 11, making every PC an “AI PC” with Copilot at the center. Say “Hey Copilot” to start issuing voice commands, and get help with tasks using Copilot Vision. Read more: [Microsoft reveals plan to “make every Windows 11 PC an AI PC”](https://www.example.com/windows-11-ai)

5. Pokémon Legends: Z-A – A New Evolution
(Image credit: Future / The Pokémon Company)
Pokémon fans, rejoice! Legends: Z-A is here, and it’s a game-changer. With improved battle mechanics, engaging exploration, and a fantastic story, this might be the best Pokémon game yet. Read more: [Pokémon Legends: Z-A is the Pokémon game I always wanted](https://www.example.com/pokemon-legends-z-a-review)

4. Samsung’s XR Headset Release Date
(Image credit: Lance Ulanoff / Future)
Mark your calendars! Samsung’s consumer XR headset will launch on October 21. Register now to get a $100 credit towards other Samsung gear. Read more: [The Samsung XR headset is almost here](https://www.example.com/samsung-xr-headset)

3. Asus ROG Xbox Ally X Reviewed
(Image credit: Future)
The Asus ROG Xbox Ally X is finally here, and it’s a Windows 11-powered gaming handheld PC. While it’s not a perfect console experience, it’s a solid step in the right direction. Read more: [Asus ROG Xbox Ally X doesn’t fix Windows 11 on handhelds](https://www.example.com/asus-rog-xbox-ally-x-review)

2. Apple’s M5 MacBook Pro Arrives
(Image credit: Apple)
Apple’s 14-inch MacBook Pro got a powerful M5 chip upgrade, making it a safe bet for creatives looking to update. The 16-inch model and M5 Pro/Max chips are still in the works, though. Read more: [The M5 MacBook Pro is official – here are 5 things you need to know](https://www.example.com/m5-macbook-pro)

1. Apple’s M5 iPad Pro Unveiled
(Image credit: Apple)
The new iPad Pro 11 and 13-inch models got the M5 chip treatment, with a focus on AI enhancements. The Neural Engine and neural accelerators will boost generative AI operations, and the prices remain the same. Read more: [Apple unveils an M5-powered iPad Pro and makes the update all about AI](https://www.example.com/m5-ipad-pro)

Revolutionizing AI: New Method Makes Reinforcement Learning Predictable, Saves Massive Compute Time!

Ever felt like you’re throwing darts in the dark when it comes to fine-tuning large language models with reinforcement learning (RL)? You’re not alone. Unlike pre-training, RL post-training lacked clear rules to predict how more compute time would improve performance. But a groundbreaking study from Meta, UT Austin, UCL, Berkeley, Harvard, and Periodic Labs is changing the game!

The Sigmoidal Curve Secret

Pre-training often follows power laws, but RL fine-tuning targets bounded metrics like pass rates or mean reward. The research team discovered that sigmoidal curves fit these metrics better, especially when extrapolating from smaller runs to larger budgets. The sigmoidal parameters also have intuitive roles: one sets the performance ceiling, another the efficiency, and another the midpoint of fastest gains.
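One common way to write such a saturating curve, as an illustrative parameterization rather than the paper’s exact functional form, makes the three roles explicit:

```python
def sigmoid_scaling(compute, ceiling, efficiency, midpoint):
    """Illustrative sigmoidal compute-performance curve.

    ceiling:    asymptotic performance the run approaches (bounded metric).
    efficiency: how sharply gains accrue around the midpoint.
    midpoint:   compute at which half the ceiling is reached.
    """
    return ceiling / (1.0 + (midpoint / compute) ** efficiency)
```

At `compute == midpoint` the curve sits at exactly half the ceiling, and it flattens toward `ceiling` as compute grows, which is what makes extrapolation from small runs possible.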

Why It Matters

After around 1-2k GPU-hours, you can now forecast whether pushing to 10k-100k GPU-hours is worth it before burning that budget. Power-law fits, by contrast, can mislead unless they are fitted only at very high compute, which makes early forecasting impossible.

Introducing ScaleRL: The Predictable Recipe

ScaleRL isn’t just a new algorithm; it’s a combination of choices that produced stable, extrapolatable scaling:

– Asynchronous Pipeline RL for off-policy throughput.
– CISPO (truncated importance-sampling REINFORCE) as the RL loss.
– FP32 precision at the logits to avoid numeric mismatch.
– Prompt-level loss averaging and batch-level advantage normalization.
– Forced length interruptions to cap runaway traces.
– Zero-variance filtering and No-Positive-Resampling.
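As a heavily hedged sketch of the loss idea only: CISPO truncates the importance-sampling ratio from above so stale off-policy samples cannot dominate the REINFORCE update. The per-token framing and the clip constant here are illustrative, not the paper’s exact formulation:

```python
import math

def cispo_weight(logp_new, logp_old, advantage, clip_max=2.0):
    """Per-token gradient weight for a truncated importance-sampling
    REINFORCE loss: ratio = pi_new / pi_old, clipped at clip_max.
    Off-policy tokens with inflated ratios are capped, not discarded."""
    ratio = math.exp(logp_new - logp_old)
    return min(ratio, clip_max) * advantage
```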

The team validated each component and showed that ScaleRL’s fitted curves reliably extrapolate from 8k to 16k GPU-hours and hold at much larger scales, including a single run extended to 100k GPU-hours.

Results and Generalization

Two key demonstrations prove the predictability at scale:

1. An 8B dense model and a Llama-4 17B×16 MoE (“Scout”) closely followed sigmoid extrapolations from smaller-compute segments.
2. Pass-rate improvements on an iid validation set tracked downstream evaluation, suggesting the compute-performance curve isn’t a dataset artifact.

The research also compared fitted curves for prevalent recipes and reported higher asymptotic performance and better compute efficiency for ScaleRL.

Which Knobs Move the Ceiling vs Efficiency?

The framework helps classify design choices:

– Ceiling movers (asymptote): scaling model size, longer generation lengths, and larger global batch size raise the asymptotic performance but may slow early progress.
– Efficiency shapers: loss aggregation, advantage normalization, data curriculum, and the off-policy pipeline mainly change how fast you approach the ceiling, not the ceiling itself.
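A standalone numeric illustration of that distinction, using an illustrative saturating-curve parameterization: turning the efficiency knob speeds the approach to the ceiling without moving the ceiling itself.

```python
def curve(compute, ceiling, efficiency, midpoint=100.0):
    """Illustrative saturating compute-performance curve."""
    return ceiling / (1.0 + (midpoint / compute) ** efficiency)

# Same ceiling (0.8); at a large budget the higher-efficiency run sits
# closer to that ceiling, but neither run can exceed it.
baseline = curve(1000.0, 0.8, efficiency=1.0)
efficient = curve(1000.0, 0.8, efficiency=3.0)
```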

Operationally, the team advises fitting curves early, prioritizing interventions that raise the ceiling, and then tuning the efficiency knobs to reach it faster at fixed compute.

Key Takeaways

– RL post-training progress can be modeled with sigmoidal compute-performance curves, enabling reliable extrapolation.
– ScaleRL, a best-practice recipe, combines PipelineRL-k, CISPO loss, FP32 logits, prompt-level aggregation, advantage normalization, length control, zero-variance filtering, and no-positive-resampling.
– Using these fits, the team predicted and matched extended runs up to 100k GPU-hours on validation curves.
– Some choices move the asymptotic ceiling, while others mainly improve compute efficiency.
– The framework provides early forecasting to decide whether to scale a run, and improvements on the in-distribution validation track downstream metrics, supporting external validity.

This work turns RL post-training from trial-and-error into forecastable engineering, saving precious compute time and resources. Check out the [paper](https://arxiv.org/pdf/2510.13786) for more details, and explore their [GitHub Page](https://github.com/…), [Twitter](https://twitter.com/…), [SubReddit](https://www.reddit.com/r/…), [Newsletter](http://eepurl.com/…), and [Telegram](https://t.me/…) for tutorials, codes, and updates!

🚨 Catch Every Ball Live! How to Watch New Zealand vs England T20 Series 2025 for FREE! 🏏

Mark your calendars, cricket fans! The highly anticipated New Zealand vs England T20 series is just around the corner, kicking off on October 18 and wrapping up on October 23. Here’s everything you need to know to catch all the action, including free streaming options!

Series Schedule:
– 1st T20: October 18, Christchurch
– 2nd T20: October 20, Christchurch
– 3rd T20: October 23, Auckland

Match Timings:
– 7:15 AM BST / 2:15 AM ET / 4:15 PM AEST

Free Streaming in New Zealand:
If you’re in New Zealand, you’re in luck! TVNZ+ is streaming the series live and for free. Just tune in and enjoy the match!

Watching from Abroad? Use a VPN!
If you’re traveling outside of New Zealand, don’t worry! You can use a VPN to access your free stream on TVNZ+. We recommend ExpressVPN, which is user-friendly and offers a 30-day money-back guarantee.

How to Use a VPN:
1. Install your chosen VPN (we recommend ExpressVPN).
2. Select ‘New Zealand’ as your location.
3. Head over to TVNZ+ and enjoy the cricket!

Streaming in the US, UK, India, and Australia:
– US: ESPN Select ($11.99/month, moving to $12.99 on 21 Oct)
– UK: TNT Sports (Add to TV package or £30.99/month on Discovery+)
– India: Sony Sports Network (Sony LIV subscription from ₹399/month)
– Australia: Foxtel or Kayo Sports (First month from $1, then $30/month)

Q&A:
– What’s the series schedule? October 18, 20, and 23.
– What are the match timings? 7:15 AM BST / 2:15 AM ET / 4:15 PM AEST
– Who are the squads? Check out the full squads [here](link-to-squads).

Don’t miss out on the exciting action! Grab your snacks, invite your friends, and get ready to cheer for your favorite team. Happy cricket watching! 🏏🎉

Say Cheese! Google Messages is About to Get a Whole Lot Funner with AI-Made Memes!

Hold onto your hats, folks! Google Messages is about to get a serious upgrade with the addition of Nano Banana, Google’s AI image generator. The tech sleuths over at Android Police have found code that hints at this exciting new feature.

Here’s the deal: Nano Banana is set to make and edit images right within your chats on the Messages app. How, you ask? Well, there’s a sneaky little banana-shaped icon that appears when you long-press an image in a message thread. It’s currently just chillin’, but it’s expected to spring into action soon.

Now, you might be wondering what exactly Nano Banana can do in Messages. Well, it’s still a bit of a mystery, but the fact that the banana icon pops up when you long-press an image suggests it’ll be more about editing existing images than creating new ones from scratch. Think filters, image clean-ups, or even turning your friend’s awkward selfie into a hilarious meme!

Google’s been rolling out Nano Banana to other apps like NotebookLM and Search through Google Lens, and it’s already announced plans for Google Photos. So, it’s only a matter of time before Messages gets the AI image treatment. And with over five billion images already created with Nano Banana, it’s safe to say this AI is a force to be reckoned with.

But Google’s not the only one in the AI image game. Meta AI is making waves in WhatsApp and Messenger, Apple’s got custom emoji and image generation for iMessage, and even Snapchat’s jumped on the AI image bandwagon. So, Google’s got some stiff competition.

When will Nano Banana hit Messages? Your guess is as good as ours, but in the meantime, you can always play around with the new Nano Banana-enabled camera equipment to tide you over.

Stay tuned for more tech news and updates, folks! And if you’re feeling extra tech-savvy, you can follow TechRadar on Google News, TikTok, and even WhatsApp for all the latest happenings in the world of tech.

Toshiba’s 40TB HDD: A Decade Too Late?

In a world where data is king, storage wars are heating up. Seagate has already started testing its 40TB HAMR (Heat-Assisted Magnetic Recording) drives, and now Toshiba is finally stepping into the ring, but it might be too little, too late.

Toshiba’s plans involve a 12-platter design using Microwave-Assisted Magnetic Recording (MAMR), aiming for a 2027 launch. But hold your horses, because Seagate has been busy. Back in May 2025, Seagate shipped limited units of its 40TB HDDs, using HAMR to store 4TB per platter across ten platters. Full-scale production is set to begin in the first half of 2026.

Seagate’s CEO, Dr. Dave Mosley, has revealed that they’re not stopping at 40TB. They’re planning 44TB models for 2027 and 50TB drives by 2028. So, while Toshiba is just starting to talk about 40TB, Seagate is already looking beyond.

Toshiba’s new design adds two disks to the ten-disk nearline format and uses glass platters for thinner, more precise, and durable storage. They’re also exploring how this 12-disk configuration could work with HAMR for future products. But by 2027, when Toshiba’s drives hit the market, Seagate might already be well past the 40TB mark, leaving Toshiba playing catch-up.

So, while Toshiba’s 40TB HDD is an impressive feat, it might just be a drop in the bucket in the fast-paced world of data storage. Stay tuned to see who comes out on top in this storage showdown!

💥 Sneak Peek: Anthropic’s Claude Code for Web Lets You Choose Your Coding Environment!

Anthropic’s much-awaited Claude Code is almost here, and a leaked codebase update has given us a sneak peek into what’s coming! Here’s what we’ve discovered:

🔒 Security First: Users can choose from three security profiles for their coding environments:
– Trusted Network Access: Packages are installed from verified sources.
– No Network Access: Maximum isolation for your coding sessions.
– Custom Access: You specify allowed or blocked domains, with wildcard support.
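For illustration only, the three leaked profiles might map onto a configuration structure like this; every field name and value below is invented and is not Anthropic’s actual schema:

```python
# Hypothetical sketch of the three security profiles as a config mapping.
SECURITY_PROFILES = {
    "trusted_network": {"install_from": "verified-sources-only"},
    "no_network": {"network": None},  # maximum isolation
    "custom": {
        "allowed_domains": ["*.mycompany.example"],  # wildcard support
        "blocked_domains": ["*"],
    },
}
```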

🔑 Secrets and Variables: Each workspace supports environment variables and secrets.

💻 Web UI with a Twist: The web interface splits the screen, dedicating one side to session management and prompts, and the other to chat responses. You can interact directly with Claude during each coding session.

🛠️ Claude’s Capabilities: Claude Code’s AI can use CLI tools, search online, edit files, and even open GitHub pull requests once your task is complete.

🌐 Cloud-Native Coding: The web version mirrors a terminal-like experience but with the benefits of cloud execution, session history, and accessibility from anywhere. Mobile support is on its way too!

🔒 IT Control for Organizations: A dedicated toggle in settings lets organizations disable or enable Claude Code, providing IT control. Plus, a waitlist system will manage capacity during early high demand.

Anthropic is positioning Claude Code as a full-stack, agent-driven solution for developers seeking more autonomy from traditional CLI workflows. If the buzz around this early preview is any indication, Claude Code could give existing solutions like GitHub Copilot or ChatGPT Codex a run for their money, especially for teams valuing privacy controls and flexible environment setup. Stay tuned for the official launch! 🚀

Revolutionize Your Workflow: Claude Now Plays Nice with Microsoft 365!

🚀 Anthropic just dropped a game-changer! They’ve rolled out a new integration that lets Claude connect directly with Microsoft 365. This means all you enterprise users relying on SharePoint, OneDrive, Outlook, and Teams can now have your AI assistant fetch data from these platforms in a snap! 🔎

The update is live for all Claude Team and Enterprise plan users worldwide. Here’s what you can expect:

– Seamless Search: Claude can now dive into your SharePoint, OneDrive, Outlook, and Teams to pull up tailored responses. No more manual uploads or switching tabs!
– Real-Time Insights: Claude uses the MCP protocol to securely analyze documents, emails, meeting notes, and chat threads. Need to check company policies or project details? Just ask Claude!
– Unified Search: Say goodbye to data silos! This update combines search and analysis across platforms, saving you time and effort.
– Shared Project Space: Once your admin connects the apps and customizes data prompts, everyone in your organization can access the shared project space.

Early buzz from IT admins and enterprise teams is positive. They’re loving how the connector speeds up onboarding and helps locate internal experts quickly. Plus, admins can control which data sources Claude can access, keeping sensitive info safe.

Anthropic, the brains behind Claude, is all about boosting business productivity with AI. This Microsoft 365 connector is just the latest step in their mission to make AI work for you, not the other way around. So, enterprise users, get ready to supercharge your workflow with Claude and Microsoft 365! 💪🚀

🚀 Boost Your Coding Productivity: Meet SWE-grep & SWE-grep-mini, Your New Super-Fast Code Search Sidekicks!

💥 Big news, developers! Cognition and Windsurf have just unleashed SWE-grep and SWE-grep-mini, two power-packed models designed to revolutionize your code search game. These aren’t your average models; they’re built for speed and efficiency, promising to cut your search time by up to 20 times! 🏃‍♂️💨

Here’s the scoop:

🔹 Blazing Fast & Parallel: SWE-grep and SWE-grep-mini can handle up to 8 parallel tool calls per turn, for up to 4 turns. This means they can find what you need in a flash, even in massive codebases.

🔹 Smart & Precise: Unlike traditional methods, these models use reinforcement learning to prioritize precision. They learn from their mistakes, minimizing context pollution and saving your valuable agent tokens.

🔹 Easy to Use: Windsurf users, rejoice! The new Fast Context subagent is rolling out gradually, and you don’t need to do anything special to use it. Just keep using Windsurf Cascade, and the new feature will kick in when you need to search for code.

🔹 Playground & Cross-Platform Support: Anyone can experiment with these models in the dedicated demo playground. Plus, they support multiple platforms, including Windows, so you’re covered no matter where you code.
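The batched-parallel retrieval described above can be sketched with a thread pool; `search_fn` and the queries are placeholders, and the real models refine their queries between turns, which is omitted here:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_search(queries, search_fn, max_parallel=8, max_turns=4):
    """Sketch of the batched-parallel pattern: up to `max_parallel`
    searches per turn, for at most `max_turns` turns. `search_fn` stands
    in for any code-search call (grep, embedding lookup, etc.)."""
    results = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for _ in range(max_turns):
            batch, queries = queries[:max_parallel], queries[max_parallel:]
            if not batch:
                break
            # pool.map preserves input order within each batch
            results.extend(pool.map(search_fn, batch))
    return results
```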

Cognition and Windsurf, the powerhouses behind these innovative tools, are determined to keep you “in flow” and boost your productivity. They’re not stopping here, either – plans are already in the works to expand SWE-grep’s deployment across more products and future updates. Stay tuned, developers! Your coding future just got a whole lot faster. 🚀💻

🚀 Manus 1.5: Build Web Apps with Real AI in a Flash!

Manus just dropped a game-changer with version 1.5! Here’s what’s new and why you’ll love it:

💥 Blazing Speed & Quality: Tasks run 4x faster, and quality scores have jumped by 15%! Users are 6% more satisfied, according to Manus’ internal tests.

🌟 All-in-One Workflow: Manus 1.5 targets developers who want to research, code, deploy, and analyze all in one smooth workflow. It can scaffold backends, set up user logins, attach databases, and ship AI features. Plus, it can launch a live preview, act like a user to test itself, and even fix issues it finds!

🎨 Visual Editing: Point to UI sections and describe changes – it’s that easy!

📈 Built-in Goodies: Analytics, versioning, permissions, custom domains, and event notifications come standard.

Availability: Manus 1.5-Lite is open to all users today. For the full version, subscribe and enjoy a 50% discount on Lite credit spend.

What’s Changed: A revamped runtime that allocates more power to tough tasks, and an expanded single-task context for long jobs. The new Library centralizes outputs, and Collaboration lets you team up with others in shared sessions.

Manus in a Nutshell: It’s a general agent that turns your thoughts into actions, working across web, mobile, and its hosted builder. With 1.5, it’s all about going “from prompt to web apps” – end-to-end generation with login, database, and backend from a single description.

Source: ManusAI’s official Twitter announcement
