Andrew Ng
Sep 18, 4:13 PM
Automated software testing is growing in importance in the era of AI-assisted coding. Agentic coding systems accelerate development but are also unreliable, and agentic testing — where you ask AI to write tests and check your code against them — is helping. Automatically testing infrastructure components that you intend to build on top of is especially helpful and results in more stable infrastructure and less downstream debugging. Software testing methodologies such as Test-Driven Development (TDD) — a test-intensive approach in which you first write rigorous tests for correctness and only then make progress by writing code that passes those tests — are an important way to find bugs. But writing tests can be a lot of work. (I personally never adopted TDD for that reason.) Because AI is quite good at writing tests, agentic testing is enjoying growing attention.

First, coding agents do misbehave! My teams use them a lot, and we have seen:
- Numerous bugs introduced by coding agents, including subtle infrastructure bugs that took humans weeks to find.
- A security loophole introduced into our production system when a coding agent made password resets easier in order to simplify development.
- Reward hacking, where a coding agent modified test code to make it easier to pass the tests.
- An agent running "rm *.py" in the working directory, deleting all of a project's code (which, fortunately, was backed up on GitHub).

In the last example, when pressed, the agent apologized and agreed “that was an incredibly stupid mistake.” This made us feel better, but the damage had already been done!

I love coding agents despite such mistakes and see them making us dramatically more productive. To make them more reliable, I’ve found that prioritizing where to test helps. I rarely write (or direct an agent to write) extensive tests for front-end code. If there's a bug, hopefully it will be easy to see and will cause little lasting damage.
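As a concrete illustration of the test-first workflow described above, here is a minimal Python sketch. The tests pin down the intended behavior before (or alongside) the implementation; `normalize_email` and its specification are hypothetical examples for illustration, not code from the post.

```python
# Minimal TDD-style sketch: tests state the contract; the implementation
# exists only to make them pass. `normalize_email` is a hypothetical helper.

def normalize_email(address: str) -> str:
    """Lowercase the domain part of an email address, leaving the local part intact."""
    local, _, domain = address.partition("@")
    return f"{local}@{domain.lower()}"

# Tests written first, pinning down corner cases before coding.
def test_lowercases_domain():
    assert normalize_email("Ada@EXAMPLE.com") == "Ada@example.com"

def test_preserves_local_part_case():
    assert normalize_email("MixedCase@Example.ORG") == "MixedCase@example.org"

if __name__ == "__main__":
    test_lowercases_domain()
    test_preserves_local_part_case()
    print("all tests pass")
```

In an agentic-testing workflow, the agent can be asked to generate tests like these from a behavioral description, and the human reviews the tests (rather than every line of implementation) to guard against reward hacking.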
For example, I find generated code’s front-end bugs, say in the display of information on a web page, relatively easy to find. When the front end of a website looks wrong, you’ll see it immediately, and you can tell the agent and have it iterate to fix it. (A more advanced technique: Use MCP to let the agent integrate with software like Playwright to automatically take screenshots, so it can autonomously see if something is wrong and debug.)

In contrast, back-end bugs are harder to find. I’ve seen subtle infrastructure bugs — for example, one that led to a corrupted database record only in certain corner cases — that took a long time to find. Putting in place rigorous tests for your infrastructure code might help spot these problems earlier and save you many hours of challenging debugging. Bugs in software components that you intend to build on top of lead to downstream bugs that can be hard to find. Further, bugs in a component that’s deep in a software stack — one that you build multiple abstraction layers on top of — might surface only weeks or months later, long after you’ve forgotten what you were doing while building that component, and be really hard to identify and fix. This is why testing components deep in your software stack is especially important.

Meta’s mantra “Move fast with stable infrastructure” (which replaced “move fast and break things”) still applies today. Agentic testing can help you make sure you have good infrastructure for you and others to build on!

At AI Fund and https://t.co/zpIxRSuky4’s recent Buildathon, we held a panel discussion with experts in agentic coding (Michele Catasta, President at Replit; Chao Peng, Principal Research Scientist at Trae; and Paxton Maeder-York, Venture Partnerships at Anthropic; moderated by AI Fund’s Eli Chen), where the speakers shared best practices. Testing was one of the topics discussed. That panel was one of my highlights of Buildathon, and you can watch the video on YouTube.
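To make the point about corner-case infrastructure bugs concrete, here is a minimal Python sketch of the kind of round-trip test an agent could generate for a small record codec deep in a storage stack. The `encode_record`/`decode_record` functions and the delimiter scheme are hypothetical illustrations, not code from the post; the corner cases (empty fields, fields containing the delimiter) are exactly the kind a naive implementation corrupts silently.

```python
# Hypothetical record codec deep in a storage stack: fields are joined with
# a delimiter, so fields containing the delimiter are a classic corner case.
DELIM = "|"
ESCAPE = "\\"

def encode_record(fields: list[str]) -> str:
    # Escape the escape character first, then the delimiter.
    escaped = [f.replace(ESCAPE, ESCAPE + ESCAPE).replace(DELIM, ESCAPE + DELIM)
               for f in fields]
    return DELIM.join(escaped)

def decode_record(line: str) -> list[str]:
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESCAPE and i + 1 < len(line):
            current.append(line[i + 1])  # unescape the next character
            i += 2
        elif ch == DELIM:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields

# Round-trip tests over corner cases that a naive line.split("|") would corrupt.
for record in [["a", "b"], ["a|b", "c"], ["", ""], ["back\\slash", "x|y"]]:
    assert decode_record(encode_record(record)) == record
```

A round-trip property like `decode(encode(x)) == x` is cheap for an agent to generate and exercise over many inputs, and it catches exactly the "corrupted record only in certain corner cases" class of bug before layers are built on top.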
[Original text: https://t.co/B1sQ5oDnCU ]
OpenAI
Oct 16, 3:18 AM
2 Sora 2 updates:
- Storyboards are now available on web to Pro users
- All users can now generate videos up to 15 seconds on app and web, Pro users up to 25 seconds on web
https://t.co/iINg7alWGL
BBC News (World)
Oct 16, 2:15 AM
China sacks officials over viral Arc'teryx fireworks stunt in Tibet https://t.co/djt3KYfTiX