I continue to experiment with Claude, and as many others have said, the difference between the 4.6 models (particularly Opus) and the previous models doesn’t feel massive, yet the new models just seem so much more competent.
I’ve mostly been building personal helper apps and automations:
- a personal bookmarking/read-later solution that works on iOS/iPadOS/macOS
- a scraper that goes to the websites of all the local music venues and pulls down the upcoming concerts so I can quickly look at what is coming up and decide if I’m interested in checking anything out
- some games to entertain my kids that I know aren’t going to lead them into a morass of in-app purchases or down the algorithm rabbit hole
Building these over the last few months with the earlier models took a lot more hand-holding in the prompting. Claude would constantly overestimate its abilities and break Xcode projects, or create features and tests that it insisted were working when neither the feature nor the test actually worked. After a while, it would just tell me I didn’t need that feature and to move on.
With the 4.6 models (particularly Opus 4.6), it’s a different world. Far more often, I can say what I want and have a 90% working solution by the end of my session. A couple of iterations later, it’s basically done (a JS/HTML marble race game I made for my kids, for example, was nearly perfect from the start).
With Opus 4.6, I pointed it at the work Claude had previously done and asked for improvements, and it made what I would call legitimate improvements in simplicity, performance, and idiomatic usage. The results were good enough that, at the end of the process, I even had Claude create MCP servers for the bookmarking and concerts tools I’ve built. Now I can ask desktop Claude about shows at a venue, or about any articles I’ve bookmarked on certain topics, and it can bring them back into context and tie them into other things I might be doing, like adding "buy a concert ticket" to my todo list or incorporating articles into some broader research I’m working on.
Life-changing? No. Worth $20/month? For me, probably. For everyone? I don’t think so yet. It still depends heavily on having some inkling of what you’d want to build and automate, and not every computer user has that.
These are small toy apps. They’re useful to me, but I wouldn’t feel comfortable sharing anything I’ve built with anyone beyond friends and family. Incorporating this into existing commercial apps is going to be harder than I think a lot of folks believe, because as good as Claude (and Codex) have become, they still frequently make meaningfully bad decisions around things like security. Businesses in industries with high compliance needs will need good processes to ensure they’re not leaking information that could cause legal or reputational issues. For folks launching apps on Lovable (or the like), I think the models are so good that they give too much confidence that everything has been buttoned up. But the models will make mistakes (just like humans do), and without a developer to take a look, there’s nobody to catch those mistakes before they burn you.
