Can a Seasoned Software Engineer and AI Write a Fully-Featured, Mature App in a Month?
Exactly one month ago, on June 27, I decided to try an experiment: could I rewrite my long-term hobby app with AI? Hamster Soup is an app for superfans of the TV show Big Brother, whose new season was set to premiere around July 17. That gave me as much as three weeks to get something going in time for the premiere, even if it wasn’t a complete or very robust release. Initially, I thought I might get 70-80 percent of the way there, and gave myself until July 15 to ship to the App Store.
Oh, how wrong I was!
As it turned out, on July 8 – barely a week and a half in – I had the app “shippable” by my own standards. That gave me plenty of time to add a feature or two before my self-imposed July 15 date. Again, wrong. By July 11 I had added more new features than I had expected to, and on July 12 I released Hamster Soup 2.0.0 (which is really about the 12th version of this app over the last 15 years).
That was two weeks ago, and since then I have shipped 10 versions, nearly all of which added one or more significant features (one was mainly a bug fix for a timezone issue). That’s a significant update nearly every single day since launch. Even with a full-time job, I’ve settled into a routine: think of an idea and build 75 percent or more of it in the morning before work, mess around with it a little in the evening, get it across the finish line and submit it before going to bed, and wake up to find it approved and released the next morning.
It has been a remarkable two weeks working this way, as a kind of super Software Engineer/Product Director mind-meld with ChatGPT. Since I’ve been using the same conversations the whole time, the Star Trek reference fits: ChatGPT and I are so dialed in that we each know how the other thinks and rarely have to explain what we mean.
This is especially apparent when I drop a screenshot into the chat without an explanation and it just knows what I mean or want to show it. That was an experiment the first time I tried it (“I wonder if it can figure out what I want it to address”), but now it’s just the way I work with the tool. I drop screenshots from the app, compiler errors in Xcode, deprecation warnings, and more. It just knows what I want and a second later starts addressing it.
Not only have I used AI to co-develop this app; about a week ago I also introduced the first simple AI feature to the product, and I have since layered more and more functionality on top of it. Now there is a basic, free AI-enabled experience and an Enhanced Hamster AI in-app purchase that unlocks user customization, deeper functionality, and a smart summary feature that gives every user a unique perspective tuned to their preferences. And I don’t have to write a word of it myself. (Except for the Gen AI prompts; more on those in a moment.) The Enhanced Hamster AI feature gives me a platform to build upon without giving away too much AI for free.
The AI features aren’t free to run, though. Because iOS 18 and its built-in models won’t be available until the fall, I’m using the OpenAI API to power them. Originally they ran on gpt-3.5-turbo, the cheapest model available at launch, but shortly afterwards gpt-4o-mini was released as a drop-in replacement. Switching to gpt-4o-mini halved my costs instantly while providing better performance and quality.
Speaking of working with OpenAI, I knew that putting the AI API calls directly in the app was a non-starter. Not only does the App Review process make it slow to update anything baked into the app, but embedding your API key in an app on someone else’s device is inviting significant trouble. I knew the right way to deal with it, but asked ChatGPT for its opinion anyway. It gave me a few different options, and in about five minutes not only had we settled on a cloud function as the best implementation, but we’d already written a working prototype that needed only minor tweaks to become the initial production implementation.
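For the curious, the shape of it is roughly the sketch below: the app never holds the OpenAI key; it calls my own endpoint, and the cloud function keeps the key server-side and forwards the request to OpenAI. (This is a simplified illustration in a generic Node style, not my production code; the function name, prompt, and limits are placeholders, and the exact handler wiring depends on your cloud provider.)

```javascript
// Minimal sketch of the proxy idea: the OpenAI key lives only on the server,
// and the app calls this endpoint instead of OpenAI directly.
// Generic Node 18+ / Express-style handler; adapt to your cloud provider's wiring.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY; // never shipped inside the app binary

exports.hamsterSummary = async (req, res) => {
  const { updates } = req.body; // the live feed text the app wants summarized

  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // the drop-in swap mentioned above
      messages: [
        { role: "system", content: "Summarize Big Brother live feed updates for superfans." }, // placeholder prompt
        { role: "user", content: updates },
      ],
      max_tokens: 300, // a server-side cost cap I can change without an app update
    }),
  });

  const data = await response.json();
  res.json({ summary: data.choices?.[0]?.message?.content ?? "" });
};
```

Because the key, the model name, and the caps all live server-side, changing any of them never requires another trip through App Review.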
With a cloud function, not only can you deploy changes quickly, but you can also lock it down to minimize the mischief a bad actor can do on your dime. You get precise control over your costs and can adjust in real time, unlike with a mobile app. Over time, we’ve iterated on and improved the implementation, added functions for analytics and other useful features, and built algorithms that maximize variety and customizability while minimizing cost.
For example, I noticed that most of the time, Hamster AI referred to the happenings in the Big Brother house as “episodes”. Which makes sense, since it’s based on a TV show. But these are the live feeds, which are just the houseguests’ days unfolding in front of dozens of cameras. After reading one too many “On the latest episode, …” updates, I tweaked the various prompts (there are several, for variety) to avoid that kind of phrasing.
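The tweak is nothing fancy; it amounts to steering the system prompts. Something like this sketch, though the wording here is illustrative rather than my actual prompts:

```javascript
// Sketch of the prompt-variety idea (illustrative wording, not my real prompts):
// several system prompts, all steering the model toward "live feeds" language
// and away from "episode" framing, with one picked at random per request.
const SYSTEM_PROMPTS = [
  "You summarize what is happening on the Big Brother live feeds. " +
    "Never describe events as an 'episode'; this is continuous live footage.",
  "You are a Big Brother superfan recapping the live feeds in real time. " +
    "Say 'on the feeds' or 'in the house', never 'on the latest episode'.",
  "Summarize houseguest activity from the live feeds conversationally. " +
    "Avoid TV-episode phrasing such as 'on this episode' or 'this week on Big Brother'.",
];

const pickSystemPrompt = () =>
  SYSTEM_PROMPTS[Math.floor(Math.random() * SYSTEM_PROMPTS.length)];
```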
For another, I noticed that when a tense argument broke out the first Saturday morning after nominations, the live feeders documenting it went into far more detail than usual, blowing past the token budget I was imposing on my functions. It took about 30 minutes to work out an algorithm that balances input against output, covering what was happening while still producing a summary a user can absorb in the few seconds at a time they have to stay up to date. We fixed the issue in real time, during the argument and its aftermath.
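The balancing act is simple in principle: estimate how many tokens the incoming updates will cost, keep the most recent ones that fit, and always reserve a slice of the budget for the response. A simplified sketch of the idea follows; the numbers and the characters-per-token heuristic are illustrative, not the exact production algorithm.

```javascript
// Simplified sketch of the input/output balancing idea (not the exact production algorithm).
// Rough heuristic: about 4 characters per token for English text.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Given feed updates (newest first) and a total token budget, keep as many
// recent updates as will fit while always reserving room for the summary itself.
function buildBudgetedRequest(updates, totalBudget = 1200, minOutputTokens = 250) {
  const inputBudget = totalBudget - minOutputTokens;

  const kept = [];
  let used = 0;
  for (const update of updates) {        // newest first, so a detail-heavy burst
    const cost = estimateTokens(update); // still favors the latest happenings
    if (used + cost > inputBudget) break;
    kept.push(update);
    used += cost;
  }

  return {
    input: kept.reverse().join("\n"),    // restore chronological order for the model
    maxOutputTokens: totalBudget - used, // whatever the input didn't use goes to the summary
  };
}
```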
One funny incident occurred as I was developing a new Enhanced AI customization feature, letting the user dial in Hamster AI with preferences for tone, emotion, content focus, and conciseness. I found that if you set the content focus to Drama instead of Balanced, Gen AI did what it does and spun out a wild tale of intrigue, suspicion, and backstabbing – completely made up, and inferred from the most minor details. Big Brother will occasionally block the feeds for production reasons; historically they would play theme music over shots of the front of the house, the fish tank, or the hamsters, and nowadays they show live feeds from local animal shelters to encourage people to adopt pets. Hamster AI interpreted the mention of animals as if they had been unleashed in the house, causing chaos and arguments as houseguests picked sides and fought for control. Gen AI, you so crazy! Fixed with some light prompt work.
A month in, I have a brand-new but pretty mature app, and a platform to keep innovating on. I also have a lot of experience co-developing a project with artificial intelligence, and it’s given me plenty of ideas for my next project. More on that in a future post.
Personally, the best things about this way of working are:
- My technical expertise remains highly relevant and required: I’m constantly checking ChatGPT’s work, modifying it, and making precise suggestions for improvements.
- I get to concentrate mainly on creation and idea-generation, and leave the bulk of the basic coding work to ChatGPT.
- I work with a virtual partner that, most of the time, knows what I’m thinking without my having to overexplain, and that has infinite patience as I layer on more and more detailed requirements. It writes the tenth and final solution just as dutifully as the first attempt.
- I finally have a proper iPad app.
- I have written an app that will run “forever”, with almost no direct involvement from me in generating its content. I set a few URLs in its CloudKit database, and that’s it. Well, I still have to foot the bill for the annual Apple Developer fee, the cloud functions, and OpenAI API usage. (At least until iOS 18 and beyond, when we can do all of this on device.)
- Not having to be responsible for regularly producing fresh content has rebooted my fandom for Big Brother. I keep the live feeds on in the background now, just like the old days. Expect the unexpected! 😂
The Numbers, T-plus 30 days
- 1 Month of development
- Time to initial 2.0.0 release - 14 days
- Time from 2.0.0 to the 2.0.9 release (10th release, 9th feature release) - 14 days
- 6,586 Lines of (apparently) bug-free Swift code
- 112 Detailed git commits
- 868 Lines of unit tests (but definitely not 100% coverage)
- 281 Lines of JavaScript code (cloud functions)
- 3 ChatGPT conversations (2 for app development, 1 for cloud functions)
- 1 Maxed-out ChatGPT conversation (hit the 128,000-token context limit)
- 300+ daily users and rising (organic, I haven’t started advertising yet)
- 2,225,696 OpenAI API tokens used and counting