“The morning air is chill and the fog is rolling in, shrouding the valley in a thick blanket of white. The sun is a mere sliver of light on the horizon, casting a weak glow over the landscape. There is a silence in the air, broken only by the sound of the wind rustling through the trees.”
— John Muir describing Yosemite, 1868
Ha, you’ve been fooled! The quote above is not from John Muir; it is what OpenAI’s GPT-3 produced when asked to describe Yosemite on a calm morning. My mind has been blown by the creative capabilities of AI, particularly in creative storytelling: every piece of generated writing has left me laughing out loud, shocked by a plot twist, or simply moved by the beauty of the language.
My programmer friend Eric and I set out to share the joys of AI storytelling with more people. Our goal was simple:
- Make people chuckle 🤭
- Excite people about the future of this nascent technology 📈
The result is 🙈 InfiniteMonkeys 🙊, a super simple web app that allows anyone to create short stories with AI.
Here are 3 nuggets I learned from building InfiniteMonkeys:
- AI is now accessible to everyone through easy-to-use APIs. 🙌🏼
- Some of these APIs are slow 🐌 and costly 💰.
- The modern web stack makes building simple web apps trivial ⏩. I’ll do a deep dive into how we built the entire app 🔨.
1. AI is now accessible to all
For the first time, someone without AI/ML training, like me, can add “intelligence” to an app with a simple API call. The APIs we used:
- Text generation: OpenAI’s GPT-3 Davinci model. No training or data sets required, just a bit of trial and error to tweak prompt formats and parameters. Just ask for what you want, e.g. “Write me a story about …”, and the model generates it!
- Image generation: StabilityAI’s Stable Diffusion API. Again, no training required. Takes in a text prompt and returns an image. API was a little
- Text to speech: Amazon Polly has a simple API that takes in text and returns a URL to stream the generated audio. The voice quality and modulations have become much more natural and realistic than I expected.
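For a sense of what “just ask for what you want” looks like on the wire, here is a minimal TypeScript sketch that builds a request body for OpenAI’s Completions endpoint and pulls the generated text back out of the response. The model name `text-davinci-003`, the prompt template, and the `max_tokens`/`temperature` values are assumptions for illustration, not the settings InfiniteMonkeys uses; `buildStoryRequest` and `extractStoryText` are hypothetical helper names.

```typescript
// Minimal sketch: request body for OpenAI's (legacy) Completions endpoint,
// plus a helper that extracts the generated text from the response.
// ASSUMPTIONS: model name, prompt template, and parameter values are
// illustrative only, not the values InfiniteMonkeys actually uses.

const COMPLETIONS_URL = "https://api.openai.com/v1/completions";

interface CompletionRequest {
  model: string;
  prompt: string;
  max_tokens: number;
  temperature: number;
}

interface CompletionResponse {
  choices: { text: string }[];
}

// Hypothetical prompt template: plain-English instructions, no training data.
function buildStoryRequest(topic: string): CompletionRequest {
  return {
    model: "text-davinci-003", // assumed Davinci variant
    prompt: `Write me a short, funny story about ${topic.trim()}.`,
    max_tokens: 400, // caps story length; usage is billed per token
    temperature: 0.9, // higher values give more varied output
  };
}

// The Completions API returns generated text in choices[i].text.
function extractStoryText(response: CompletionResponse): string {
  const first = response.choices[0];
  if (first === undefined) {
    throw new Error("Completion response contained no choices");
  }
  // Completions often start with blank lines, so trim them.
  return first.text.trim();
}

// Usage: POST this body as JSON to COMPLETIONS_URL with an
// "Authorization: Bearer <API key>" header, then pass the parsed JSON
// response to extractStoryText.
const req = buildStoryRequest("  a monkey who learns to type ");
console.log(COMPLETIONS_URL, JSON.stringify(req));
```

In a setup like this, the “trial and error” mentioned above mostly means editing the template string and the sampling parameters.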
2. Some of these APIs are slow and costly
Specifically, GPT-3 and Stable Diffusion have quite high latency. That is completely understandable given how powerful they are, but it makes a smooth UX uniquely challenging to build. For example, InfiniteMonkeys takes ~15 seconds to generate a story; GPT-3 and Stable Diffusion each take more than 10 seconds to respond. That means long loading times for users. I may be missing some performance optimizations, but this was my out-of-the-box experience.
Amazon Polly, on the other hand, is an incredibly performant API that returns in under a second. 👏🏼 to their engineering team.
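One common way to soften this kind of latency, sketched here under the assumption that the image prompt does not depend on the generated story text, is to start both slow calls at once with `Promise.all`, so the user waits roughly as long as the slower call instead of the sum of both. This is not necessarily how InfiniteMonkeys works; `generateStory` and `generateImage` are hypothetical stand-ins for the GPT-3 and Stable Diffusion calls, and the fake versions below simply wait 200 ms.

```typescript
// Minimal sketch: overlap two independent slow API calls.
// ASSUMPTION: the image prompt does not depend on the story text.
// generateStory / generateImage are hypothetical stand-ins for real API calls.

type AsyncGen = (prompt: string) => Promise<string>;

async function generateInParallel(
  prompt: string,
  generateStory: AsyncGen,
  generateImage: AsyncGen,
): Promise<{ story: string; imageUrl: string; elapsedMs: number }> {
  const start = Date.now();
  // Both calls start before either is awaited, so their waits overlap.
  const [story, imageUrl] = await Promise.all([
    generateStory(prompt),
    generateImage(prompt),
  ]);
  return { story, imageUrl, elapsedMs: Date.now() - start };
}

// Fake generators that resolve after a fixed delay, in place of network calls.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const fakeStory: AsyncGen = async (p) => {
  await delay(200);
  return `story about ${p}`;
};
const fakeImage: AsyncGen = async (p) => {
  await delay(200);
  return `image-of-${p}.png`;
};

generateInParallel("fog", fakeStory, fakeImage).then((result) => {
  // Roughly 200 ms total rather than 400 ms, because the waits overlap.
  console.log(result.story, result.imageUrl, `${result.elapsedMs} ms`);
});
```

If the image prompt is derived from the story, the two calls have to run in sequence, and a loading state or progressive rendering is the main remaining lever.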
These APIs, while 100x cheaper than hiring a writer, artist, or voiceover actor to do the equivalent task, are still expensive enough to be cost prohibitive in some use cases.
- GPT-3: $0.03 per story on average
- Stable Diffusion: $0.02 per image on average. This model is open-source, but we use a hosted API for convenience.
- Amazon Polly: $0.02 for highest quality model. $0.005 for standard model.
Total price per story: ~$0.06
While $0.06 doesn’t seem like much, it adds up quickly, especially for a free hobby site like the one we built. We’ll see how long we can keep the site up before we exhaust our budget.
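The per-story total above is simple arithmetic, sketched below in TypeScript using the average prices from the list. Prices are kept in integer mills (tenths of a cent) to avoid floating-point rounding. The sketch assumes one story, one image, and one narration per story, and the $100 budget in the usage lines is a hypothetical figure; `costPerStoryMills` and `storiesForBudget` are illustrative names.

```typescript
// Per-story cost from the average prices listed above, in integer mills
// (tenths of a cent): $0.03 = 30, $0.02 = 20, $0.005 = 5.
// ASSUMPTION: exactly one story, one image, and one narration per story.

const COST_MILLS = {
  gpt3Story: 30,
  stableDiffusionImage: 20,
  pollyStandard: 5,
  pollyHighestQuality: 20,
} as const;

function costPerStoryMills(highQualityVoice: boolean): number {
  const voice = highQualityVoice
    ? COST_MILLS.pollyHighestQuality
    : COST_MILLS.pollyStandard;
  return COST_MILLS.gpt3Story + COST_MILLS.stableDiffusionImage + voice;
}

// How many whole stories a budget (in dollars) covers before it runs out.
function storiesForBudget(budgetDollars: number, highQualityVoice: boolean): number {
  const budgetMills = Math.round(budgetDollars * 1000);
  return Math.floor(budgetMills / costPerStoryMills(highQualityVoice));
}

// $100 is a hypothetical budget used only for illustration.
console.log(costPerStoryMills(false) / 1000); // 0.055
console.log(storiesForBudget(100, false)); // 1818
console.log(storiesForBudget(100, true)); // 1428
```

With the standard voice this comes to $0.055, which rounds to the ~$0.06 quoted above; the highest-quality voice brings it to $0.07 per story.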
3. The modern web stack makes building simple web apps trivial
Just as surprising as GPT-3 writing better than I can was that it took only 10 hours to build and deploy the MVP of InfiniteMonkeys. For context, Eric is a web developer, whereas I am an iOS developer who spent a week doing React tutorials to prepare for this project. We both have 5 years of industry experience, so YMMV.
Our web stack
- Frontend: Next.js is a framework built on React that makes it super easy to build and deploy web apps. It even lets you write serverless backend code in the same project. If you connect your GitHub repo to Vercel (the company that created Next.js), every commit is automatically deployed to a public URL within minutes!
- Backend: We used Firebase to store stories and images. It takes 5 minutes to integrate into a project, has an elegant API, and comes with an easy-to-use GUI console for viewing and modifying your data live.
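The “serverless backend code in the same project” idea can be sketched as a Next.js API route that generates a story and saves it. This is a minimal sketch, not the InfiniteMonkeys implementation: the route, the request shape, and the `generateStory`/`saveStory` helpers are hypothetical, with `saveStory` standing in for a Firebase write. Small stand-in types replace `NextApiRequest`/`NextApiResponse` (which a real project imports from `next`) so the example runs on its own.

```typescript
// Minimal sketch of a Next.js API route handler (e.g. pages/api/story.ts).
// ASSUMPTIONS: route, request shape, and the injected helpers are hypothetical;
// saveStory stands in for a Firebase write. ApiRequest/ApiResponse are
// stand-ins for NextApiRequest/NextApiResponse from "next".

interface ApiRequest {
  method?: string;
  body: unknown;
}

interface ApiResponse {
  status(code: number): ApiResponse;
  json(body: unknown): void;
}

type Deps = {
  generateStory: (prompt: string) => Promise<string>;
  // Persists the story and resolves to its id.
  saveStory: (story: { prompt: string; text: string }) => Promise<string>;
};

function makeStoryHandler(deps: Deps) {
  return async function handler(req: ApiRequest, res: ApiResponse): Promise<void> {
    // Only accept POST requests.
    if (req.method !== "POST") {
      res.status(405).json({ error: "Method not allowed" });
      return;
    }
    // Validate that the JSON body carries a non-empty prompt string.
    const prompt = (req.body as { prompt?: unknown } | null)?.prompt;
    if (typeof prompt !== "string" || prompt.trim() === "") {
      res.status(400).json({ error: "Missing prompt" });
      return;
    }
    const text = await deps.generateStory(prompt);
    const id = await deps.saveStory({ prompt, text });
    res.status(200).json({ id, text });
  };
}
```

In a real project this would live in a file under `pages/api/`, with `makeStoryHandler({ ... })` as the default export and the real GPT-3 and Firebase calls passed in. Injecting those calls keeps the handler easy to test without network access.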
Time breakdown for MVP (10 hours total)
It’s never been easier to turn ideas into reality using the modern web stack. It’s a fun time to be a tinkerer!
Note: I tried to get GPT-3 to write this blog post for me, but unfortunately it’s not sophisticated enough. Maybe GPT-4 will be able to do the job 😄.
Here’s one last link to InfiniteMonkeys. We hope it makes you chuckle 🤭. Please give us feedback! — Matt and Eric