Let's Talk About the AI Hype Train
🚨 This AI just DESTROYED OpenAI! ⚡️ New model a GAME-CHANGER! Sound familiar? Let's talk about AI hype and how to keep yourself sane.
ChatGPT hit the scene two years ago, and now AI is a trillion-dollar industry. Naturally, the rush is on to create the next best thing. Entrepreneurs, investors, and nearly every company worldwide are chasing their piece of the market. And who can blame them? Even carving out 0.01% of that pie is a cool $100 million. But the rush comes with distractions, and those are what I want to talk about today.
Build it and they will come.
We have all had that Monday morning. You wake up drowsy, spill the coffee, burn the toast, miss the morning meeting, and send a regretful email. A few years back, I created a powerful product for technical book authors, but it failed due to its complexity. While I take responsibility, major companies restricted access to essential tools, forcing me into complex workarounds. The barriers to entry were so high that every morning felt like that Monday morning.
Public companies and the open-source community are building groundbreaking AI products at a speed no one has seen before. Until recently, all this tech was unavailable to the average person. Now, all one needs is a computer, some open-source software, a product concept, and a $20/month account to build a business. Everything is changing.
Even if you are not a developer, coding tools are becoming savvy enough for anyone to code with some work and know-how. Those gatekeepers that held me back from building a capable product now offer everything I need in a pay-per-use cloud model for pennies on the dollar.
My point is, the hype around AI is real. But there is hype and then there is hyperbole.
The FOMO is real
Every day there are HUGE announcements about the NEXT BIG THING and it can be overwhelming to keep track. I used to watch at least 5-6 YouTube videos daily, and I get about 20 new news items to read every morning.
Whether you use AI to write better, automate a few things, build a niche product, or create the next great product to meet [insert any market here], you will always wonder if you are making the right decision. Social media and news cycles are here to guarantee you question everything every day.
The stories grab your attention.
Any day of the week, you will see announcements full of hyperbole and claims:
Reasoning 10x better than ChatGPT!
Writes better than Claude!
STOP USING GEMINI NOW!
Copilot is SUNK due to DeepSeek!
You get the idea. Frankly, this is all clickbait. While amazing products are released daily, and I use many of them, these claims of “better,” “best,” and “throw away everything” are just stories to get you worried and throw you off your goals.
But what about AI benchmarks?
All these big numbers you see people touting come from tables like the popular leaderboard on the AI site Hugging Face.
Most of these new model announcements are spec-driven. Do you know what the scores mean and why they matter?
IFEval: 80%: The AI is great at following instructions, but can it follow your request without hallucinating?
BBH: 61.87%: It might outthink you in logic puzzles, but will it still mess up your to-do list?
MATH: 40.33%: What math? And if you are writing emails, who cares?
Want to know what these scores mean for real? Let me know in the comments and I might write an article about them.
Mind you, I believe these specs have value and would never suggest you ignore them. However, those breaking news stories sometimes remind me of purchasing a new phone.
Phone manufacturer:
More megapixels!
Fastest chip!
Ultimate graphics!
You:
It is easy to use.
All my friends use it.
It has my apps.
In other words, the ads pull you in with specs, but unless you are a “pro” user, it comes down to this simple question: “Does it work for me?”
Let’s test the hype machines!
Pretend you write children’s stories. You like cute little love stories based on science, curiosity, and nature. Your next concept is about a butterfly on Earth falling in love with the brightest star in Alpha Centauri.
You already use ChatGPT 4o, but everyone says “use reasoning models,” “Claude is better for writing,” “DeepSeek kills them all,” and “Google leads the pack in reasoning.”
Confused, you take your time (and maybe even pay) to test the models using the same prompt.
Let’s see what happens:
Prompt: I want to tell a children’s story about a butterfly that falls in love with the brightest star from Alpha Centauri. Research how they might communicate and the unique ways they brought them together. Base as much as you can on known science. Tell the story in two sentences.
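If you would rather script this comparison than paste the prompt into four browser tabs, here is a minimal sketch of the idea: send the identical prompt to every model and collect the replies side by side. The names `make_stub` and `run_same_prompt` are hypothetical, and the stubs only stand in for real model calls; you would swap each one for whatever API or app you actually use.

```python
from typing import Callable, Dict

# The prompt from this article, sent unchanged to every model.
PROMPT = (
    "I want to tell a children's story about a butterfly that falls in love "
    "with the brightest star from Alpha Centauri. Research how they might "
    "communicate and the unique ways they brought them together. Base as much "
    "as you can on known science. Tell the story in two sentences."
)


def make_stub(name: str) -> Callable[[str], str]:
    """Hypothetical placeholder; replace with a real call to that model."""
    def ask(prompt: str) -> str:
        return f"<{name} would answer the {len(prompt.split())}-word prompt here>"
    return ask


def run_same_prompt(
    prompt: str, models: Dict[str, Callable[[str], str]]
) -> Dict[str, str]:
    """Send one identical prompt to every model and collect replies by name."""
    replies = {}
    for name, ask in models.items():
        try:
            replies[name] = ask(prompt)
        except Exception as exc:  # one failing model should not stop the rest
            replies[name] = f"ERROR: {exc}"
    return replies


if __name__ == "__main__":
    names = ["ChatGPT o1", "DeepSeek", "Gemini", "Claude"]
    models = {name: make_stub(name) for name in names}
    for name, reply in run_same_prompt(PROMPT, models).items():
        print(f"{name}: {reply}")
```

Keeping the prompt in one constant is the point: if each model gets a slightly different wording, you are comparing prompts, not models.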
First, you run the “insane” ChatGPT o1 reasoning model to research the topic and output the story. It “thinks” and then outputs the following response:
ChatGPT o1 response:
Each night, a tiny butterfly sent delicate pulses of ultraviolet light toward Alpha Centauri—our brightest neighboring star nearly 4.3 light-years away—trusting its fluttering wings could carry love across the cosmic gap. Finally, after eight years had passed for the signals to travel there and back, the star twinkled in reply, and the butterfly knew that even across light-years, love could shine through.
Next, you run DeepSeek, the latest “ChatGPT Killer.” It “thinks” for a bit and then responds with the following:
DeepSeek response:
In a faraway galaxy, there lived a tiny butterfly named Lila who adored the bright light of Procyon, the star closest to our solar system. One night, as she gazed upon it through her leaves, Lila felt her heart flutter—she imagined the star was responding with a gentle nudge from across the cosmos. Though vast and distant, their connection grew through the whispers of electromagnetic waves, much like how we send messages through the stars'的语言.
Next, you run Google’s Gemini Advanced 2.0 Flash Thinking model with “mind-blowing reasoning.” It “thinks” and you receive the following output:
Google Gemini 2.0 Flash Thinking response:
Lila, a butterfly with wings like starlight, felt a whisper from Alpha Centauri A in its ancient light reaching Earth. Bathed in starlight each night, Lila imagined dancing with her distant star; this shared starlight was their silent language of love, bringing their different worlds together.
Finally, you use Claude 3.7 Sonnet with thinking mode that “CRUSHED Every AI Model in the World” and receive the following output:
Claude 3.7 Sonnet with thinking mode response:
A lonely butterfly named Azul, with wings that shimmered like mirrors in the moonlight, discovered she could reflect the light from Alpha Centauri's brightest star (Alpha Centauri A) in precise patterns, creating a rudimentary form of optical communication across the vast 4.37 light-year distance. Years later, when scientists detected unusual light reflections from Earth and sent a probe to investigate, they were astonished to find Azul's descendants still performing the same beautiful light-reflecting dance that had, against all astronomical odds, sparked an interstellar connection between two vastly different entities drawn together by the universal language of light.
Okay, I know this is a two-sentence request for a children’s story, and it may not be a real example of how people use the latest models, but what do you think? Do you feel like one of these models is better or worse? Do you feel that one is THE FUTURE and THE OTHERS ARE DEAD TO YOU?
Without judging the responses, I feel it is safe to say each provides a good starting point for a story, and none offers me groundbreaking new ideas. While some people will tell you they use reasoning for deep analytical research for their business, I imagine creatives would also want to use it for their work. What stood out to me was the sameness of it all.
Here are some other things I found from the models that did not shout THE BEST to me:
One model selected Procyon as the star, but Procyon is a separate star in the constellation Canis Minor, not part of the Alpha Centauri system, and it is not the closest star to our solar system either.
Two models gave the butterfly the same name: Lila.
One pushed the definition of “two sentences” to its limits.
The prompt was for a love story, but only two used the word “love.” Maybe love is implied? That is for you to decide.
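You can run rough checks like these yourself instead of eyeballing the output. The sketch below pastes in the four responses quoted above and counts sentences, looks for the word “love,” and flags the names “Lila” and “Procyon.” The helper names (`count_sentences`, `mentions`, `summarize`) are hypothetical, and the sentence splitter is deliberately naive (it breaks on ., !, or ? followed by whitespace), so treat the numbers as a sanity check, not a verdict.

```python
import re

# The four responses quoted in this article (dashes written as \u2014 escapes).
RESPONSES = {
    "ChatGPT o1": (
        "Each night, a tiny butterfly sent delicate pulses of ultraviolet light "
        "toward Alpha Centauri\u2014our brightest neighboring star nearly 4.3 "
        "light-years away\u2014trusting its fluttering wings could carry love "
        "across the cosmic gap. Finally, after eight years had passed for the "
        "signals to travel there and back, the star twinkled in reply, and the "
        "butterfly knew that even across light-years, love could shine through."
    ),
    "DeepSeek": (
        "In a faraway galaxy, there lived a tiny butterfly named Lila who adored "
        "the bright light of Procyon, the star closest to our solar system. One "
        "night, as she gazed upon it through her leaves, Lila felt her heart "
        "flutter\u2014she imagined the star was responding with a gentle nudge "
        "from across the cosmos. Though vast and distant, their connection grew "
        "through the whispers of electromagnetic waves, much like how we send "
        "messages through the stars'的语言."
    ),
    "Gemini": (
        "Lila, a butterfly with wings like starlight, felt a whisper from Alpha "
        "Centauri A in its ancient light reaching Earth. Bathed in starlight each "
        "night, Lila imagined dancing with her distant star; this shared "
        "starlight was their silent language of love, bringing their different "
        "worlds together."
    ),
    "Claude": (
        "A lonely butterfly named Azul, with wings that shimmered like mirrors in "
        "the moonlight, discovered she could reflect the light from Alpha "
        "Centauri's brightest star (Alpha Centauri A) in precise patterns, "
        "creating a rudimentary form of optical communication across the vast "
        "4.37 light-year distance. Years later, when scientists detected unusual "
        "light reflections from Earth and sent a probe to investigate, they were "
        "astonished to find Azul's descendants still performing the same "
        "beautiful light-reflecting dance that had, against all astronomical "
        "odds, sparked an interstellar connection between two vastly different "
        "entities drawn together by the universal language of light."
    ),
}


def count_sentences(text: str) -> int:
    """Naive count: split after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def mentions(text: str, word: str) -> bool:
    """True if `word` appears as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


def summarize(responses: dict) -> dict:
    """Build one small report per model."""
    return {
        name: {
            "sentences": count_sentences(text),
            "love": mentions(text, "love"),
            "Lila": mentions(text, "Lila"),
            "Procyon": mentions(text, "Procyon"),
        }
        for name, text in responses.items()
    }


if __name__ == "__main__":
    for name, report in summarize(RESPONSES).items():
        print(name, report)
```

A check like this will not tell you which story is better, but it does turn “I think two of them said love” into something you can confirm in a second.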
And your point is?
We will see new AI models daily that push boundaries, top the benchmark charts, and tout themselves as winners. Social media will rush to announce the death of one AI and the birth of a new savior AI.
One day, and I am sure it will be soon, one or two will rise to the top. Until then, I recommend you avoid getting lost in the sea of announcements. Here is what I recommend:
Keep on top of AI news, but avoid the hype.
When you see an announcement for something that states it has value that can genuinely help you with your AI use case, try it for an hour or two. If you can’t get it to work or don’t like the results, move on.
If a major product release touts benchmark numbers over improved output, look at it skeptically.
Embrace change, know AI is moving quickly, but if you have a project or favorite product, don’t think you need to drop everything and learn something new. Stay on top of the news cycle, read why the products may be necessary for your use case, and decide if they are worth your time.
Just like your phone purchase, if your friends use it, maybe you should consider it, too.