OpenAI is a UI/UX company, not an AGI company

Author: Sam Kececi

OpenAI is a UI/UX company, not an AGI company. Mr. Altman will disagree, but he shouldn't view this as a criticism.

Before we dive in, I must highlight something: DeepSeek dropped their groundbreaking R1 model on January 20.

All of the groundbreaking work was captured in that initial announcement. The benchmarks on par with o1. The fact that it was Chinese instead of American. The claim that it was trained on less than $10 million worth of compute. So why did LinkedIn spiral into a tizzy and NVDA fall 15% one week later?

Because the consumer app hit number one on the app store.


As more and more participants enter the LLM game and show that they can compete on quality and cost, the path toward owning market share becomes less and less about model quality and more and more about the means through which you interact with any given model. The UX.

Consider a car. All cars use the same gasoline¹. When you buy a car, imagine if the salesman told you: "Our car is the best! It allows you to drive with only Exxon Mobil gasoline, which is demonstrated on benchmarks to be the highest quality."

You'd either laugh or think, "So what...?"

Why would you care about anything other than the car itself? The UX, vibes, and "personality" of the car. You'd be right to apply the same rationale to LLMs.

If language models are all roughly the same at answering Math Olympiad questions and writing code, why would you care about anything other than the vibes and "personality" of the app?


The most viral posts about DeepSeek weren't about the benchmarks or training methods. That stuff is cool, and it did get attention. But if you gather the most viral Tweets about DeepSeek, they fall into two categories:

  1. People asking via the app about Tiananmen Square or Xi Jinping and watching it self-censor.
  2. People making memes and comparisons about China vs. USA, OpenAI, and Sam Altman.

In other words, the vibes.


Get used to a world where a new "best" model will appear overnight and dethrone the previous frontrunner on some combo of cost, speed, and quality. In this world, what matters long term is building a novel, model-agnostic way of interacting with AI.

For the car industry, value is created across two diverging domains: the oil refinery business and the car manufacturing business. If you believe competition is for losers, competing on refining gasoline doesn't leave much room for innovation.

Likewise, in the LLM world, value is created across two domains: AI training and inference, and consumer interaction with the AI.

To be clear, OpenAI plays in both. There is ChatGPT and the API. But the marginal value they create comes from the ChatGPT web app and any future consumer-facing product they ship. If the OpenAI API disappeared overnight, everyone building with LLMs would just swap their API keys out for Anthropic's and call it a day.
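
That switch really is about as trivial as it sounds. Here's a rough sketch of the point (a hypothetical wrapper, not anyone's production code); the model names and environment variables are just illustrative:

```python
import os

def ask(prompt: str, provider: str = os.getenv("LLM_PROVIDER", "openai")) -> str:
    """Route one prompt to whichever provider is configured."""
    if provider == "openai":
        from openai import OpenAI  # pip install openai
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    if provider == "anthropic":
        from anthropic import Anthropic  # pip install anthropic
        client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # illustrative model name
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
    raise ValueError(f"unknown provider: {provider}")
```

Nothing downstream of a function like this knows or cares which lab trained the model. That's the "swap the keys" problem in a nutshell.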

The broader point to be made here is that this is the whole point of AI - to make cool consumer shit. Having barrels full of oil sitting in a strategic reserve doesn't create any economic value. The 12-year-olds who spend 3 hours a day on CharacterAI don't care which model powers their chat², they care if the UI looks good and the prompt makes it sound accurate to Gojo Satoru (who currently sits at 782.4 million chats created).

And don't rag on the 12-year-olds. Your average Microsoft Teams + Copilot enjoyer just wants their model to sound like the smart and confident lawyer/banker/salesman they hope to become.

Footnotes

  1. The electric car analogy fits nicely - massive differences like image gen vs. text gen will certainly cater to a different set of buyers. A gas-powered Tesla would have a totally different customer base, even if the rest of the car were exactly the same.

  2. For companies like CharacterAI, super minor tweaks in LLM quality can have huge impacts on retention and usage. But remember that this is not the raw LLM coming out of the tap from OpenAI/Anthropic. Companies can and should invest in making models that suit their specific needs and score highly on their specific benchmarks. Perhaps the most accurate claim to make is that performance on "general" benchmarks like AIME or MATH-500 isn't illustrative of consumer value. Unless the job to be done is literally taking Math Olympiad tests or solving word problems. Coding holds the closest match between benchmarks and real-world use. It isn't a coincidence that this is where current models thrive.

As with anything I write - please let me know your unfiltered thoughts. Email me: sam.kececi+blog@gmail.com