Someone Else Further Down the Stack

June 4, 2024

I mentioned this when I first saw it, but I just cannot get over this ridiculous interview with the CEO of Zoom. Somehow, it manages to simultaneously highlight pretty much all of the major things that are bothering me as I careen into my grumpiest, middle-ag-iest years. It features an incredibly successful (and presumably rich) entrepreneur who sounds completely insane, and utterly high on his own supply. It’s about a company that is extraordinarily pervasive in our professional lives — Zoom, dammit! — and yet seems obviously, visibly uninterested in the actual thing that causes us to use them. And finally, it’s got my personal favorite: an ambitious, potentially impossible value proposition for the future that makes absolutely no sense and logically collapses under even the most cursory investigation.

I’ve been working at startups for a long time, and many, many times my colleagues and I have made the “giant bong rip” hand motion as we’ve listened to someone visionary explain what they’d like to be doing instead of what we actually do. This is not new to me. But at the same time, it’s also part of why I’m so hard on my own career: I often convince myself that the companies I’ve worked for are less sophisticated or impressive than you might think, because they’ve said completely insane things like this, and presumably real companies, with real interest in what they’re doing and a firm grasp of what it is, would not say those things.

But… here we are. Zoom Communications has a market cap of eighteen and a half billion dollars, and the guy running it unveiled one of the craziest, most impractical, most nonsensical product visions I’ve ever heard to a leading publication without even really being asked about it. He just… said it out loud, for no real reason. And it’s fine! The stock is fine. Presumably Zoom still works, maintained and operated by the competent subordinates of this man with totally crazy, horrible, infeasible ideas about what meetings should be like and what would happen if a bunch of large language models added themselves to them and babbled uncontrollably at each other.

Still, that’s not what bothers me the most, or at least what bothers me in a fundamental way that reminds me of so many other things that bother me. For that, you have to go to this section.

NILAY PATEL, The Verge: When you say the hallucination problem will be solved, I am thinking about this literally in terms of Moore’s law because it’s the closest parallel that I can think of to how other CEOs talk to me about AI. I don’t think you spend a lot of time thinking about transistor density on chips. I’m guessing you don’t just assume that Intel and Nvidia and TSMC and all the rest will figure out how to increase transistor density on chips, and Moore’s law will come along, and the chips will be more powerful and you can build more applications. Just correct me if I’m wrong. I’m guessing you—

ERIC YUAN, Zoom: No, you’re so right. Absolutely. This is a technology stack. You have to count on so many others.

NP: So is the AI model hallucination problem down there in the stack, or are you investing in making sure that the rate of hallucinations goes down?

EY: I think solving the AI hallucination problem — I think that’ll be fixed.

NP: But I guess my question is by who? Is it by you, or is it somewhere down the stack?

EY: It’s someone down the stack. 

NP: Okay.

EY: I think either from the chip level or from the LLM itself. However, we are more at the application level: how to make sure to leverage the AI to improve the application experience, create some innovative feature set, and also, at the same time, how to make sure to support customized personalized LLM as well. That is our work.

I mean… pardon my French, but… holy shit. This guy is talking about re-centering a successful, eighteen-billion-dollar video conferencing company entirely around a technology that (a) doesn’t work reliably, (b) may never work reliably, and (c) is completely outside of his control. It’s like Ford announcing that all their cars are going to run on peanut oil, and just saying “eh, I’m pretty sure Exxon can figure it out. Or maybe Pennzoil, I dunno.”

LLMs having no understanding of what they’re spitting out, but using pattern recognition to make it kinda-sorta hard to figure out exactly when they are wrong — that’s the whole game right now. That is everything when it comes to making this technology, and all the things people are extrapolating from it, worth literally anything at all. Yuan is excited by the “AI” summaries of meetings. Great! Neat! That is an extremely niche feature with, potentially, some real value, which is why he’s not focusing on it but instead using it as an example of how close he is to allowing people to build virtual, digital doubles who speak, argue, and make decisions on their behalf. The second thing is a much, much bigger thing — a world-changing thing, honestly — but it should also go without saying that it is infinitely more complicated and difficult to execute, since a computer connected to actual, operational actions simply cannot be wrong the way a computer spitting out text for some bored office drone to skim can be. It’s questionable whether present, best-in-class levels of LLM accuracy are acceptable for even the corporate-virtue-signaling email-writing tasks they’re starting to be pointed at now. It is not questionable at all that those levels of accuracy, or even significantly improved ones, are totally unacceptable for authorizing a purchase, or even scheduling a follow-up meeting.
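To put some hedged, back-of-the-envelope numbers on that (mine, invented for illustration, not a benchmark of any actual model): if an “agent” has to get every step of a workflow right, per-step accuracy compounds multiplicatively, and it compounds fast.

```python
# Back-of-the-envelope math, not a benchmark: if an "agent" must get
# every step of a workflow right, per-step accuracy compounds.
# The 98% figure and the step counts are assumptions made up for
# illustration, not measurements of any particular model.

per_step_accuracy = 0.98

for steps in (1, 5, 10, 20):
    p_flawless = per_step_accuracy ** steps
    print(f"{steps:2d} steps -> {p_flawless:.0%} chance of a flawless run")

# Output:
#  1 steps -> 98% chance of a flawless run
#  5 steps -> 90% chance of a flawless run
# 10 steps -> 82% chance of a flawless run
# 20 steps -> 67% chance of a flawless run
```

A meeting summary that’s 98% right is a pretty good summary. A twenty-step purchasing workflow that completes correctly two times out of three is a lawsuit.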

I’m sure it’s a lot of fun to envision interesting product scenarios that sit on top of a magical, accurate, Jarvis-from-Iron-Man-like artificial intelligence apparatus with a reasonably priced API sticking out the back. But it’s not “product management”, or business strategy, or really anything more sophisticated than a couple of stoned college guys trying to process the ending of “Her” in 2013. The hard part is building Jarvis, not fantasizing about all the cool things Jarvis could do if he existed. We had all kinds of fun concept fantasies and Dick Tracy watch scenarios floating around in our heads going back to the 1970s (or earlier!), but none of them really meant anything until Apple built us an iPhone to run those applications. And honestly, this feels even more egregious, like if somebody saw a rotary phone and started talking about his idea for Pokémon Go.

The Hologram Isn’t the Hard Part, or the Important Part

The largely forgettable movie “I, Robot” was randomly on cable the other night, and I watched the beginning for a while, occasionally looking over at my vacuum and giggling. There’s a scene where the dead guy Will Smith is investigating has left behind this sort of interactive hologram, and Will Smith asks it all these questions. He gets some answers, but every once in a while it says something to the effect of “I’m sorry, I can only answer questions that I’ve been programmed to answer”. Which… makes total sense! This thing is important, and was left behind to help a police officer solve a murder, for God’s sake. I love imagining an LLM-powered version of this, answering correctly 92% of the time, but never admitting it didn’t know an answer and just making plausible-sounding stuff up the other 8% of the time. “Sure, I was murdered. It was a band of criminals, seeking my money. Robbery gone wrong, you know? How else can I help you?”
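For fun, here’s roughly what the movie’s version amounts to as code: a closed-world responder with an explicit refusal path. Everything below is invented for illustration (okay, one refusal line is borrowed from the film), and it’s a toy, not a claim about how any real product works.

```python
# A toy version of the movie's hologram: a closed-world responder
# that either returns a vetted, pre-programmed answer or explicitly
# refuses. All content here is invented for illustration.

PROGRAMMED_ANSWERS = {
    "who killed you": "That, detective, is the right question.",
    "why did you leave this recording": "To help you ask the right questions.",
}

def hologram(question: str) -> str:
    key = question.strip().lower().rstrip("?!. ")
    # The crucial property an LLM lacks by default: a reliable,
    # explicit "I don't know" path for anything out of scope.
    return PROGRAMMED_ANSWERS.get(
        key,
        "I'm sorry. My responses are limited. You must ask the right questions.",
    )

print(hologram("Who killed you?"))
print(hologram("Was it a robbery gone wrong?"))  # refuses instead of confabulating
```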

This is all a long-winded, oft-made argument that whatever we decide to call AI — giant if-then statements attached to holograms, probability models, whatever — is not going to fulfill the dreams of the Silicon Valley elite unless either (a) it’s basically right all the time, or (b) it has an excellent grasp of whether it’s right about something and it’s at least right about that basically all the time. Again, this is the ball game right now. If you want to launder stock photos and generate spam sales outreach, maybe do some code suggestion, you’re good. Grab your API key and see what the market will bear for your probability driven idea. But if you think we’re going to automate like… actual people? Who make decisions? Who are held accountable for those decisions? First of all, you gotta figure out if LLMs can even do this, and it’s not looking great right now. And then, if not, you’ve got to invent something else.

Nope! That’s “someone down the stack”. Maybe it’s the chip! Maybe the chip will make large language models turn into… something else? Something better, and also fundamentally different. Chips do that all the time, right? Or maybe we’ll get a transparent, visible decision-making engine from an LLM provider, even though no language model works like that; these things literally add randomness to make interactions feel more interesting and human-like, at the expense of predictability and exactitude.
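And that randomness isn’t a metaphor, by the way; it’s a literal decoding parameter, usually called temperature. Here’s a toy sketch of what sampling a next token looks like (invented vocabulary, invented scores; real models do this over enormous vocabularies, but the mechanic is the same):

```python
import math
import random

# Toy sketch of temperature sampling over next-token logits. The
# vocabulary and scores below are invented; the point is only that
# the next token is *sampled* from a distribution, not chosen
# deterministically.

def sample_next_token(logits: dict, temperature: float = 1.0) -> str:
    if temperature == 0:
        # Greedy decoding: deterministic, always the top-scoring token.
        return max(logits, key=logits.get)
    weights = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    r = random.uniform(0, sum(weights.values()))
    for tok, weight in weights.items():
        r -= weight
        if r <= 0:
            return tok
    return tok  # floating-point edge case: fall back to the last token

logits = {"approve": 2.1, "deny": 1.9, "escalate": 0.5}
print([sample_next_token(logits) for _ in range(5)])  # varies from run to run
print(sample_next_token(logits, temperature=0))       # always "approve"
```

Set the temperature to zero and you get the same output every time, at the cost of flatter, more repetitive text; turn it up and you get variety. Variety is fine for brainstorming taglines. It is a strange foundation for a digital double that’s supposed to approve purchases on your behalf.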

Look, I’ve clearly had it with this LLM nonsense, and that’s in no small part because it stunk to high heaven from day one, if I’m being honest. And it might feel like I’m an ideologue on this, but I’m really not — I’ve done the reasonable thing and admitted where the utility may lie. I’ve even built test applications with this stuff. I have an OpenAI developer account, for crying out loud.

But this is some of the worst, most red-flag waving magical thinking I’ve seen from tech in my entire life, and we just sat through crypto! Apple is frantically trying to get away from having to buy freaking cellular modems from Qualcomm because that apparently limits their ability to implement their medium-term vision for iPhones, a product that exists and is real and has literally changed the world. Meanwhile, Zoom and the rest of these guys are just standing around, taking their huge, imaginary bong rips (or real ones, I don’t know, I’m not there) and waiting — just waiting! — for some band of machine learning elves to make it all possible. And when they do, Zoom will be there, leveraging this fantasy technology for the dumbest, most impractical purposes you can think of.

Disclosure: I have a pretty broad 401k managed by someone else so, God help me, I probably own some Zoom stock.