GenAI improvements bring back the promise of truly useful digital assistants

Bob O'Donnell

Forward-looking: Remember when we thought Siri, Alexa, and Google Assistant were going to be really helpful? Yeah, me too. Fast forward about ten years to today, and we're starting to see some much more impressive demos of just how far digital assistants have progressed. The possibilities look both compelling and intriguing.

On Monday, OpenAI took the wraps off its new GPT-4o model and the accompanying update to ChatGPT that makes it possible to not only speak with ChatGPT but do so in some eerily realistic ways. The new model lets you interrupt it for a somewhat more natural conversation flow and responds with more personality and emotion than we've heard from other digital assistants.

With the updated ChatGPT apps for iOS and Android, it can also see and understand more things via a smartphone camera. For example, OpenAI demonstrated a homework helper app that could guide students through simple math problems using the camera.

Then on Tuesday, Google unveiled a huge range of updates to its Gemini model at its I/O developer event, including a similar homework helper function within Android itself. Google also demonstrated Gemini-powered AI summaries for Search, more sophisticated applications of Gemini in Google Workspace, and a new text-to-video algorithm called Veo that's akin to OpenAI's recently introduced Sora model.

Demos from both companies leveraged similar technologies that many other companies are clearly developing in parallel. More importantly, they highlighted that some core capabilities needed to create intelligent digital personal assistants are nearly within reach.

First is the increasingly wide support for multi-modal models capable of taking in audio, video, image, and more sophisticated text inputs and then drawing connections between them. These connections made the demos seem magical because they imitated how we as human beings perceive the world around us. To put it simply, they finally demonstrated how our smart devices could actually be "smart."

Another apparent development is the growing sophistication of agents that understand context and environment and reason through actions on our behalf. Google's Project Astra demonstration, in particular, showed how contextual intelligence combined with reasoning, personal/local knowledge, and memory could create an interaction that made the AI assistant feel "real."

Currently, definitions of what an AI-powered agent is and what it can do aren't consistent across the industry, making it tough to generalize their advancements. Nevertheless, the timing and conceptual similarity of what OpenAI and Google demonstrated make it clear that we're a lot closer to having functional digital assistants than I believe most people realize. Even though the demos aren't perfect, the capabilities they showed and the possibilities they implied suggest we are tantalizingly close to having features in our devices that were in the realm of science fiction only a few years ago.

As great as the potential applications may be, however, there remains the problem of convincing people that these kinds of GenAI-powered capabilities are worth using on a regular basis. After the initial hype over ChatGPT began to slow towards the end of last year, there's been more modest adoption of the technology than some people anticipated. What remains to be seen is whether or not these kinds of digital assistant applications can become the trigger that makes large numbers of people willing to start using GenAI-powered features. Equally important is whether or not they can start changing people's lives in the ways that some have predicted generative AI could.

Like it or not, the only way you can get an effective digital assistant is if it can get unfettered access to your files, communications, work habits, contacts (and much more)...

Of course, part of the problem is that – as with any other technology designed to customize experiences and information for each person – people have to be willing to let these products and these companies have deeper access into their lives than ever before if they want to get the full benefit from them. Like it or not, the only way you can get an effective digital assistant is if it can get unfettered access to your files, communications, work habits, contacts, and much more. In an era of growing concern about the impact of tech companies and products, this could be a tough sell.

In the US, much will depend on what capabilities Microsoft and Apple unveil at their developer conferences in the coming weeks. Given the iPhone's dominant share in the US smartphone market, the GenAI-powered capabilities Apple chooses to enable will significantly influence what people consider acceptable and important (whether through its own development or licensed via OpenAI or Google, as the company is rumored to be doing).

Call it Siri's revenge, but any digital assistant or agent technologies that Apple announces for the next version of iOS will have an outsized influence on how many people view these technological advancements in the near term.

Ultimately, the question also boils down to how willing people are to become even more attached to their digital devices and the applications and services they enable. Given the enormous and growing amount of time we already spend with them, this may be a foregone conclusion. However, there is still the question of whether people will perceive some of these digital assistant capabilities as going too far. One thing is certain: this trend will be interesting to watch.

Bob O'Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter.

Masthead credit: Solen Feyissa

Good article.

Having used a few LLMs locally, mainly the open source ones, I'm still not convinced they are quite ready. Yes, a lot of the time you're "amazed" that they'll give you an answer, but worryingly, it may not be the correct answer. And unless you cross-reference with another source, you may never know it's shoveling you bullshit.

They still make a lot of mistakes, and will straight out tell you bullshit. And you can call them out on it, and then they'll simply say "I apologise" and give you a slightly better answer. It is almost like they know they're just making a guess.

Given the sad propensity for humans to blindly trust anything they read or are presented with these days, whether in traditional media or social media, we're in for a ride.

If people start putting absolute trust in these (and many will as they don't know any better), it is going to be a recipe for disaster.

A human walking around the forest asks their assistant: "What mushroom is this?" Assistant: "Looks like a <whatever> mushroom, tasty to eat." The human picks the mushroom, cooks it into food, and dies painfully because it was actually a deadly mushroom. Maybe not the best example, but you see my point.

There will be safety walls, but plenty of things are going to slip through the cracks, and they're going to cause a heck of a lot of danger.
 
I don't really like where this tech is going. I want to see it used in a way that makes it easier for people to use a computer or smartphone. Honestly, I really like the interactive depiction seen in the older Star Trek television shows... there are times when verbal interaction will be faster (or perhaps just more convenient) than using physical input and reading the data off a display. I think this AI is capable of that, but I don't really see anyone trying to make it work in that fashion. It is frustrating, because it seems like a good way to get us away from being glued to our smartphone screens and scrolling around constantly, when we could simply be talking with it like a person. We could have more freedom to observe and enjoy life, and still accomplish the technology things we need and want to do verbally.
 