How AI chatbots are transforming the internet
AI chatbots are profoundly impacting how we use information and communicate with each other. Discover what’s happening and how they are transforming the internet here.
Everyone who’s anyone in the technology space wants to be a part of the AI revolution. Industry leaders such as Satya Nadella and Sundar Pichai believe artificial intelligence will redefine the internet.
Already, we’ve seen a smattering of advanced LLM-powered bots from the major technology brands, promising conversational aptitude unimaginable as recently as 2020. OpenAI got the ball rolling in November 2022 with ChatGPT, the most famous of the bunch. Then, a couple of months later, Microsoft launched Bing Chat (now Copilot), while Google released “Bard,” now Gemini, with other players such as Anthropic and Perplexity joining the fray around the same time.
At first, OpenAI’s technical lead was enormous. Nothing else got close to the performance of ChatGPT. However, things changed during 2023 as the other developers narrowed the gap and moved closer to parity.
That’s not to say that these chatbots are the same, though. As we will see, there are still differences between them that could affect which bot you choose.
ChatGPT Vs. The Rest
GPT-4o, released in May 2024, is the latest iteration of the model behind ChatGPT. The tool features what industry insiders call full “multimodality,” allowing it to interpret text, audio, and visuals across multiple languages and environments. However, it has competitors.
Google’s Gemini is a significant rival. Released in December 2023, it improved considerably over the LaMDA and PaLM 2-powered Bard. Microsoft’s Copilot also offers stiff competition (although it relies heavily on OpenAI’s tech).
Figuring out which chatbot is best on your own isn’t easy. Fortunately, several widely accepted benchmarks are useful for making comparisons.
Massive Multitask Language Understanding (MMLU) is one of the top benchmarks. It measures an LLM’s general knowledge and reasoning with multiple-choice questions spanning 57 subjects, from mathematics and law to medicine, and reports the share answered correctly as a percentage. GPT-4o scored 88.7% on this test, beating Anthropic’s Claude 3 Opus at 86.8% and Gemini Pro 1.5 at 81.9%. This suggests that OpenAI’s product may have the broadest general knowledge of the group.
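To make the percentage score concrete: benchmarks of this kind are typically scored as the share of questions a model answers correctly. The short Python sketch below shows that arithmetic. The function name and the eight sample answers are hypothetical illustrations, not real benchmark data or any benchmark’s official scoring tool.

```python
def accuracy_percent(predicted, correct):
    """Return the share of matching answers as a percentage (one decimal place)."""
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        raise ValueError("need at least one question")
    hits = sum(p == c for p, c in zip(predicted, correct))
    return round(100 * hits / len(correct), 1)


# Hypothetical answers to eight multiple-choice questions (not real benchmark data).
model_answers = ["A", "C", "B", "D", "A", "B", "C", "D"]
answer_key = ["A", "C", "B", "A", "A", "B", "D", "D"]

score = accuracy_percent(model_answers, answer_key)
print(f"{score}%")  # prints "75.0%" (6 of 8 answers match)
```

A score of 88.7% therefore means the model picked the right answer on roughly 89 of every 100 questions.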
Graduate-Level Google-Proof Q&A (GPQA) is another benchmark. It poses difficult multiple-choice questions in biology, physics, and chemistry, written by domain experts so that the answers are hard to find with a quick web search. The higher the GPQA score, the better the model handles expert-level scientific reasoning. On this test, GPT-4o scored 53.6%, more than the previous GPT-4 (48%), Claude 3 Opus (50.4%) and Llama3 400b (48%).
MATH is the most interesting of all the tests. It evaluates whether models can solve competition-style mathematics problems. Some commentators believe it is a better evaluation of intelligence than word-based tasks. Usually, LLMs perform poorly on mathematics questions because they don’t have structured logic engines. However, improvements are being made.
GPT-4o again scored highest on this benchmark at 76.6%, followed by Claude 3 Opus at 60.1%, Gemini Pro 1.5 at 58.5% and Gemini Ultra 1.0 at 53.2%. OpenAI was able to score better because of new technology that allows it to decipher complex mathematics problems and break them down into parts using logic. Other LLM developers are working on similar tech, but aren’t as advanced.
Finally, you can compare models on HumanEval. Despite the name, this is not a survey of human opinion: it is a code-generation benchmark in which models write Python functions that are then checked against unit tests. GPT-4o again leads the field at 90.2%, followed by Claude 3 Opus at 84.9% and Google’s Gemini models between 71% and 75%.
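For readers who want to see the size of the gaps at a glance, the figures quoted above can be tabulated and ranked. The Python sketch below uses only the scores cited in this article. Gemini’s HumanEval figure is left out because only a range is given, and the helper name is a hypothetical choice for illustration.

```python
# Scores in percent, as quoted in this article. Gemini's HumanEval result is
# omitted because the article gives only a range (71% to 75%).
SCORES = {
    "MMLU": {"GPT-4o": 88.7, "Claude 3 Opus": 86.8, "Gemini Pro 1.5": 81.9},
    "GPQA": {"GPT-4o": 53.6, "Claude 3 Opus": 50.4, "GPT-4": 48.0, "Llama3 400b": 48.0},
    "MATH": {
        "GPT-4o": 76.6,
        "Claude 3 Opus": 60.1,
        "Gemini Pro 1.5": 58.5,
        "Gemini Ultra 1.0": 53.2,
    },
    "HumanEval": {"GPT-4o": 90.2, "Claude 3 Opus": 84.9},
}


def leader_and_margin(results):
    """Return (top model, lead over the runner-up in percentage points)."""
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return best, round(best_score - second_score, 1)


for benchmark, results in SCORES.items():
    model, margin = leader_and_margin(results)
    print(f"{benchmark}: {model} leads by {margin} points")
```

On these numbers the lead is narrow on MMLU (1.9 points) and widest on MATH (16.5 points), which is why a single benchmark rarely tells the whole story.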
GPT-4o and its new features
Although GPT-4o wins nearly every benchmark, the scores don’t convey its full capabilities. While it can solve logical problems and provide human-like responses, it can also do much more.
The tool accepts as input “any combination of text, audio, image and video and generates any combination of text, audio, and image outputs.” Consequently, using it feels much more like conversing with a general intelligence – similar to what humans have. You don’t have to adapt to its quirks – it adapts to you.
This upgrade means you can interact with it in real time. Responses are human-like and rapid, giving you the impression you are having a real conversation. Multimodal reasoning and generation are also helpful. You can put images and voice into the model and get out the data you want. For example, you could show ChatGPT a photo of a mountain and ask it to plot a route to the top using any visible tracks or landmarks.
The model is also becoming more empathetic and better at “sentiment analysis.” This capability lets it detect a user’s tone and respond appropriately, which is particularly valuable for businesses. A live chat tool on a website homepage could infer whether a user was excited, angry, upset, or distressed and adjust its reply to match.
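As a rough illustration of how a live chat tool might act on a detected tone, the Python sketch below maps a tone label to a reply style. Everything here is an assumption made for illustration: the keyword lookup is a toy stand-in for the model’s tone detection (it is not how GPT-4o works), and the tone labels, keywords, and reply styles are hypothetical.

```python
# Toy stand-in for tone detection: a keyword lookup, NOT how GPT-4o works.
# In a real tool, the tone label would come from the language model itself.
TONE_KEYWORDS = {
    "angry": {"furious", "unacceptable", "ridiculous"},
    "upset": {"disappointed", "frustrated", "unhappy"},
    "excited": {"love", "amazing", "great"},
}

# Hypothetical reply styles a live chat tool could choose between.
REPLY_STYLE = {
    "angry": "Apologize, stay calm, and offer a human agent.",
    "upset": "Acknowledge the problem and propose a fix.",
    "excited": "Match the enthusiasm and suggest next steps.",
    "neutral": "Answer directly.",
}


def detect_tone(message):
    """Return the first tone whose keywords appear in the message, else 'neutral'."""
    words = {word.strip(".,!?").lower() for word in message.split()}
    for tone, keywords in TONE_KEYWORDS.items():
        if words & keywords:
            return tone
    return "neutral"


def reply_style(message):
    """Pick a reply style based on the detected tone."""
    return REPLY_STYLE[detect_tone(message)]


print(reply_style("This delay is unacceptable!"))
print(reply_style("Where can I find my invoice?"))
```

The useful part is the separation of concerns: one step labels the tone and another decides how to respond, so a business could swap in a stronger detector without changing its reply rules.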
Finally, GPT-4o can perform data analysis on charts and other visuals. While it won’t replace Microsoft’s Power BI, it can offer quick and basic insights for those in a hurry.
Apple/OpenAI deal for iOS chatbot
Long in the AI doldrums with its often confused and easily bewildered bot Siri, Apple recently leapfrogged the competition by announcing a partnership with OpenAI. The deal will provide the world’s biggest smartphone company with access to the world’s most capable AI systems for the iPhone, representing a massive upgrade for users.
That Apple was looking to include AI in iOS 18 isn’t news. Industry onlookers have known that for a while. However, choosing OpenAI as the official partner sent the company’s stock rocketing as investors anticipated a new round of innovative products from the brand.
True to form, privacy and security will be at the deal’s heart. Apple will continue its policy of protecting user information at all costs, including shielding it from government overreach.
To facilitate this, most of the processing will occur on-device. Apple will beef up its componentry to handle the added data crunching, reducing reliance on third-party servers. Users can ask Siri whether ChatGPT has good ideas for whatever they need, whether that is how to cook a specific meal or what to do when travelling in a foreign city.
Whether ChatGPT will become part of live chat for website owners remains to be seen. However, it is clear the technology is ready for prime time and could offer tremendous benefits to businesses, brands, and consumers.