Here's Why I Still Don't Buy the Hype of Google Gemini
Gemini might not be as good as it seems to be.
Google’s Gemini was just unveiled, and I haven’t seen this much hype since OpenAI released ChatGPT.
Gemini is Google’s most powerful AI model, and what sets it apart from others is its multimodality. Traditionally, achieving multimodality involved combining separate models trained for specific tasks (text, images, etc.). Gemini, however, was built from the ground up for multimodality, which allows it to reason seamlessly across text, images, video, audio, and code.
The result? An AI that beats GPT-4 … on paper (and in demos).
At least that’s what some of us feel after discovering what I’m about to show you. Here’s why I still don’t buy the hype of Gemini.
Gemini AI beats GPT-4 … but the gap isn’t that big
You’ve probably seen the image below, where Google shows that Gemini Ultra is more powerful than GPT-4.
And you might’ve also seen this detailed comparison of Gemini Ultra and GPT-4.
In the detailed comparison, we can see that Gemini Ultra outperforms GPT-4, but the gap narrows once you dig into the 60-page paper Google released.
Take the MMLU comparison. GPT-4’s 86.4% rises to 87.29% once both models are evaluated with the same prompting technique, CoT@32.
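For context, CoT@32 means chain-of-thought prompting where the model is sampled 32 times and the most common final answer wins, rather than taking a single greedy answer. Here’s a minimal sketch of the idea in Python; `generate` stands in for whatever model API you’d call, and the prompt wording is illustrative:

```python
from collections import Counter

def extract_final_answer(completion: str) -> str:
    """Toy extractor: assumes the model ends its reasoning with 'Answer: <X>'."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def cot_at_k(generate, question: str, k: int = 32) -> str:
    """Sample k chain-of-thought completions and majority-vote the final answer.

    `generate` is a hypothetical callable wrapping whatever model API you use;
    it should return one sampled (non-greedy) completion for the given prompt.
    """
    prompt = f"{question}\nLet's think step by step, then end with 'Answer: <choice>'."
    answers = [extract_final_answer(generate(prompt)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```

The takeaway: once you compare like for like, the headline MMLU gap between Gemini Ultra and GPT-4 is far smaller than the announcement graphic suggests.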
The only version of Gemini available to users right now is Gemini Pro, which has been integrated into Bard and is no match for GPT-4.
Gemini was introduced in three different versions.
Gemini Ultra: The largest and most powerful model designed to handle highly complex tasks (the one that beats GPT-4)
Gemini Pro: Suitable for a wide range of tasks. It’s a smaller model that competes directly with GPT-3.5.
Gemini Nano: Tailored for on-device tasks.
The thing is, the model everyone is talking about, Gemini Ultra, isn’t available yet. It’s expected to reach users only early next year, through “Bard Advanced.”
In the meantime, all we know about Gemini Ultra is the numbers Google showed us and a “Hands-on with Gemini” video made by Google, which isn’t much of a hands-on.
The Hands-on with Gemini demo isn’t that real
I have to admit the Gemini demo blew my mind.
In the demo, Google shows off Gemini’s multimodal capabilities. We see how easily you can talk with the AI, how quickly it recognizes images, how it tracks objects in real time, and more.
Very impressive … until you open the video description and read this.
For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.
So the video didn’t happen in real time, and the spoken prompts weren’t actually used.
In fact, according to a Bloomberg report, when asked for comment Google admitted that the video demo didn’t happen in real time with spoken prompts; instead, it used still image frames from raw footage, with text prompts written out for Gemini to respond to.
While searching for more information about the demo, I came across this “How it’s Made” article on the Google blog. I was surprised to discover that what seemed to be one of Gemini’s differentiators compared to GPT-4 (the ability to understand video and respond to it) was, in reality, a sequence of pre-selected image frames.
Here’s how the rock-paper-scissors clip was made.
In the demo, Gemini’s ability to interpret a game of rock-paper-scissors in real time was impressive. In reality, though, it might not be that impressive, since the model was shown still frames and written prompts rather than live video.
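Going by Google’s description, the interaction boils down to sending a handful of still frames plus a written prompt, which anyone can do through the API. A minimal sketch using the google-generativeai Python SDK and Gemini Pro Vision (the frame filenames are placeholders, and the prompt paraphrases the one shown in Google’s post):

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

# Gemini Pro Vision accepts a mix of images and text in a single request.
model = genai.GenerativeModel("gemini-pro-vision")

# Pre-selected still frames pulled from the footage (filenames are illustrative).
frames = [Image.open(f"frame_{i}.png") for i in range(3)]
prompt = "What do you think I'm doing? Hint: it's a game."

response = model.generate_content([*frames, prompt])
print(response.text)
```

That’s still a useful capability, but it’s a long way from the fluid, real-time voice-and-video conversation the edited video implies.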
It’s understandable that a promotional video omits these details, but it makes the hands-on look more like an ad.
Besides the video editing, there’s also the matter of the actual prompts. The prompts used to get the results in the video differ from the ones we hear spoken in it.
Here’s an example. At 4:36 in the demo, we hear “based on the design, which of these would go faster?” referring to the two images on the table. Gemini responds, “The car on the right will go faster. It’s more aerodynamic.”
However, this was the actual prompt used.
As you can see, there’s a difference between the prompt spoken in the video and the prompt written to get the results we’ve seen.
For some, the demo raises doubts about Gemini’s capabilities. I can’t tell whether Gemini Ultra is as good as some say until I do my own hands-on or see someone do one without all this fancy editing.
What about the training data?
Many have pointed out on Twitter (X) that Google hasn’t provided any information on how the training data was collected or filtered, which is ironic given that Google itself says the training data is key.
Jeff Dean, Chief Scientist of Google DeepMind and Google Research, responded to this tweet.
Hopefully, people will have access to the Gemini Ultra model soon and we’ll see whether it can live up to the hype.
While you may approach this latest news with excitement, it’s important to be cautious.
Gemini might not be as good as it seems in the demo.
Gemini Ultra isn’t available yet. Gemini Pro is available in Bard, but it only competes with GPT-3.5.
Details about the training data used for the benchmarks haven’t been provided yet.
We should be a bit cautious, considering the less-than-ideal experience when Bard launched on a grand scale in early 2023. Despite the initial hype, it turned out to be a disappointment due to the errors that surfaced once users actually tried it.
That said, if Gemini Ultra is as good as it seems, I’ll be praising it after I do my own hands-on with Gemini.
In the meantime, I just can’t buy the hype.