Gemini Ultra vs GPT-4: Did Google Beat GPT-4 This Time?
The good, bad and unexpected of Gemini Ultra.
Hi!
We’ll take a brief pause from our miniseries on the role of data in the AI revolution because Gemini Advanced was just released. Next week, we’ll pick the series back up, so stay tuned!
If you like articles like this, consider becoming a paid subscriber so you don’t miss the exclusive paid articles I publish here on Substack.
Google just released Gemini Advanced, which is powered by its most capable AI model, Gemini Ultra.
Yes, that’s the same model that beat GPT-4 in the benchmarks a few months ago. Now we have the opportunity to see for ourselves whether it’s actually better than GPT-4.
In this article, we’ll go through the good, the bad, and the unexpected of Gemini Ultra by comparing it with GPT-4.
The Good of Gemini Ultra
Speed
When it comes to speed, Gemini is faster than GPT-4 in most tests. The problem is that Gemini doesn’t always give good responses.
We’ll get to the quality of Gemini’s responses in the rest of the article, but, yes, Gemini is faster than GPT-4.
Google’s native apps
I used both Gemini Ultra and GPT-4 as personal assistants and asked them to help me plan my next trip.
I’m traveling next week from California to New York. Show me flights to New York and hotels near Central Park
Gemini automatically connects to Google apps such as Flights and Hotels to provide real-time information.
GPT-4 lacks this feature. Plugins or a GPT with custom actions might get you something similar to Google’s native apps. By default, GPT-4 will respond with something like this.
The Unexpected of Gemini Ultra: Realistic images
Gemini’s images look like stock photos taken by real photographers rather than images made with AI. In this regard, we can say that Gemini’s images are more realistic (or less sophisticated) than those of DALL-E 3 or Midjourney.
make an image of a programmer learning to code.
However, unlike DALL-E 3, Gemini doesn’t improve your prompt. If I use the same prompt on ChatGPT, DALL-E 3 generates a prompt that gives the image a more eye-catching look.
If I had to choose an image for my article, I’d choose the image on the left generated by DALL-E 3. That’s a personal preference though.
What’s undeniable is that I wouldn’t have been able to come up with the prompt below, which DALL-E 3 generated for me.
Imagine a scene where a young programmer is deeply engrossed in learning to code. The setting is a cozy, well-lit room at night, filled with technology paraphernalia: a large monitor displaying lines of code, books on programming scattered around, and a cup of coffee steaming on the desk. The programmer, wearing casual attire, is focused intently on the screen, typing away on a keyboard, with a look of determination and curiosity. Notes and diagrams are pinned to a corkboard behind them, illustrating various programming concepts. The atmosphere is one of quiet dedication and the pursuit of knowledge.
That’s why I think that DALL-E 3 is more useful if you’re new to image generation or when you lack creativity.
If you have some experience with image generation, you can control the aesthetic of Gemini’s images, as I do in the prompt below. That way, Gemini’s images will look less realistic than the default.
podium with bots in first second and third place, sunday comics aesthetica
I wasn’t happy with the images Gemini generated, so I asked if the bot in position #1 could hold a trophy and I got what I asked for.
Now, if you’re an expert in creating detailed prompts for images, I think DALL-E 3 or Midjourney might be better than Gemini.
Let’s test the prompt below with Gemini.
A widescreen landscape style image with an action movie point of view featuring, in the center, a man with a bewildered expression, pushing a shopping cart. The aisle is lined with price tags, but instead of ordinary products, the shelves display a variety of intricate mechanical parts suggesting a surreal or futuristic shopping experience. The perspective should be from the end of the aisle looking towards the entrance. Sunday comics aesthetica
Here’s what DALL-E 3 generated.
Overall, I’d use Gemini to generate realistic images with a simple prompt. Other than that, I’d use either DALL-E 3 or Midjourney.
Note: I wanted to test image generation in Europe, but every time I tried, I got the message “I can't create images yet so I'm not able to help you with that.” I was able to work around that by switching to a US location with a VPN.
The Bad of Gemini Ultra
Reasoning
I gave both models two reasoning exercises to see if they could get the right answer.
First, I started with this simple exercise.
Find the next number in the sequence: 30, 45, 90, 225, 675, …
Gemini answered quickly, but its answer was wrong.
That said, when I clicked on “show drafts,” I found that draft #3 had the right answer, so we can say it got it right on the third attempt.
It took GPT-4 more time, but it got the right answer on the first try.
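For the record, the pattern (assuming the usual reading of this puzzle) is a multiplier that grows by 0.5 at each step: ×1.5, ×2, ×2.5, ×3, and then ×3.5, which makes the next term 675 × 3.5 = 2362.5. A quick Python check of that reading:

```python
# Verify the assumed pattern behind 30, 45, 90, 225, 675, ...
# Each term is the previous one times a factor that grows by 0.5.
seq = [30, 45, 90, 225, 675]
term, factor = seq[0], 1.5
for expected in seq[1:]:
    term *= factor
    assert term == expected  # 45, 90, 225, 675
    factor += 0.5
print(term * factor)  # next term: 675 * 3.5 = 2362.5
```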
The second exercise was more challenging.
Solve this exercise:
LOO, MON, NOM, OOL, ____
A. POK
B. HOL
C. HOK
D. JOI
Gemini’s first three drafts were all wrong, so I asked again and got the correct answer in draft #3.
On the other hand, GPT-4 failed the first time but found the correct answer on the second attempt.
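For reference, the pattern (as far as I can tell) is that the first letter steps forward through the alphabet while the third steps backward, with an O fixed in the middle, which points to POK, option A. A small sketch of that logic:

```python
# LOO, MON, NOM, OOL, ...: the first letter moves forward (L, M, N, O)
# and the third letter moves backward (O, N, M, L); the middle stays O.
def next_term(term):
    first = chr(ord(term[0]) + 1)  # L -> M -> N -> O -> P
    third = chr(ord(term[2]) - 1)  # O -> N -> M -> L -> K
    return first + term[1] + third

terms = ["LOO"]
for _ in range(4):
    terms.append(next_term(terms[-1]))
print(terms)  # ['LOO', 'MON', 'NOM', 'OOL', 'POK'] -> option A
```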
Overall, GPT-4 is better at reasoning than Gemini.
Code generation
Both models are good at explaining code, but when it comes to generating code it seems Gemini has some limitations set by Google.
If we ask it to create a snake game, Gemini generates the code, and you can even export it to Google Colab with one click.
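To give a sense of the kind of script we’re talking about, here’s a minimal terminal snake game in Python. To be clear, this is my own sketch using the standard library’s curses module, not Gemini’s actual output:

```python
# A minimal terminal snake game: arrow keys to steer, "#" is the snake,
# "*" is the food. My own sketch, not what Gemini generated.
import curses
import random

def main(stdscr):
    curses.curs_set(0)   # hide the cursor
    stdscr.timeout(100)  # game tick: getch() waits up to 100 ms
    h, w = stdscr.getmaxyx()
    snake = [(h // 2, w // 4 + i) for i in range(3)]  # head is snake[0]
    direction = (0, -1)  # moving left: (dy, dx)
    for y, x in snake:
        stdscr.addch(y, x, "#")
    food = (h // 2, w // 2)
    stdscr.addch(food[0], food[1], "*")
    score = 0
    moves = {curses.KEY_UP: (-1, 0), curses.KEY_DOWN: (1, 0),
             curses.KEY_LEFT: (0, -1), curses.KEY_RIGHT: (0, 1)}
    while True:
        key = stdscr.getch()
        if key in moves and moves[key] != (-direction[0], -direction[1]):
            direction = moves[key]  # ignore 180-degree turns
        head = (snake[0][0] + direction[0], snake[0][1] + direction[1])
        if head in snake or head[0] in (0, h - 1) or head[1] in (0, w - 1):
            return score  # hit a wall or itself: game over
        snake.insert(0, head)
        if head == food:  # ate the food: grow and respawn it
            score += 1
            while food in snake:
                food = (random.randint(1, h - 2), random.randint(1, w - 2))
            stdscr.addch(food[0], food[1], "*")
        else:  # regular move: erase the tail
            tail = snake.pop()
            stdscr.addch(tail[0], tail[1], " ")
        stdscr.addch(head[0], head[1], "#")

if __name__ == "__main__":
    print("Score:", curses.wrapper(main))
```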
But when you try to generate other types of scripts, it starts generating the code and then suddenly stops, throwing the message below.
Searching the internet, I found that some YouTubers were experiencing the same issues with Gemini, not only when generating code but also when explaining it.
That has never happened to me with GPT-4, so I’d say GPT-4 beats Gemini Ultra at coding.
That’s it! I didn’t compare features like reading PDFs or analyzing datasets because, right now, Gemini doesn’t support uploading files other than images. Also, I skipped the creativity test this time because it’s a bit subjective, so I’ll leave that to you.
Let me know in the comments which one you think is the best model.