Last week, Google integrated Gemini Pro into Bard and, in this article, I’ll compare it with its direct competitor, GPT-3.5. We’ll see how good both are at writing, reasoning, and code generation.
This article will be useful for those who use GPT-3.5 often and would like to see whether using Gemini Pro is worth it.
Note: Google’s Gemini Ultra (the GPT-4 competitor) isn’t available yet. For more info about Gemini Ultra and what I think about it, check this article.
General requests: Writing, roles, and personal assistance
First, I used Gemini Pro and GPT-3.5 as personal assistants and asked them both to help me plan my next trip.
I’m traveling next week from California to New York. Show me flights to New York and hotels near Central Park
Thanks to the latest Bard update, it connects to Google apps such as Flights and Hotels to provide real-time information.
Bard can also connect to your personal apps such as Gmail and Drive to give a customized experience in its responses. However, unlike Bard, GPT-3.5 has no internet access or connection to third-party apps.
For my second test, I gave a request where both had to take on a role. In the past, Bard failed to understand this type of request, but now it follows the instructions successfully. GPT-3.5 also did a good job.
Finally, I asked both models to generate a witty blog post using humor. Both created nice pieces.
In this first round, both models had similar performance, but I liked the post created by Gemini Pro a bit more and also liked that Bard can connect to Google apps. That opens new possibilities that are exclusive to Bard.
If I had to choose one of them after this first comparison, I’d choose Gemini Pro.
Reasoning
I put Gemini Pro and GPT-3.5 to the test with math questions. According to the paper Google released last week, Gemini Pro outperforms GPT-3.5 on grade-school math but is close to GPT-3.5 on math problems spanning five difficulty levels and seven subdisciplines.
For my test, I gave both models three exercises.
Both models successfully solved the first and easiest exercise.
Pens cost more than pencils.
Pens cost less than erasers.
Erasers cost more than pencils and pens.
If the first two statements are true, the third statement is ___
The second exercise consisted of completing a sequence of numbers. Gemini Pro failed while GPT-3.5 got it right.
Find the next number in the sequence: 30, 45, 90, 225, 675, …
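The intended pattern here appears to be that the multiplier between consecutive terms grows by 0.5 at each step (×1.5, ×2, ×2.5, ×3), which would make the next term 2362.5. Here’s a quick sketch that checks that reading (an assumption on my part about the intended pattern):
# Assuming the ratio between consecutive terms grows by 0.5 each step
sequence = [30, 45, 90, 225, 675]
ratios = [b / a for a, b in zip(sequence, sequence[1:])]
print(ratios)  # [1.5, 2.0, 2.5, 3.0]
print(sequence[-1] * (ratios[-1] + 0.5))  # 2362.5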
For the final and most difficult exercise, both models had to find the next element in the sequence below. Both failed.
LOO, MON, NOM, OOL, ____
A. POK
B. HOL
C. HOK
D. JOI
For some reason, both gave the same answer on the last exercise, and that answer is incorrect according to the site where I got the exercise from (when I gave the same exercise to GPT-4, it found the correct answer).
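One plausible reading of the puzzle (an assumption on my part, not confirmed by the source site): the first letters move forward through the alphabet (L, M, N, O) while the last letters move backward (O, N, M, L), with O fixed in the middle. A few lines of Python spell out what that reading implies:
# Assumed reading: first letters step forward, last letters step backward,
# and 'O' stays fixed in the middle
terms = ['LOO', 'MON', 'NOM', 'OOL']
next_first = chr(ord(terms[-1][0]) + 1)   # 'O' -> 'P'
next_last = chr(ord(terms[-1][-1]) - 1)   # 'L' -> 'K'
print(next_first + 'O' + next_last)       # POK, i.e. option A under this reading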
When it comes to reasoning, I think GPT-3.5 is slightly better than Gemini Pro, but more tests would be necessary to choose a winner.
Code
I tested both Gemini Pro and GPT-3.5 on answering coding questions and generating code, and I was quite surprised by the results.
First, I gave both models the Python code below and asked for an explanation.
from bs4 import BeautifulSoup
import requests

# Download the page HTML (requests needs the URL to include an explicit scheme)
website = 'https://www.example.com'
result = requests.get(website)
content = result.text

# Parse the HTML and print it nicely indented
soup = BeautifulSoup(content, 'lxml')
print(soup.prettify())

# Grab the article container, then extract its title and full script
box = soup.find('article', class_='main-article')
title = box.find('h1').get_text()
transcript = box.find('div', class_='full-script').get_text(strip=True, separator=' ')

# Save the script to a text file named after the title
with open(f'{title}.txt', 'w') as file:
    file.write(transcript)
Both gave a nice explanation of the code.
However, when I asked them to generate code, Bard sometimes refused.
write Python code that sends an email from “email_1” to “email_2” with the subject “Email sent by Bard” and the content “Bard rocks!”
This forced me to tweak the prompts to get Bard to generate the script.
As you can see, in the end, it worked, but the second prompt lacked the custom subject and message of the email.
GPT-3.5 had no issue with this.
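For reference, here’s a minimal sketch of the kind of script the prompt asks for, assuming a generic SMTP server and placeholder addresses and credentials (this is my own sketch, not either chatbot’s output):
import smtplib
from email.message import EmailMessage

# Placeholder addresses, server, and credentials; replace with real values
msg = EmailMessage()
msg['From'] = 'email_1'
msg['To'] = 'email_2'
msg['Subject'] = 'Email sent by Bard'
msg.set_content('Bard rocks!')

with smtplib.SMTP_SSL('smtp.example.com', 465) as server:
    server.login('email_1', 'app_password')
    server.send_message(msg)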
GPT-3.5 is clearly the winner here.
The Final Verdict
Both models are good for general requests, but Bard can connect to Google apps, which opens up possibilities that are exclusive to Gemini Pro.
GPT-3.5 is slightly better than Gemini Pro at reasoning. That said, both failed to solve the more complex problems.
Both models can assist you when you have coding questions, but Bard would sometimes refuse to generate code.
I’d say both models are on the same level. The major difference lies in the benefits and limitations of the chatbots they power: Bard and ChatGPT. As a programmer, I’d use GPT-3.5 because ChatGPT almost never refuses to generate code. That said, in some cases, I’d use Bard to get access to Google apps.
As a side note, I have to say that I don’t see myself using Gemini Pro or GPT-3.5 because I have access to GPT-4, which is far superior to both of them. Its major drawback is its price, but other than that, GPT-4 is the best model out there. When Gemini Ultra is released, I’ll make a similar comparison to see which one is better.