From 2022 until today, AI has continued to be my assistant in my routine tasks as a data analyst and scientist.
It's not that I don't trust its capabilities or fear it might replace me. It's more than that. I've given it a limited role in my work, and that's enough for me.
I don't want this to be a typical article that will be forgotten tomorrow. I want to share how I'm using some of the AIs available on the market (ChatGPT, Claude, Gemini, Perplexity, or any others you can think of). Specifically, I'll explain how I'm using them for my SQL scripts.
If you're considering learning SQL, the following graph shows that your intuition is spot on.
That's why I want to share five ways I'm using AI, from understanding to optimizing SQL code.
1) Code Explanation
In every project I was assigned to, I used to spend many hours reading and trying to understand the SQL code written by other developers or data engineers. Depending on the complexity of the code, this task could require additional meetings. Usually, explicit documentation would help, but it's not always up-to-date. That's why one of the ways I use AI is to quickly grasp other programmers' code. This saves me from having to read all the documentation and reduces the need for extra meetings. Here's the prompt I use, and you can follow it with the SQL code you want interpreted.
Here’s the prompt I use, and you can follow it with the SQL code you want to be interpreted.
Could you explain to me the following lines of code and how they work:
SELECT * FROM
(SELECT City, CustomerType,
AVG(UnitPrice * Quantity) AS prom_total,
RANK() OVER (PARTITION BY City ORDER BY AVG(UnitPrice * Quantity) DESC) AS rn_prom_amt_city
FROM sales_supermarket
GROUP BY City, CustomerType) AS subquery_top1
WHERE rn_prom_amt_ciudad <=1;
As you can see, ChatGPT was able to interpret the code line by line in this case. This helps me in my daily work, whether I need to start a project from scratch or take over another programmer's SQL code.
2) Code Generation
I find this aspect interesting and use it for specific tasks, usually short SQL scripts. I'm not obsessed with having AI do all the coding, mainly for two reasons. First, it often doesn't do it well, and to avoid the frustrations from its mistakes or hallucinations, I keep my prompts essential and as explicit as possible. Second, no matter how well it does, there's always room for optimization (we'll discuss this in the last point of this article). I still enjoy discovering new ways to write the same code. It's fun and gives me the satisfaction of saying, "ChatGPT helped me up to this point, but I want to handle the rest myself."
Here's the prompt I used:
Write a script in MySQL to properly format the “FirstName” and “LastName” columns and produce an output where both fields are concatenated with a space between them
We see the results, and they match the requirements. I insist on being explicit about what we want to generate to save ourselves many headaches.
3) Code Comments
After reading the previous points, I'm convinced this one deserves a special mention. In my experience programming in SQL, developers and data engineers often give minimal consideration to adding comments to their code. Additionally, documentation can be complex to understand if you’re taking on a project for the first time. It may seem insignificant, but I find writing code and having it properly commented and documented equally important.
With that in mind, let's look at the following prompt practically, considering the generated code from the previous example:
Could you please add comments to the following MySQL code lines?
SELECT CONCAT(UCASE(LEFT(FirstName, 1)), LCASE(SUBSTRING(FirstName, 2))) AS FormattedFirstName,
CONCAT(UCASE(LEFT(LastName, 1)), LCASE(SUBSTRING(LastName, 2))) AS FormattedLastName,
CONCAT(UCASE(LEFT(FirstName, 1)), LCASE(SUBSTRING(FirstName, 2)), ' ',
UCASE(LEFT(LastName, 1)), LCASE(SUBSTRING(LastName, 2))) AS FullName
FROM customers;
And here’s what we get:
ChatGPT's level of detail in commenting might seem excessive, but I appreciate it because it helps me understand the SQL line by line. Afterward, you can refine the comments to remove any that are unnecessary.
4) Data Dictionary Creation
Another important aspect to consider before starting any SQL development is understanding the variables you'll be working with. This is crucial before beginning the process of data manipulation, transformation, and cleaning.
Many times, I've received tables and databases without any information about the data dictionaries (which are essentially simplified descriptions of the variables). As a result, I've spent several hours creating these descriptions. It’s undoubtedly necessary.
However, for AI, this task is straightforward and it handles it quickly. Plus, it helps me a lot.
To do this, we'll use the following prompt. If you're using ChatGPT or Claude, remember that you can directly attach the file to generate the dictionary.
Can you create a data dictionary for the attached dataset? Please provide all the information in a table with all the fields
And here are the results. I like that everything can be presented in a table, but you can adjust the AI's output if needed.
5) Code Optimization
In this final point, we need to consider the goal of reducing the time it takes to process our SQL code. Continuous improvement is one of the pillars when building our next data analysis project. Often, we stick with what we initially consider the best solution. However, it's interesting to explore other alternatives that AI can offer. Let's see it as another option that brings us closer to that ideal scenario.
So, using the following prompt:
Could you optimize the following line of code and explain the changes that improve the original code I shared?
SELECT
vt1.ProductLine,
vt1.Date,
vt1.TotalSale,
CASE
WHEN LEAD(vt1.TotalSale) OVER (PARTITION BY vt1.ProductLine ORDER BY vt1.Date) > vt1.TotalSale
THEN 'Applies'
ELSE 'Does Not Apply'
END AS Promotion
FROM (
SELECT
sm_sales.ProductLine,
sm_sales.Date,
MAX(sm_sales.UnitPrice * sm_sales.Quantity) AS TotalSale
FROM supermarket_sales sm_sales
GROUP BY sm_sales.ProductLine, sm_sales.Date
) AS vt1
ORDER BY vt1.ProductLine, vt1.Date ASC;
We get the following results, where it considers the best alternative while analyzing and describing different scenarios.
I have to add that there are times when AI doesn’t necessarily provide the optimal option. That's why I suggest reviewing what it generates for us.