AI & Python #5: Web Scraping in Python with Beautiful Soup

Getting started with web scraping in Python (Part 1).

Feb 22, 2024

∙ Paid

Hi!
I prepared two tutorials to help you get started with Beautiful Soup and Selenium. In these tutorials, we’ll scrape a simple website from scratch so that you see with your own eyes what are the differences between these two libraries.

In this article, we’ll extract football data from all the FIFA World Cups played between 1930 to 2022. That’s around one thousand games.

Here are six of these games (we’ll extract some of this data).

To extract this data, we’ll scrape Wikipedia using Python and Beautiful Soup. The data we want to extract is split into multiple Wikipedia pages, so we’ll start by extracting data from one page and then we’ll create a for loop to extract data from all the pages.

Let’s install the libraries.

Installing the libraries

In this tutorial, we’ll use bs4 to scrape websites, lxml to parse HTML documents, and requests to send requests to the target website.

Here’s the command you need to run in the terminal to install these libraries.

pip install bs4
pip install lxml
pip install requests

In addition to the previous libraries, we’ll install pandas to better manage the data we’re going to extract.

pip install pandas

Now let’s start coding!

Artificial Corner

AI & Python #5: Web Scraping in Python with Beautiful Soup

Getting started with web scraping in Python (Part 1).

Installing the libraries

Part 1: Scraping data from one World Cup

This post is for paid subscribers