AI & Python #6: Web Scraping in Python with Selenium
Getting started with web scraping in Python (Part 2).
Here’s the second part of my guide to web scraping with Python. In this tutorial, we’ll learn how to scrape websites with Selenium.
In part 1, I mentioned that some data was missing and that we would scrape it with Selenium in part 2. However, when I checked again, the data was no longer missing thanks to some changes to the Wikipedia page, so in this tutorial we’ll scrape a different page with Selenium 4.
Below you can find the complete video tutorial. I've included the code to copy and paste at the end as a bonus for paying subscribers.
After pasting the code below into your code editor, make sure you set the path variable to the location of your chromedriver executable. Also, keep in mind that in some countries the target website (Audible) might redirect you to a page other than the one shown in the video tutorial. If that’s the case, use a VPN set to the USA before running the script (I use a free VPN called TunnelBear for this).
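If you’d rather have the script notice a redirect on its own, one option is to compare the URL the browser ended up on with the one you requested. This is a minimal sketch and not part of the original script: the helper name landed_on_expected_page is hypothetical, and it assumes that a change of host (for example, a country-specific Audible domain) or of path means you were redirected, while differences in query strings or a trailing slash do not. In the scraper you would pass it driver.current_url right after driver.get(web).

```python
from urllib.parse import urlparse


def landed_on_expected_page(current_url: str, expected_url: str) -> bool:
    """Hypothetical helper: True if both URLs share the same host and path.

    Assumption: query strings and trailing slashes are ignored, because a
    search page may add parameters without sending you to a different page.
    """
    current = urlparse(current_url)
    expected = urlparse(expected_url)
    same_host = current.netloc.lower() == expected.netloc.lower()
    same_path = current.path.rstrip('/') == expected.path.rstrip('/')
    return same_host and same_path


# Possible use inside the scraper (not part of the original code):
#     driver.get(web)
#     if not landed_on_expected_page(driver.current_url, web):
#         print(f'Redirected to {driver.current_url}; try a US VPN location')

print(landed_on_expected_page('https://www.audible.com/search?keywords=x',
                              'https://www.audible.com/search'))  # True
print(landed_on_expected_page('https://www.audible.co.uk/search',
                              'https://www.audible.com/search'))  # False
```

Comparing only the host and path, rather than the full URL string, avoids false alarms when the site simply appends parameters to the search page.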
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
web = 'https://www.audible.com/search'
path = 'paste-chromedriver-path-here'  # replace with the full path to your chromedriver executable
# Launch Chrome through chromedriver and open the Audible search page
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service)
driver.get(web)
# Every book in the results list is an <li> whose class contains "productListItem"
products = driver.find_elements(By.XPATH, '//li[contains(@class, "productListItem")]')
book_title = []
book_author = []
book_length = []
for product in products:
    # The leading "." makes each XPath search only inside the current product
    book_title.append(product.find_element(By.XPATH, './/h3[contains(@class, "bc-heading")]').text)
    book_author.append(product.find_element(By.XPATH, './/li[contains(@class, "authorLabel")]').text)
    book_length.append(product.find_element(By.XPATH, './/li[contains(@class, "runtimeLabel")]').text)
# Close the browser once the data has been collected
driver.quit()
# Build a table from the three lists and save it as a CSV file
df_books = pd.DataFrame({'title': book_title, 'author': book_author, 'length': book_length})
df_books.to_csv('books.csv', index=False)
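The loop above uses find_element, which raises NoSuchElementException when a product lacks one of the fields (for example, a listing without an author label), and that stops the whole script. Selenium’s find_elements, by contrast, returns an empty list when nothing matches, so one way to keep going is a small helper that falls back to a default value. This is a hedged sketch rather than the method used in the video: first_text and FakeElement are hypothetical names, and FakeElement only stands in for a Selenium WebElement so the example runs without a browser.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class HasText(Protocol):
    """Anything with a .text attribute, such as a Selenium WebElement."""
    text: str


def first_text(elements: Sequence[HasText], default: str = '') -> str:
    """Hypothetical helper: text of the first element, or `default` if none.

    Intended to receive the (possibly empty) list returned by find_elements.
    """
    return elements[0].text if elements else default


@dataclass
class FakeElement:
    """Hypothetical stand-in for a WebElement, used only for this demo."""
    text: str


# Possible use inside the loop (not part of the original code):
#     book_author.append(first_text(
#         product.find_elements(By.XPATH, './/li[contains(@class, "authorLabel")]')))

print(first_text([FakeElement('Some Author')]))  # Some Author
print(repr(first_text([])))                      # ''
```

Because every field goes through the same fallback, the three lists still grow in lock-step, so the DataFrame columns keep the same length even when some products are incomplete.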
If you have any questions, feel free to ask in the comment section.