Web Scraping with BeautifulSoup: 3 Practical Examples

Explore three practical examples of web scraping using BeautifulSoup in Python.
By Taylor

Introduction to Web Scraping with BeautifulSoup in Python

Web scraping is a powerful technique used to extract data from websites. With Python’s BeautifulSoup library, you can easily parse HTML and XML documents to obtain the information you need. In this article, we’ll explore three practical examples of web scraping with BeautifulSoup that cater to various use cases. Whether you’re looking to gather data for research or to automate a repetitive task, these examples will guide you step-by-step.
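Before diving into live sites, it helps to see what "parsing" means in practice. This minimal sketch feeds BeautifulSoup a small in-memory HTML snippet (a stand-in for a fetched page, so no network request is needed) and pulls values out of the resulting tree:

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML snippet standing in for a fetched page
html = """
<html><body>
  <h1>Sample Page</h1>
  <p class="intro">Hello, scraper!</p>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Tag attributes and find() give direct access to elements
print(soup.h1.get_text())                         # → Sample Page
print(soup.find('p', class_='intro').get_text())  # → Hello, scraper!
```

The same `find`/`find_all` calls work identically whether the HTML comes from a string or from a live HTTP response.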

Example 1: Scraping Article Titles from a News Website

Context

In this example, we’ll scrape the titles of articles from a news website. This is useful for gathering headlines for analysis or to stay updated on current events.

import requests
from bs4 import BeautifulSoup

# Send a request to the news website
url = 'https://news.ycombinator.com/'
response = requests.get(url)
response.raise_for_status()

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all article titles (Hacker News currently wraps each title
# link in a span with class "titleline"; adjust if the markup changes)
titles = soup.select('span.titleline > a')
for title in titles:
    print(title.get_text())

Notes

  • Check the website’s robots.txt file and terms of service to confirm that scraping is permitted.
  • You can modify the URL to scrape titles from different sections or pages of the website.
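The robots.txt check mentioned above can be automated with Python's standard urllib.robotparser. This sketch parses a sample robots.txt string (a stand-in for the file you would fetch from the site's `/robots.txt` path) so it runs without a network request:

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt; in practice, fetch it from the site itself,
# e.g. https://example.com/robots.txt
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# can_fetch() tells you whether a given user agent may request a URL
print(rp.can_fetch('*', 'https://example.com/news'))       # True
print(rp.can_fetch('*', 'https://example.com/private/x'))  # False
```

Calling `rp.set_url(...)` followed by `rp.read()` fetches and parses a real robots.txt in one step.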

Example 2: Extracting Product Prices from an E-commerce Site

Context

In this example, we will scrape product prices from an e-commerce site. This is particularly helpful for price comparison or market analysis.

import requests
from bs4 import BeautifulSoup

# Send a request to the e-commerce website
url = 'https://example-ecommerce.com/products'
response = requests.get(url)
response.raise_for_status()

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all product prices
prices = soup.find_all('span', class_='product-price')
for price in prices:
    print(price.get_text(strip=True))

Notes

  • Replace the URL with a real e-commerce site and adjust the class name based on the site’s structure.
  • Consider implementing delays between requests to avoid overloading the server.
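The delay suggestion above can be sketched with the standard time module. Here the actual fetch is stubbed out with a placeholder function (and the URLs are hypothetical) so the snippet runs without hitting a real server:

```python
import time

def fetch(url):
    # Placeholder for requests.get(url); returns a dummy payload
    return f'<html>contents of {url}</html>'

# Hypothetical paginated product listing URLs
urls = [
    'https://example-ecommerce.com/products?page=1',
    'https://example-ecommerce.com/products?page=2',
    'https://example-ecommerce.com/products?page=3',
]

pages = []
for i, url in enumerate(urls):
    if i > 0:
        # Pause between requests; use 1-2 seconds (or the site's
        # Crawl-delay directive, if any) in practice
        time.sleep(0.1)
    pages.append(fetch(url))

print(len(pages))  # 3
```

Swapping the stub for `requests.get` gives you a polite sequential crawler; libraries like `requests`' session objects can also reuse connections across these calls.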

Example 3: Collecting Data from a Job Listings Page

Context

In this example, we’ll extract job titles and their respective company names from a job listings page. This is useful for job seekers looking to analyze job market trends.

import requests
from bs4 import BeautifulSoup

# Send a request to the job listings website
url = 'https://example-jobs.com/listings'
response = requests.get(url)
response.raise_for_status()

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all job postings
job_postings = soup.find_all('div', class_='job-listing')
for job in job_postings:
    title = job.find('h2', class_='job-title')
    company = job.find('div', class_='company-name')
    if title and company:  # skip listings missing either field
        print(f'Job Title: {title.get_text(strip=True)}, Company: {company.get_text(strip=True)}')

Notes

  • Be sure to adjust the class names to match the actual HTML structure of the job listings page.
  • You might want to store the results in a CSV file for further analysis.
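Saving the results to CSV, as suggested above, takes only the standard csv module. This sketch uses a hard-coded list of jobs (a stand-in for the rows collected in the scraping loop) and writes to an in-memory buffer so it runs without scraping:

```python
import csv
import io

# Stand-in rows; in practice these come from the job_postings loop
jobs = [
    {'title': 'Data Engineer', 'company': 'Acme Corp'},
    {'title': 'Backend Developer', 'company': 'Globex'},
]

# Swap io.StringIO() for open('jobs.csv', 'w', newline='')
# to write an actual file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['title', 'company'])
writer.writeheader()
writer.writerows(jobs)

print(buffer.getvalue())
```

`csv.DictWriter` keeps the column order fixed by `fieldnames` and handles quoting of commas inside values for you.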