Web scraping is a technique for extracting data from websites. With Python’s BeautifulSoup library, you can parse HTML and XML documents to obtain the information you need. In this article, we’ll explore three practical examples of web scraping with BeautifulSoup that cover different use cases. Whether you’re gathering data for research or automating a repetitive task, these examples will guide you step by step.
In this example, we’ll scrape the titles of articles from a news website. This is useful for gathering headlines for analysis or to stay updated on current events.
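import requests
from bs4 import BeautifulSoup

# Send a request to the news website
url = 'https://news.ycombinator.com/'
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error status

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all article titles. At the time of writing, Hacker News wraps each
# headline link in <span class="titleline">; the older 'storylink' class
# no longer matches anything.
titles = soup.select('span.titleline > a')
for title in titles:
    print(title.get_text())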
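Before pointing a scraper like this at a site, it is worth checking what the site's robots.txt allows. The following is a minimal sketch using the standard library's `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT`, the `example.com` URLs, and the helper name `is_allowed` are assumptions made for illustration, inlined so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules permit fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/news"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/data"))
```

Against a live site, `RobotFileParser` can load the rules itself: call `set_url()` with the address of the site's robots.txt, then `read()`, before calling `can_fetch()`.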
Always check the website’s robots.txt file to ensure that scraping is permitted.

In this example, we’ll scrape product prices from an e-commerce site. This is particularly helpful for price comparison or market analysis.
import requests
from bs4 import BeautifulSoup

# Send a request to the e-commerce website
url = 'https://example-ecommerce.com/products'
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error status

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all product prices
prices = soup.find_all('span', class_='product-price')
for price in prices:
    print(price.get_text(strip=True))
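Scraped prices arrive as display strings, so price comparison usually needs a conversion step first. The sketch below shows one possible approach, assuming US-style formatting (comma as thousands separator, period as decimal point). The sample HTML and the helper names `parse_price` and `extract_prices` are hypothetical; only the `product-price` class comes from the example above.

```python
import re
from decimal import Decimal

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded product page.
SAMPLE_HTML = """
<ul>
  <li><span class="product-price">$1,299.00</span></li>
  <li><span class="product-price"> $19.99 </span></li>
  <li><span class="product-price">Call for price</span></li>
</ul>
"""


def parse_price(text):
    """Return the first number in text as a Decimal, or None if there is none."""
    # Assumes US-style numbers such as 1,299.00.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def extract_prices(html):
    """Collect every parseable price from span.product-price elements."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", class_="product-price")
    parsed = (parse_price(span.get_text(strip=True)) for span in spans)
    # Entries without a number (e.g. "Call for price") are skipped.
    return [price for price in parsed if price is not None]


print(extract_prices(SAMPLE_HTML))
```

`Decimal` is used rather than `float` so that monetary values keep their exact decimal representation when compared or summed.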
In this example, we’ll extract job titles and their respective company names from a job listings page. This is useful for job seekers looking to analyze job market trends.
import requests
from bs4 import BeautifulSoup

# Send a request to the job listings website
url = 'https://example-jobs.com/listings'
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error status

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all job postings
job_postings = soup.find_all('div', class_='job-listing')
for job in job_postings:
    # Look up the title and company inside each individual posting
    title = job.find('h2', class_='job-title').get_text(strip=True)
    company = job.find('div', class_='company-name').get_text(strip=True)
    print(f'Job Title: {title}, Company: {company}')
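One caveat with the loop above: `find()` returns `None` when a posting lacks the expected element, and calling `get_text()` on `None` raises an `AttributeError`. The following sketch shows one way to tolerate incomplete postings and collect the results as dictionaries for later analysis. The sample HTML and the helper names `text_or_none` and `extract_jobs` are assumptions for illustration; the class names are the ones used in the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second posting deliberately has no company name.
SAMPLE_HTML = """
<div class="job-listing">
  <h2 class="job-title"> Data Engineer </h2>
  <div class="company-name">Acme Corp</div>
</div>
<div class="job-listing">
  <h2 class="job-title">Backend Developer</h2>
</div>
"""


def text_or_none(parent, name, css_class):
    """Return the stripped text of the first matching child tag, or None."""
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


def extract_jobs(html):
    """Return one dict per job posting, with None for any missing field."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for posting in soup.find_all("div", class_="job-listing"):
        jobs.append({
            "title": text_or_none(posting, "h2", "job-title"),
            "company": text_or_none(posting, "div", "company-name"),
        })
    return jobs


for job in extract_jobs(SAMPLE_HTML):
    print(f"Job Title: {job['title']}, Company: {job['company']}")
```

Returning a list of dictionaries keeps extraction separate from output, so the same records could be printed, counted, or written to a file.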