Ruby Web Scraping Examples - Practical Code Snippets

Explore 3 detailed Ruby web scraping examples to enhance your programming skills.
By Jamie

Introduction to Ruby Web Scraping

Web scraping is the automated process of extracting data from websites. In Ruby, this can be accomplished using libraries like Nokogiri and Open-URI, which make it easy to fetch and parse HTML content. Here, we’ll explore three diverse and practical examples of Ruby web scraping, demonstrating how to gather and process data from various sources.

Example 1: Scraping News Headlines

This example demonstrates how to scrape the latest news headlines from a news website. It can be particularly useful for aggregating content or keeping up with current events.

To use this code, ensure you have the nokogiri gem installed:

gem install nokogiri

Here’s how you can scrape news headlines:

require 'open-uri'
require 'nokogiri'

# URL of the news website to scrape
url = 'https://www.example-news-site.com'

# Open the URL and parse the HTML
html = URI.open(url)
parsed_html = Nokogiri::HTML(html)

# Extract headlines - assuming they are <h2> elements with the class "headline"
headlines = parsed_html.css('h2.headline')

# Output the headlines
headlines.each do |headline|
  puts headline.text.strip
end

This script fetches the page, parses it, and prints the text of every <h2> element that has the class "headline".

Notes:

  • Always check the website’s robots.txt file to ensure you’re allowed to scrape its content.
  • You can customize the CSS selector to match the specific HTML structure of the site you’re scraping.
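
The robots.txt check in the first note can be partly automated. The sketch below is a deliberately simplified reading of the format, not a full parser: it only looks at groups addressed to all crawlers (User-agent: *), treats each Disallow value as a plain path prefix, and ignores Allow rules and wildcards. The method name path_disallowed? and the sample rules are hypothetical and chosen for illustration.

```ruby
# Simplified robots.txt check (an illustrative sketch, not a complete parser).
# Returns true if `path` starts with a Disallow value in a "User-agent: *" group.
def path_disallowed?(robots_txt, path)
  applies = false         # does the current group apply to all crawlers?
  in_agent_lines = false  # are we inside a run of consecutive User-agent lines?

  robots_txt.each_line do |raw|
    line = raw.sub(/#.*/, '').strip # drop comments and surrounding whitespace
    next if line.empty?

    field, value = line.split(':', 2).map(&:strip)
    next if value.nil? # skip lines without a "field: value" shape

    if field.casecmp?('user-agent')
      applies = false unless in_agent_lines # a new group starts here
      applies ||= (value == '*')
      in_agent_lines = true
    else
      in_agent_lines = false
      if field.casecmp?('disallow') && applies && !value.empty? && path.start_with?(value)
        return true
      end
    end
  end

  false
end

# Hypothetical robots.txt content used only for this demonstration
sample = <<~ROBOTS
  User-agent: *
  Disallow: /private/
ROBOTS

puts path_disallowed?(sample, '/private/data') # prints "true"
puts path_disallowed?(sample, '/news')         # prints "false"
```

In a real script, the robots.txt text would be downloaded from the site's root before scraping. For anything beyond this simple case, a dedicated robots.txt parser is the safer choice.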

Example 2: Scraping Product Prices from an E-Commerce Site

In this example, we will scrape product prices from an e-commerce website. This can be useful for price comparison or market analysis.

Make sure the nokogiri gem is installed; open-uri ships with Ruby, so it needs no separate install:

gem install nokogiri

Here’s the code to scrape product prices:

require 'open-uri'
require 'nokogiri'

# URL of the e-commerce site
url = 'https://www.example-ecommerce-site.com/products'

# Open the URL and parse the HTML
html = URI.open(url)
parsed_html = Nokogiri::HTML(html)

# Extract product names and prices
products = parsed_html.css('.product')

products.each do |product|
  name = product.css('.product-name').text.strip
  price = product.css('.product-price').text.strip
  puts "#{name}: #{price}"
end

This code retrieves each product's name and price and prints them on one line in the form "name: price".

Notes:

  • Modify the CSS selectors based on the actual layout of the website you are scraping.
  • Be aware of the site’s scraping policies to avoid legal issues.
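
For price comparison, the scraped price text usually has to be turned into a number first. The following is a minimal sketch, assuming prices look like "$1,299.99" (comma as thousands separator, dot as decimal separator, at most two decimal places). The method name price_to_cents and the sample data are hypothetical. It returns integer cents to avoid floating-point rounding.

```ruby
# Convert a scraped price string such as "$1,299.99" to integer cents.
# Assumes comma thousands separators and a dot decimal separator.
# Returns nil when the text contains no number.
def price_to_cents(text)
  match = text.delete(',').match(/(\d+)(?:\.(\d{1,2}))?/)
  return nil unless match

  dollars = match[1].to_i
  # Pad a single decimal digit ("5" in "19.5") to two digits ("50")
  cents = (match[2] || '0').ljust(2, '0').to_i
  dollars * 100 + cents
end

# Hypothetical scraped values used only for this demonstration
scraped = { 'Laptop' => '$1,299.99', 'Mouse' => '$19.5', 'Cable' => 'N/A' }

scraped.each do |name, price|
  puts "#{name}: #{price_to_cents(price).inspect}"
end
# prints:
# Laptop: 129999
# Mouse: 1950
# Cable: nil
```

Sites that use other conventions, such as "1.299,99 €", would need a different pattern.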

Example 3: Scraping Data from a Social Media Profile

This example shows how to scrape data from a social media profile, such as recent posts or follower counts. This could be beneficial for social media analysis or marketing strategies.

Ensure the nokogiri gem is installed; open-uri ships with Ruby, so it needs no separate install:

gem install nokogiri

Here’s how to scrape a social media profile:

require 'open-uri'
require 'nokogiri'

# URL of the social media profile
url = 'https://www.example-social-media-site.com/user/profile'

# Open the URL and parse the HTML
html = URI.open(url)
parsed_html = Nokogiri::HTML(html)

# Extract recent posts and follower count
posts = parsed_html.css('.post')
followers = parsed_html.css('.follower-count').text.strip

puts "Follower Count: #{followers}"
posts.each do |post|
  content = post.css('.post-content').text.strip
  puts "Post: #{content}"
end

This code extracts the follower count and recent posts from the specified social media profile.

Notes:

  • Social media sites often have strict scraping policies; review their terms of service.
  • You may need to handle authentication if the profile is private or requires a login.
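
How authentication works depends entirely on the site. Many use login forms with session cookies or API tokens, which this sketch does not cover. As one simple illustration, assuming the page is protected by HTTP Basic authentication, a request can be prepared with Ruby's standard Net::HTTP library. The method name build_profile_request, the credentials, and the User-Agent string are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a GET request that carries HTTP Basic credentials.
# Assumes the site accepts Basic authentication, which many do not.
def build_profile_request(url, username:, password:)
  request = Net::HTTP::Get.new(URI(url))
  request.basic_auth(username, password)       # sets the Authorization header
  request['User-Agent'] = 'example-scraper/0.1' # hypothetical identifier
  request
end

request = build_profile_request(
  'https://www.example-social-media-site.com/user/profile',
  username: 'user',
  password: 'pass'
)

puts request['Authorization'] # prints "Basic dXNlcjpwYXNz"

# To send the request (requires network access and a real site):
# uri = request.uri
# response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
#   http.request(request)
# end
# The response body could then be passed to Nokogiri::HTML as before.
```

Credentials should come from environment variables or a secrets store rather than being written into the script.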

These three examples follow the same pattern: fetch a page with open-uri, parse it with Nokogiri, and pick out elements with CSS selectors. Adapting the selectors to the target site's HTML lets you apply the same approach to other web sources.