Web scraping is the automated process of extracting data from websites. In Ruby, this can be accomplished using libraries like Nokogiri and Open-URI, which make it easy to fetch and parse HTML content. Here, we’ll explore three diverse and practical examples of Ruby web scraping, demonstrating how to gather and process data from various sources.
This example demonstrates how to scrape the latest news headlines from a news website. It can be particularly useful for aggregating content or keeping up with current events.
To use this code, ensure you have the nokogiri gem installed:
gem install nokogiri
Here’s how you can scrape news headlines:
require 'open-uri'
require 'nokogiri'
# URL of the news website to scrape
url = 'https://www.example-news-site.com'
# Open the URL and parse the HTML
html = URI.open(url)
parsed_html = Nokogiri::HTML(html)
# Extract headlines - assuming they are <h2> elements with the class "headline"
headlines = parsed_html.css('h2.headline')
# Output the headlines
headlines.each do |headline|
  puts headline.text.strip
end
This script fetches the page, parses it, and prints the text of every <h2> element that carries the headline class.
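Before pointing a script like this at a real site, it is worth confirming that the path you plan to fetch is not disallowed by the site's robots.txt. The following is a minimal sketch of that check, not a complete parser: the helper name path_allowed? and the sample rules are hypothetical, and it only reads Disallow lines in the "User-agent: *" group, ignoring Allow rules, wildcards, and bot-specific groups.

```ruby
# Hypothetical helper: returns true when `path` is not blocked by any
# Disallow rule in the "User-agent: *" group of the given robots.txt text.
# Simplified on purpose: no Allow rules, no wildcards, no per-bot groups.
def path_allowed?(robots_txt, path)
  applies = false   # true while we are inside the "User-agent: *" group
  disallowed = []

  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip   # drop comments and surrounding whitespace
    next if line.empty?

    field, value = line.split(':', 2).map(&:strip)
    next if value.nil?                 # skip lines without a "field: value" shape

    case field.downcase
    when 'user-agent'
      applies = (value == '*')
    when 'disallow'
      disallowed << value if applies && !value.empty?
    end
  end

  disallowed.none? { |prefix| path.start_with?(prefix) }
end

# Sample robots.txt content (made up for illustration)
sample = <<~ROBOTS
  User-agent: *
  Disallow: /private/
  Disallow: /cart
ROBOTS

puts path_allowed?(sample, '/news/today')    # prints true
puts path_allowed?(sample, '/private/data')  # prints false
```

In a real script, the robots.txt text would be fetched from the site's root (for example with URI.open) and passed to the helper before the page itself is requested.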
Note: Always check the website's robots.txt file to ensure you’re allowed to scrape its content.
In this example, we will scrape product prices from an e-commerce website. This can be useful for price comparison or market analysis.
Make sure the required gem is installed. Only Nokogiri needs installing, since open-uri ships with Ruby:
gem install nokogiri
Here’s the code to scrape product prices:
require 'open-uri'
require 'nokogiri'
# URL of the e-commerce site
url = 'https://www.example-ecommerce-site.com/products'
# Open the URL and parse the HTML
html = URI.open(url)
parsed_html = Nokogiri::HTML(html)
# Extract product names and prices
products = parsed_html.css('.product')
products.each do |product|
  name = product.css('.product-name').text.strip
  price = product.css('.product-price').text.strip
  puts "#{name}: #{price}"
end
This code retrieves each product's name and price and prints them on one line in the form name: price.
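For price comparison, the scraped price text usually has to be turned into a number first. The sketch below shows one way to do that, assuming US-style formatting (commas for thousands, a period for decimals). The helper name price_to_cents and the sample products are hypothetical and not part of the script above.

```ruby
# Hypothetical helper: converts a scraped price string to integer cents.
# Assumes US-style formatting ("," for thousands, "." for decimals).
# Returns nil when the text contains no number at all.
def price_to_cents(text)
  match = text.delete(',').match(/(\d+)(?:\.(\d{1,2}))?/)
  return nil unless match

  dollars = match[1].to_i
  # Pad a single decimal digit ("19.5") to two digits ("50")
  cents = (match[2] || '0').ljust(2, '0').to_i
  dollars * 100 + cents
end

# Made-up name/price pairs standing in for scraped data
scraped = [
  ['Desk Lamp', '$24.50'],
  ['Office Chair', '$1,299.99'],
  ['Notebook', '$5']
]

cheapest = scraped.min_by { |_name, price| price_to_cents(price) }
puts cheapest.first                    # prints Notebook
puts price_to_cents('$1,299.99')       # prints 129999
```

Working in integer cents rather than floats avoids rounding surprises when prices are compared or summed.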
This example shows how to scrape data from a social media profile, such as recent posts or follower counts. This could be beneficial for social media analysis or marketing strategies.
Ensure you have the required gem. Only Nokogiri needs installing, since open-uri ships with Ruby:
gem install nokogiri
Here’s how to scrape a social media profile:
require 'open-uri'
require 'nokogiri'
# URL of the social media profile
url = 'https://www.example-social-media-site.com/user/profile'
# Open the URL and parse the HTML
html = URI.open(url)
parsed_html = Nokogiri::HTML(html)
# Extract recent posts and follower count
posts = parsed_html.css('.post')
followers = parsed_html.css('.follower-count').text.strip
puts "Follower Count: #{followers}"
posts.each do |post|
  content = post.css('.post-content').text.strip
  puts "Post: #{content}"
end
This code extracts the follower count and recent posts from the specified social media profile.
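Follower counts are often displayed in abbreviated form, such as "1.2K" or "3M", so the extracted text may need normalizing before it can be used in analysis. The following is a minimal sketch under that assumption; the helper name parse_count, the COUNT_MULTIPLIERS table, and the sample strings are hypothetical, and real sites may use other formats.

```ruby
# Hypothetical lookup table for common abbreviation suffixes
COUNT_MULTIPLIERS = { 'K' => 1_000, 'M' => 1_000_000, 'B' => 1_000_000_000 }.freeze

# Hypothetical helper: converts text such as "1.2K" or "12,345 followers"
# to an Integer. Returns nil when no number is found.
def parse_count(text)
  # A suffix only counts when it directly follows the digits ("1.2K"),
  # so a word like "members" after a space is not read as "M".
  match = text.delete(',').match(/(\d+(?:\.\d+)?)([KMB])?\b/i)
  return nil unless match

  number = match[1].to_r               # Rational avoids float rounding errors
  suffix = match[2]&.upcase
  (number * COUNT_MULTIPLIERS.fetch(suffix, 1)).round
end

puts parse_count('1.2K')               # prints 1200
puts parse_count('12,345 followers')   # prints 12345
puts parse_count('3M')                 # prints 3000000
```

With the count as an Integer, values scraped at different times can be compared or plotted directly.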
All three examples follow the same pattern: fetch a page with open-uri, parse it with Nokogiri, and pick out elements with CSS selectors. The URLs and selectors shown are placeholders, so adjust them to match the structure of the site you are working with.