Modern Ruby web scraping examples: practical code snippets
Let’s start with the smallest possible example of scraping HTML with Ruby. This is the baseline pattern that all the other examples build on.
# Gemfile
gem 'httparty'
gem 'nokogiri'
# scrape_headlines.rb
require 'httparty'
require 'nokogiri'
url = 'https://news.ycombinator.com/'
html = HTTParty.get(url).body
doc = Nokogiri::HTML(html)
headlines = doc.css('.titleline a').map(&:text)
headlines.first(10).each_with_index do |title, idx|
puts "#{idx + 1}. #{title}"
end
This tiny script shows the core pattern used by most of the examples below:
- Fetch HTML with HTTParty.get
- Parse with Nokogiri::HTML
- Use CSS selectors to extract the data you care about
From here, we can move into more realistic scenarios.
A real example: scraping product prices for monitoring
One of the best examples of Ruby web scraping in real life is price monitoring. Say you want to track the price of a specific laptop on an e‑commerce site for internal analytics.
require 'httparty'
require 'nokogiri'
require 'json'
PRODUCT_URL = 'https://example.com/laptops/awesome-model'
response = HTTParty.get(PRODUCT_URL, headers: {
'User-Agent' => 'PriceMonitorBot/1.0 (contact: dev@example.com)'
})
doc = Nokogiri::HTML(response.body)
name = doc.at_css('h1.product-title')&.text&.strip
price = doc.at_css('span.price')&.text&.strip
puts({
  name: name,
  price: price,
  scraped_at: Time.now.utc
}.to_json)
Patterns worth copying from this price-scraping example:
- Identify stable selectors (h1.product-title, span.price) using your browser’s inspector.
- Always send a clear User-Agent string instead of pretending to be a browser.
- Normalize output into JSON so it’s easy to pipe into a database or queue.
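Prices usually arrive as display strings like '$1,299.99'. Normalizing them into numbers early saves pain downstream. Here’s a minimal sketch (the helper name is mine, and the regex assumes US-style formatting):
# Convert a display price like "$1,299.99" into an integer number of cents
def price_to_cents(text)
  digits = text.to_s.gsub(/[^\d.]/, '') # strip currency symbols and commas
  return nil if digits.empty?
  (digits.to_f * 100).round
end
puts price_to_cents('$1,299.99') # => 129999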
As of 2024–2025, many retailers also expose public APIs or structured data in JSON‑LD within the page. Before scraping, check for <script type="application/ld+json"> blocks — you may get cleaner data than parsing HTML.
ld_json = doc.css('script[type="application/ld+json"]').map(&:text)
# Often contains product name, price, currency, and availability
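From there, a rough sketch of pulling out the product data (assuming the page embeds a schema.org Product object; those key names follow the schema.org vocabulary and aren’t guaranteed on every site):
ld_objects = ld_json.filter_map do |raw|
  begin
    JSON.parse(raw)
  rescue JSON::ParserError
    nil # skip blocks that aren't valid JSON
  end
end
# Look for a schema.org Product entry
product = ld_objects.find { |obj| obj.is_a?(Hash) && obj['@type'] == 'Product' }
if product
  offer = product['offers'].is_a?(Array) ? product['offers'].first : product['offers']
  puts({ name: product['name'], price: offer&.dig('price'), currency: offer&.dig('priceCurrency') }.to_json)
end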
Another example of Ruby web scraping: aggregating news headlines
News aggregation is a classic use case for Ruby scraping. This time, let’s scrape multiple sources and normalize the output.
require 'httparty'
require 'nokogiri'
require 'json'
SOURCES = {
hn: 'https://news.ycombinator.com/',
bbc: 'https://www.bbc.com/news',
npr: 'https://www.npr.org/sections/news/'
}
HEADERS = {
'User-Agent' => 'RubyNewsAggregator/1.0 (contact: dev@example.com)'
}
articles = []
SOURCES.each do |key, url|
html = HTTParty.get(url, headers: HEADERS).body
doc = Nokogiri::HTML(html)
case key
when :hn
doc.css('.titleline a').first(10).each do |a|
articles << { source: 'Hacker News', title: a.text.strip, url: a['href'] }
end
when :bbc
doc.css('a.gs-c-promo-heading').first(10).each do |a|
articles << { source: 'BBC', title: a.text.strip, url: "https://www.bbc.com#{a['href']}" }
end
when :npr
doc.css('article h2 a').first(10).each do |a|
articles << { source: 'NPR', title: a.text.strip, url: a['href'] }
end
end
end
puts JSON.pretty_generate(articles)
This example of multi‑site scraping highlights a recurring reality in 2024–2025: every site has slightly different markup. The trick is to normalize early — convert everything into a shared structure (source, title, url) so the rest of your system doesn’t care where it came from.
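One lightweight way to enforce that shared shape is a Struct (a sketch; the Article name is my own, not part of the script above):
# A fixed shape for every article, regardless of source
Article = Struct.new(:source, :title, :url, keyword_init: true)
article = Article.new(source: 'Hacker News', title: 'Example headline', url: 'https://example.com')
puts article.to_h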
Practical code snippets for scraping job listings
Job boards are another area where examples of Ruby web scraping shine, especially when you want a custom feed filtered by tech stack, salary range, or location.
require 'httparty'
require 'nokogiri'
require 'csv'
BASE_URL = 'https://remoteok.com/remote-ruby-jobs'
response = HTTParty.get(BASE_URL, headers: {
'User-Agent' => 'RubyJobScraper/1.0 (contact: dev@example.com)'
})
doc = Nokogiri::HTML(response.body)
rows = doc.css('tr.job')
CSV.open('ruby_jobs.csv', 'w') do |csv|
csv << %w[title company location tags url]
rows.each do |row|
title = row.at_css('h2')&.text&.strip
company = row.at_css('.companyLink')&.text&.strip
location = row.at_css('.location')&.text&.strip
tags = row.css('.tag').map { |t| t.text.strip }.join(', ')
url = "https://remoteok.com#{row['data-href']}"
csv << [title, company, location, tags, url]
end
end
puts 'Saved ruby_jobs.csv'
In 2024, more job platforms expose official APIs, but many smaller or niche boards still don’t. This kind of scraper lets you build your own internal job radar while respecting rate limits and terms of service.
Before scraping any job site, read their robots.txt and terms. The FTC’s guidance on automated data collection is worth a read if you’re in the U.S., and many university legal clinics (for example, Harvard Cyberlaw Clinic) publish accessible commentary on scraping and data use.
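Ruby’s standard library has no robots.txt parser, but a pre-flight check is easy to sketch. This simplified version only honors Disallow rules in the User-agent: * group and ignores Allow overrides and wildcards; for production, a dedicated gem such as webrobots is a better fit:
require 'net/http'
require 'uri'

# Simplified robots.txt check: only reads "Disallow" lines under "User-agent: *"
def path_allowed?(base_url, path)
  robots = Net::HTTP.get(URI.join(base_url, '/robots.txt'))
  in_star_group = false
  robots.each_line do |line|
    line = line.strip
    if line.downcase.start_with?('user-agent:')
      in_star_group = line.split(':', 2).last.strip == '*'
    elsif in_star_group && line.downcase.start_with?('disallow:')
      rule = line.split(':', 2).last.strip
      return false if !rule.empty? && path.start_with?(rule)
    end
  end
  true
end
puts path_allowed?('https://remoteok.com', '/remote-ruby-jobs')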
Handling JavaScript-heavy pages: a more advanced example of Ruby web scraping
Static HTML is easy. The pain starts when everything is rendered by JavaScript. In 2024–2025, this is the rule, not the exception. For these cases, you need a headless browser.
Here’s a practical example using the ferrum gem (a pure-Ruby driver for Chrome):
# Gemfile
gem 'ferrum'
# scrape_dynamic.rb
require 'ferrum'
browser = Ferrum::Browser.new(timeout: 30)
begin
browser.goto('https://example.com/prices')
# Wait for network activity to settle before querying the dynamic table
# (Ferrum's at_css does not wait on its own)
browser.network.wait_for_idle
raise 'price table not found' unless browser.at_css('table.prices')
rows = browser.css('table.prices tbody tr')
data = rows.map do |row|
cells = row.css('td')
{
name: cells[0].text.strip,
sku: cells[1].text.strip,
price: cells[2].text.strip
}
end
puts data
ensure
browser.quit
end
This example of Ruby web scraping is heavier than the Nokogiri-only approach, but for single-page apps or dashboards behind login, a headless browser is often the only realistic option. Be prepared for higher CPU and memory usage, and consider running this kind of scraper on a schedule rather than continuously.
If you need full Selenium support (for example, to reuse existing QA infrastructure), the selenium-webdriver gem works similarly, but ferrum keeps the stack all‑Ruby.
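For comparison, here’s a rough selenium-webdriver equivalent of the Ferrum script (a sketch assuming a local Chrome install; the URL and selectors mirror the example above):
require 'selenium-webdriver'

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless=new') # run Chrome without a visible window
driver = Selenium::WebDriver.for(:chrome, options: options)
begin
  driver.get('https://example.com/prices')
  # Explicit wait: poll up to 15 seconds for the table to appear
  Selenium::WebDriver::Wait.new(timeout: 15).until { driver.find_element(css: 'table.prices') }
  data = driver.find_elements(css: 'table.prices tbody tr').map do |row|
    cells = row.find_elements(css: 'td')
    { name: cells[0].text.strip, sku: cells[1].text.strip, price: cells[2].text.strip }
  end
  puts data
ensure
  driver.quit
end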
Scraping paginated data sets with polite rate limiting
Many real scrapers need to handle pagination and rate limiting. Let’s say you want to scrape multiple pages of a public directory.
require 'httparty'
require 'nokogiri'
BASE_URL = 'https://example.com/directory?page='
HEADERS = { 'User-Agent' => 'DirectoryScraper/1.0 (contact: dev@example.com)' }
page = 1
all_items = []
loop do
url = "#{BASE_URL}#{page}"
resp = HTTParty.get(url, headers: HEADERS)
break if resp.code == 404
doc = Nokogiri::HTML(resp.body)
items = doc.css('.directory-item')
break if items.empty?
items.each do |item|
all_items << {
name: item.at_css('.name')&.text&.strip,
email: item.at_css('.email')&.text&.strip
}
end
puts "Scraped page #{page}, #{items.size} items"
page += 1
sleep(rand(1.0..2.5)) # polite, jittered delay
end
puts "Total items: #{all_items.size}"
This example shows a pattern you should reuse:
- Stop when you hit a 404 or when the page returns no items.
- Add jittered sleep to look less like a bot hammering the server.
- Log progress so you can debug partial runs.
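One thing the loop above doesn’t handle is the server pushing back. If a site answers with 429 Too Many Requests, backing off and retrying is more polite than hammering on. A sketch (the polite_get helper and retry limits are illustrative, not part of the example above):
require 'httparty'

# Fetch with exponential backoff on 429/5xx responses
def polite_get(url, headers:, max_retries: 3)
  attempts = 0
  loop do
    resp = HTTParty.get(url, headers: headers)
    return resp unless [429, 500, 502, 503].include?(resp.code)
    attempts += 1
    raise "giving up on #{url} after #{attempts} attempts" if attempts > max_retries
    # Honor Retry-After if the server sends it; otherwise back off exponentially
    delay = resp.headers['retry-after']&.to_f || (2**attempts)
    sleep(delay + rand) # jitter to avoid synchronized retries
  end
end
In the pagination loop, you’d swap HTTParty.get(url, headers: HEADERS) for polite_get(url, headers: HEADERS).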
If you’re scraping public health or research sites, pay extra attention to politeness. For example, the CDC and NIH provide public data and APIs; scraping HTML should be a last resort after checking for official data feeds.
API-style scraping: calling JSON endpoints directly
Sometimes the best examples of Ruby web scraping never touch the visible HTML at all. Many modern sites call JSON endpoints from the browser; your Ruby code can call those directly.
Use your browser’s Network tab to watch for fetch or XHR calls while you interact with the page. Once you find the URL, you can hit it from Ruby.
require 'httparty'
require 'json'
API_URL = 'https://example.com/api/v1/products?category=laptops'
response = HTTParty.get(API_URL, headers: {
'User-Agent' => 'RubyApiScraper/1.0 (contact: dev@example.com)',
'Accept' => 'application/json'
})
data = JSON.parse(response.body)
products = data.fetch('products', []).map do |p|
{
id: p['id'],
name: p['name'],
price: p['price'],
currency: p['currency']
}
end
puts products
This is arguably the cleanest example of Ruby web scraping you’ll see: you’re still automating data collection from the web, but you’re working with structured JSON rather than scraping HTML that might change every redesign.
2024–2025 trends that affect Ruby web scraping
If you’re writing new scrapers today, a few 2024–2025 trends matter more than they did a few years ago:
- Heavier JavaScript: Single-page apps and infinite scroll are everywhere. Expect to use tools like Ferrum or Selenium more often.
- Stricter anti-bot measures: CAPTCHAs, advanced bot detection, and IP reputation systems are more common. For many sites, the honest path is to request API access instead of fighting the protections.
- More official APIs and data portals: Governments and research institutions increasingly publish open data. For example, the U.S. government’s data.gov portal offers machine-readable datasets that are far better than scraping HTML tables.
- Legal scrutiny: High‑profile scraping cases in U.S. courts have made developers more cautious. Read your target site’s terms, understand fair use in your jurisdiction, and talk to counsel if you’re scraping at scale or redistributing data.
These trends don’t kill scraping, but they do change how you design it. The best examples of Ruby web scraping in 2025 are smaller, targeted, and often support internal analytics rather than public redistribution.
FAQ: common questions about Ruby web scraping
Q: What are some real examples of Ruby web scraping used in production?
Real-world uses include monitoring competitor prices, aggregating news for internal dashboards, building custom job feeds, watching airline or hotel prices for internal revenue teams, and collecting public research data when no API exists.
Q: Can you give an example of handling CAPTCHAs in Ruby scrapers?
Most serious CAPTCHAs are designed to block automation. In practice, teams either avoid those sites, negotiate API access, or integrate third‑party CAPTCHA-solving services (which raises its own ethical and legal questions). As a rule of thumb, if a site invests in heavy CAPTCHA protection, treat that as a strong signal they don’t want automated scraping.
Q: How do I keep these examples of Ruby web scraping maintainable over time?
Keep selectors in one place, write small helper methods for fetching and parsing, and log failures when a selector returns nil. When a site changes its layout, you want one or two lines to update, not a scavenger hunt across multiple files.
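For instance, a minimal structure (names here are illustrative):
require 'httparty'
require 'nokogiri'

# Central place for every selector, so a layout change means a one-line fix
SELECTORS = {
  title: 'h1.product-title',
  price: 'span.price'
}.freeze

def fetch_doc(url)
  Nokogiri::HTML(HTTParty.get(url, headers: { 'User-Agent' => 'MyScraper/1.0' }).body)
end

def extract(doc, field)
  value = doc.at_css(SELECTORS.fetch(field))&.text&.strip
  warn "selector for #{field} returned nothing" if value.nil? # log failures loudly
  value
end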
Q: Is it legal to use these examples of Ruby web scraping against any public website?
“Publicly visible” and “free to scrape” are not the same thing. Laws vary by country, and court decisions evolve. Read the site’s terms, check robots.txt, and talk to a lawyer if scraping is core to your business. For U.S. context, the FTC’s business guidance pages are a good starting point, and many law schools publish readable analyses of scraping cases.
Q: When should I prefer an API over HTML scraping?
Almost always. If a site offers an official API or a documented data portal, use that. You’ll get cleaner data, clearer rate limits, and fewer legal headaches. Use HTML scraping as a last resort when no structured option exists.
The examples above won’t cover every edge case you’ll hit in the wild, but they do map out the core patterns: static HTML scraping, multi‑site aggregation, pagination, dynamic pages, and JSON endpoints. Once you’re comfortable with these, adapting them to your own use cases becomes a matter of tweaking selectors, headers, and output formats rather than reinventing the entire stack each time.