Modern Ruby web scraping examples – practical code snippets

If you’re looking for modern, working Ruby web scraping examples – practical code snippets you can drop straight into a project, you’re in the right place. Instead of vague theory, this guide walks through real examples that scrape product prices, news headlines, job listings, APIs hidden behind HTML, and more. Each example is focused on a realistic use case you might actually ship to production. You’ll see how to combine Ruby with gems like Nokogiri, HTTParty, Ferrum, and Selenium to handle dynamic sites, pagination, and polite scraping practices. The snippets are written with 2024–2025 realities in mind: more JavaScript-heavy pages, stricter bot detection, and growing expectations around legal and ethical scraping. By the end, you’ll have a toolkit you can adapt for monitoring prices, aggregating content, or feeding your own internal dashboards, without wasting time on toy demos.
Written by Jamie

Fast-start examples of Ruby web scraping – practical code snippets

Let’s start with the smallest possible example of scraping HTML with Ruby. This is the baseline pattern that all the other examples build on.

# Gemfile
gem 'httparty'
gem 'nokogiri'

# scrape_headlines.rb
require 'httparty'
require 'nokogiri'

url  = 'https://news.ycombinator.com/'
html = HTTParty.get(url).body
doc  = Nokogiri::HTML(html)

headlines = doc.css('.titleline > a').map(&:text) # child selector skips the "(domain)" links

headlines.first(10).each_with_index do |title, idx|
  puts "#{idx + 1}. #{title}"
end

This tiny script shows the core pattern used in most of the examples in this guide:

  • Fetch HTML with HTTParty.get
  • Parse with Nokogiri::HTML
  • Use CSS selectors to extract the data you care about

From here, we can move into more realistic scenarios.


Real example: scraping product prices for monitoring

One of the best examples of Ruby web scraping in real life is price monitoring. Say you want to track the price of a specific laptop on an e‑commerce site for internal analytics.

require 'httparty'
require 'nokogiri'
require 'json'

PRODUCT_URL = 'https://example.com/laptops/awesome-model'

response = HTTParty.get(PRODUCT_URL, headers: {
  'User-Agent' => 'PriceMonitorBot/1.0 (contact: dev@example.com)'
})

doc = Nokogiri::HTML(response.body)

name  = doc.at_css('h1.product-title')&.text&.strip
price = doc.at_css('span.price')&.text&.strip

puts({
  name: name,
  price: price,
  scraped_at: Time.now.utc
}.to_json)

Patterns worth copying from this example of price scraping:

  • Identify stable selectors (h1.product-title, span.price) using your browser’s inspector.
  • Always send a clear User-Agent string instead of pretending to be a browser.
  • Normalize output into JSON so it’s easy to pipe into a database or queue.
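
The price usually comes back as display text (for example "$1,299.00"). Before storing it, you’ll often want a numeric value alongside the raw string. A minimal sketch, continuing from the snippet above and assuming a simple US-style price format:

require 'bigdecimal'

# Strip currency symbols and thousands separators, then parse what's left.
# Assumes "$1,299.00"-style formatting; adjust for other locales.
numeric_price = price && BigDecimal(price.gsub(/[^\d.]/, ''))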

As of 2024–2025, many retailers also expose public APIs or structured data in JSON‑LD within the page. Before scraping, check for <script type="application/ld+json"> blocks — you may get cleaner data than parsing HTML.

ld_json = doc.css('script[type="application/ld+json"]').map(&:text)
# Often contains product name, price, currency, and availability
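
If a JSON-LD block is present, you can parse it like any other JSON. A rough sketch, assuming the page embeds a schema.org Product object (the exact shape varies by site, and some pages wrap everything in an array or an @graph key):

require 'json'

product_ld = doc.css('script[type="application/ld+json"]')
                .map  { |node| JSON.parse(node.text) rescue nil }
                .compact
                .find { |data| data.is_a?(Hash) && data['@type'] == 'Product' }

if product_ld
  puts product_ld['name']
  puts product_ld.dig('offers', 'price')
  puts product_ld.dig('offers', 'priceCurrency')
end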

Another example of Ruby web scraping: aggregating news headlines

News aggregation is a classic use case and a natural fit for practical Ruby scraping snippets. This time, let’s scrape multiple sources and normalize the output.

require 'httparty'
require 'nokogiri'
require 'json'

SOURCES = {
  hn:   'https://news.ycombinator.com/',
  bbc:  'https://www.bbc.com/news',
  npr:  'https://www.npr.org/sections/news/'
}

HEADERS = {
  'User-Agent' => 'RubyNewsAggregator/1.0 (contact: dev@example.com)'
}

articles = []

SOURCES.each do |key, url|
  html = HTTParty.get(url, headers: HEADERS).body
  doc  = Nokogiri::HTML(html)

  case key
  when :hn
    doc.css('.titleline > a').first(10).each do |a|
      articles << { source: 'Hacker News', title: a.text.strip, url: a['href'] }
    end
  when :bbc
    doc.css('a.gs-c-promo-heading').first(10).each do |a|
      articles << { source: 'BBC', title: a.text.strip, url: "https://www.bbc.com#{a['href']}" }
    end
  when :npr
    doc.css('article h2 a').first(10).each do |a|
      articles << { source: 'NPR', title: a.text.strip, url: a['href'] }
    end
  end
end

puts JSON.pretty_generate(articles)

This example of multi‑site scraping highlights a recurring reality in 2024–2025: every site has slightly different markup. The trick is to normalize early — convert everything into a shared structure (source, title, url) so the rest of your system doesn’t care where it came from.
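
One way to keep that normalization in a single place is to push each site’s quirks into a configuration hash and keep the extraction generic. A sketch reusing the selectors from the example above (SOURCE_RULES and extract_articles are illustrative names, not part of any library):

SOURCE_RULES = {
  hn:  { name: 'Hacker News', selector: '.titleline > a',       base: nil },
  bbc: { name: 'BBC',         selector: 'a.gs-c-promo-heading', base: 'https://www.bbc.com' },
  npr: { name: 'NPR',         selector: 'article h2 a',         base: nil }
}

def extract_articles(doc, rule)
  doc.css(rule[:selector]).first(10).map do |a|
    {
      source: rule[:name],
      title:  a.text.strip,
      url:    rule[:base] ? "#{rule[:base]}#{a['href']}" : a['href']
    }
  end
end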


Practical Ruby web scraping snippets for job listings

Job boards are another area where examples of Ruby web scraping shine, especially when you want a custom feed filtered by tech stack, salary range, or location.

require 'httparty'
require 'nokogiri'
require 'csv'

BASE_URL = 'https://remoteok.com/remote-ruby-jobs'

response = HTTParty.get(BASE_URL, headers: {
  'User-Agent' => 'RubyJobScraper/1.0 (contact: dev@example.com)'
})

doc = Nokogiri::HTML(response.body)

rows = doc.css('tr.job')

CSV.open('ruby_jobs.csv', 'w') do |csv|
  csv << %w[title company location tags url]

  rows.each do |row|
    title    = row.at_css('h2')&.text&.strip
    company  = row.at_css('.companyLink')&.text&.strip
    location = row.at_css('.location')&.text&.strip
    tags     = row.css('.tag').map { |t| t.text.strip }.join(', ')
    url      = "https://remoteok.com#{row['data-href']}"

    csv << [title, company, location, tags, url]
  end
end

puts 'Saved ruby_jobs.csv'

In 2024, more job platforms expose official APIs, but many smaller or niche boards still don’t. This kind of example of Ruby web scraping lets you build your own internal job radar while respecting rate limits and terms of service.

Before scraping any job site, read their robots.txt and terms. The FTC’s guidance on automated data collection is worth a read if you’re in the U.S., and many university legal clinics (for example, Harvard Cyberlaw Clinic) publish accessible commentary on scraping and data use.
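
A quick (and deliberately simplified) way to at least look at the Disallow rules before you start; a production crawler should use a dedicated robots.txt parser and honor per-agent rules and crawl delays:

robots_txt = HTTParty.get('https://remoteok.com/robots.txt').body

disallowed_paths = robots_txt.each_line
                             .select { |line| line.strip.downcase.start_with?('disallow:') }
                             .map    { |line| line.split(':', 2).last.strip }

puts disallowed_paths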


Handling JavaScript-heavy pages: a more advanced example of Ruby web scraping

Static HTML is easy. The pain starts when everything is rendered by JavaScript. In 2024–2025, this is the rule, not the exception. For these cases, you need a headless browser.

Here’s a practical example using the ferrum gem (a pure-Ruby driver for Chrome):

# Gemfile
gem 'ferrum'

# scrape_dynamic.rb
require 'ferrum'

browser = Ferrum::Browser.new(timeout: 30)

begin
  browser.goto('https://example.com/prices')

  # Wait for network traffic to settle, then check that the dynamic table rendered
  browser.network.wait_for_idle
  raise 'price table did not render' unless browser.at_css('table.prices')

  rows = browser.css('table.prices tbody tr')

  data = rows.map do |row|
    cells = row.css('td')
    {
      name:  cells[0].text.strip,
      sku:   cells[1].text.strip,
      price: cells[2].text.strip
    }
  end

  puts data
ensure
  browser.quit
end

This example of Ruby web scraping is heavier than the Nokogiri-only approach, but for single-page apps or dashboards behind login, a headless browser is often the only realistic option. Be prepared for higher CPU and memory usage, and consider running this kind of scraper on a schedule rather than continuously.
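
One lightweight way to run it on a schedule is a cron entry managed with the whenever gem; the path and frequency below are placeholders:

# config/schedule.rb -- read by the whenever gem to generate a crontab entry
every 6.hours do
  command 'cd /srv/scrapers && bundle exec ruby scrape_dynamic.rb >> log/scrape.log 2>&1'
end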

If you need full Selenium support (for example, to reuse existing QA infrastructure), the selenium-webdriver gem works similarly, but ferrum keeps the stack all‑Ruby.
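
For reference, here is roughly what the same fetch looks like with selenium-webdriver; this is a sketch only and assumes Chrome plus a matching chromedriver are installed:

require 'selenium-webdriver'

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless=new')

driver = Selenium::WebDriver.for(:chrome, options: options)
begin
  driver.get('https://example.com/prices')

  # Wait until the dynamic table shows up, then read its rows
  wait = Selenium::WebDriver::Wait.new(timeout: 30)
  wait.until { driver.find_element(css: 'table.prices') }

  rows = driver.find_elements(css: 'table.prices tbody tr')
  data = rows.map do |row|
    cells = row.find_elements(css: 'td')
    { name: cells[0].text.strip, sku: cells[1].text.strip, price: cells[2].text.strip }
  end

  puts data
ensure
  driver.quit
end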


Scraping paginated data sets with polite rate limiting

Many real Ruby scrapers need to handle pagination and rate limiting. Let’s say you want to scrape multiple pages of a public directory.

require 'httparty'
require 'nokogiri'

BASE_URL = 'https://example.com/directory?page='
HEADERS  = { 'User-Agent' => 'DirectoryScraper/1.0 (contact: dev@example.com)' }

page      = 1
all_items = []

loop do
  url  = "#{BASE_URL}#{page}"
  resp = HTTParty.get(url, headers: HEADERS)

  break if resp.code == 404

  doc   = Nokogiri::HTML(resp.body)
  items = doc.css('.directory-item')

  break if items.empty?

  items.each do |item|
    all_items << {
      name:  item.at_css('.name')&.text&.strip,
      email: item.at_css('.email')&.text&.strip
    }
  end

  puts "Scraped page #{page}, #{items.size} items"
  page += 1

  sleep(rand(1.0..2.5)) # polite, jittered delay
end

puts "Total items: #{all_items.size}"

This example of Ruby web scraping shows a pattern you should reuse:

  • Stop when you hit a 404 or when the page returns no items.
  • Add jittered sleep to look less like a bot hammering the server.
  • Log progress so you can debug partial runs.

If you’re scraping public health or research sites, pay extra attention to politeness. For example, the CDC and NIH provide public data and APIs; scraping HTML should be a last resort after checking for official data feeds.
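
One refinement worth folding into the loop above: if the server starts answering with HTTP 429 (Too Many Requests), back off instead of pressing on. A minimal sketch that reuses the same HTTParty setup; fetch_with_backoff is an illustrative helper name, and it honors the Retry-After header when the server sends one:

def fetch_with_backoff(url, headers, max_retries: 3)
  retries = 0
  loop do
    resp = HTTParty.get(url, headers: headers)
    return resp unless resp.code == 429 && retries < max_retries

    # Honor Retry-After if present, otherwise back off exponentially, with jitter
    wait_seconds = (resp.headers['retry-after'] || 2**retries).to_i
    sleep(wait_seconds + rand)
    retries += 1
  end
end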


API-style scraping: extracting JSON from HTML requests

Sometimes the best examples of Ruby web scraping never touch the visible HTML at all. Many modern sites call JSON endpoints from the browser; your Ruby code can call those directly.

Use your browser’s Network tab to watch for fetch or XHR calls while you interact with the page. Once you find the URL, you can hit it from Ruby.

require 'httparty'
require 'json'

API_URL = 'https://example.com/api/v1/products?category=laptops'

response = HTTParty.get(API_URL, headers: {
  'User-Agent' => 'RubyApiScraper/1.0 (contact: dev@example.com)',
  'Accept'     => 'application/json'
})

data = JSON.parse(response.body)

products = data.fetch('products', []).map do |p|
  {
    id:       p['id'],
    name:     p['name'],
    price:    p['price'],
    currency: p['currency']
  }
end

puts products

This is arguably the cleanest example of Ruby web scraping you’ll see: you’re still automating data collection from the web, but you’re working with structured JSON rather than scraping HTML that might change every redesign.


What changed for web scraping in 2024–2025

If you’re writing new scrapers today, a few trends matter more than they did a few years ago:

  • Heavier JavaScript: Single-page apps and infinite scroll are everywhere. Expect to use tools like Ferrum or Selenium more often (see the scroll sketch after this list).
  • Stricter anti-bot measures: CAPTCHAs, advanced bot detection, and IP reputation systems are more common. For many sites, the honest path is to request API access instead of fighting the protections.
  • More official APIs and data portals: Governments and research institutions increasingly publish open data. For example, the U.S. government’s data.gov portal offers machine-readable datasets that are far better than scraping HTML tables.
  • Legal scrutiny: High‑profile scraping cases in U.S. courts have made developers more cautious. Read your target site’s terms, understand how copyright and computer-access laws apply in your jurisdiction, and talk to counsel if you’re scraping at scale or redistributing data.
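
For infinite scroll in particular, a common Ferrum pattern is to scroll, wait, and repeat until no new content appears. A rough sketch; the URL and the .item selector are placeholders for whatever the real page uses:

require 'ferrum'

browser = Ferrum::Browser.new(timeout: 30)
browser.goto('https://example.com/feed')

previous_count = 0
10.times do                                   # cap the number of scroll rounds
  browser.execute('window.scrollTo(0, document.body.scrollHeight)')
  sleep 1.5                                   # give the page time to fetch the next batch
  count = browser.css('.item').size
  break if count == previous_count            # nothing new appeared, so stop
  previous_count = count
end

puts browser.css('.item').size
browser.quit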

These trends don’t kill scraping, but they do change how you design it. The best examples of Ruby web scraping in 2025 are smaller, targeted, and often support internal analytics rather than public redistribution.


FAQ: common questions about examples of Ruby web scraping

Q: What are some real examples of Ruby web scraping used in production?
Real-world uses include monitoring competitor prices, aggregating news for internal dashboards, building custom job feeds, watching airline or hotel prices for internal revenue teams, and collecting public research data when no API exists.

Q: Can you give an example of handling CAPTCHAs in Ruby scrapers?
Most serious CAPTCHAs are designed to block automation. In practice, teams either avoid those sites, negotiate API access, or integrate third‑party CAPTCHA-solving services (which raises its own ethical and legal questions). As a rule of thumb, if a site invests in heavy CAPTCHA protection, treat that as a strong signal they don’t want automated scraping.

Q: How do I keep these examples of Ruby web scraping maintainable over time?
Keep selectors in one place, write small helper methods for fetching and parsing, and log failures when a selector returns nil. When a site changes its layout, you want one or two lines to update, not a scavenger hunt across multiple files.
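
A minimal sketch of that idea, assuming a hypothetical SELECTORS hash and the standard library Logger:

require 'logger'

LOGGER    = Logger.new($stdout)
SELECTORS = { title: 'h1.product-title', price: 'span.price' }

def extract(doc, key)
  value = doc.at_css(SELECTORS.fetch(key))&.text&.strip
  LOGGER.warn("selector #{SELECTORS[key].inspect} returned nil for #{key}") if value.nil?
  value
end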

Q: Is it legal to use these examples of Ruby web scraping against any public website?
“Publicly visible” and “free to scrape” are not the same thing. Laws vary by country, and court decisions evolve. Read the site’s terms, check robots.txt, and talk to a lawyer if scraping is core to your business. For U.S. context, the FTC’s business guidance pages are a good starting point, and many law schools publish readable analyses of scraping cases.

Q: When should I prefer an API over HTML scraping?
Almost always. If a site offers an official API or a documented data portal, use that. You’ll get cleaner data, clearer rate limits, and fewer legal headaches. Use HTML scraping as a last resort when no structured option exists.


The practical code snippets above won’t cover every edge case you’ll hit in the wild, but they do map out the core patterns: static HTML scraping, multi‑site aggregation, pagination, dynamic pages, and JSON endpoints. Once you’re comfortable with these, adapting them to your own use cases becomes a matter of tweaking selectors, headers, and output formats rather than reinventing the entire stack each time.
