Why I love Ruby #1

by Jeff Haynie on August 12, 2006

I’m going to start a new set of posts titled “Why I love Ruby”. I’ve been using Ruby for some time now and have built a number of applications with it. Every day, it amazes me: partly the language itself, partly the community support.

Jared and I have been doing a lot of Lucene and Nutch work lately as we build out our new search-based website. Most of our current back end, with a few exceptions, is built out in Java. However, we’ve been doing more and more Ruby lately.

One of our pieces interacts with remote content in Ruby, and we wanted to index that content into one of our Lucene indexes, something we currently do only in Java. A quick search for “Ruby Lucene” revealed Ferret. Why I love Ruby, #1.

OK, so I whipped open my trusty console and typed:

> gem install ferret

Of course, seconds later, ferret was installed.

Next, I wrote a quick index and search demo:

require 'net/http'
require 'rubygems'
require 'ferret'

include Ferret
include Ferret::Document

# create an index in memory, pass path to make a persistent index
#index = Index::Index.new(:path=>'lucene',:create=>true)
index = Index::Index.new()

# allow command line index/search
site = ARGV[0] || 'http://www.cnn.com/'
query = ARGV[1] || 'iraq'

puts "Indexing: #{site}"

# fetch the content (note in this simple example, redirections aren't followed)
url = URI.parse(site)
# request_uri is "/" when the URL has no path, and keeps any query string
req = Net::HTTP::Get.new(url.request_uri)
res = Net::HTTP.start(url.host, url.port) {|http|
   http.request(req)
}

# create a new doc index and store some data
doc = Document.new
doc << Field.new('uri',url.to_s,Field::Store::YES,Field::Index::UNTOKENIZED)
doc << Field.new('content',res.body,Field::Store::YES,Field::Index::TOKENIZED)
index << doc

puts "Index document count: #{index.size}, now searching: #{query}"

# construct a query and now search our index
q = "content:\"#{query}\""
found = 0
index.search_each(q) do |doc_num,score|
   doc = index[doc_num]
   uri = doc.field('uri').data
   found+=1
   puts "Document id: #{doc_num} found with a score of #{score} at #{uri}"
end

if found==0
  puts 'Search query did not produce any matches'
else
  puts "Search found #{found} documents"
end

Voilà! Now, execute:

mycomputer:~ jhaynie$ ruby lucene.rb http://news.com.com/ microsoft
Indexing: http://news.com.com/
Index document count: 1, now searching: microsoft
Document id: 0 found with a score of 0.0139269828796387 at http://news.com.com/
Search found 1 documents
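
As the comment in the script says, redirections aren't followed in this simple example. Here is a minimal sketch of how that might be handled with Net::HTTP alone. The helper names `fetch` and `redirect_target` and the default limit of 5 redirects are assumptions made up for illustration; they are not part of Ferret or of the script above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: resolve a Location header against the URL that
# produced it, since a server may send a relative redirect target.
def redirect_target(site, location)
  URI.join(site, location).to_s
end

# Hypothetical helper: fetch a page, following at most `limit` redirects.
def fetch(site, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  url = URI.parse(site)
  res = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(url.request_uri))
  end

  if res.is_a?(Net::HTTPRedirection) && res['location']
    # Follow the redirect, spending one unit of the remaining budget
    fetch(redirect_target(site, res['location']), limit - 1)
  else
    res
  end
end
```

With something like this in place, `res = fetch(site)` could stand in for the Net::HTTP.start block in the script, and `res.body` would be indexed as before.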

I love Ruby.

