<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Introspection &#187; Lucene</title>
	<atom:link href="http://blog.jeffhaynie.us/category/lucene/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.jeffhaynie.us</link>
	<description>Jeff Haynie on business and technology in Silicon Valley</description>
	<lastBuildDate>Fri, 14 Jan 2011 18:39:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Why I love Ruby #1</title>
		<link>http://blog.jeffhaynie.us/why-i-love-ruby.html</link>
		<comments>http://blog.jeffhaynie.us/why-i-love-ruby.html#comments</comments>
		<pubDate>Sat, 12 Aug 2006 13:56:16 +0000</pubDate>
		<dc:creator>Jeff Haynie</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://blog.jeffhaynie.us/?p=29</guid>
		<description><![CDATA[I&#8217;m going to start a new set of posts titled &#8220;Why I love Ruby&#8221;&#8230; I&#8217;ve been using Ruby for some time now and have built a number of applications using it.  Every day, it just amazes me.  Partially the language itself. Partially the community support.
Jared and I have been doing a lot of Lucene [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m going to start a new set of posts titled &#8220;Why I love Ruby&#8221;&#8230; I&#8217;ve been using Ruby for some time now and have built a number of applications using it.  Every day, it just amazes me.  Partially the language itself. Partially the community support.</p>
<p>Jared and I have been doing a lot of <a href="http://lucene.apache.org/java/docs/">Lucene</a> and <a href="http://lucene.apache.org/nutch/">Nutch</a> work lately as we build out our new search-based website.  Most of our current back end, with a few exceptions, is built out in Java.  However, we&#8217;ve been doing more and more Ruby lately.</p>
<p>One of our pieces interacts with remote content in Ruby, and we wanted to index that content into one of our Lucene indexes, something we currently do only in Java.  A quick search for &#8220;Ruby Lucene&#8221; revealed <a href="http://ferret.davebalmain.com/trac">Ferret</a>. Why I love Ruby #1.</p>
<p>OK, so I whipped open my trusty console and typed:</p>
<pre style='background-color:#ffffcc;border:1px solid brown;border-left:4px solid brown;border-right:4px solid brown;'>
&gt; gem install ferret
</pre>
<p>Of course, seconds later, ferret was installed.</p>
<p>Now, I created a quick index and search demo:</p>
<pre style='background-color:#ffffcc;border:1px solid brown;border-left:4px solid brown;border-right:4px solid brown;'>
require &apos;net/http&apos;
require &apos;rubygems&apos;
require &apos;ferret&apos;

include Ferret
include Ferret::Document

# create an index in memory, pass path to make a persistent index
#index = Index::Index.new(:path=&gt;&apos;lucene&apos;,:create=&gt;true)
index = Index::Index.new()

# allow command line index/search
site = ARGV[0] || &apos;http://www.cnn.com/&apos;
query = ARGV[1] || &apos;iraq&apos;

puts &quot;Indexing: #{site}&quot;

# fetch the content (note in this simple example, redirections aren&apos;t followed)
url = URI.parse(site)
# request_uri keeps any query string and falls back to &apos;/&apos; when the path is empty
req = Net::HTTP::Get.new(url.request_uri)
res = Net::HTTP.start(url.host, url.port) {|http|
   http.request(req)
}

# create a new doc index and store some data
doc = Document.new
doc &lt;&lt; Field.new(&apos;uri&apos;,url.to_s,Field::Store::YES,Field::Index::UNTOKENIZED)
doc &lt;&lt; Field.new(&apos;content&apos;,res.body,Field::Store::YES,Field::Index::TOKENIZED)
index &lt;&lt; doc

puts &quot;Index document count: #{index.size}, now searching: #{query}&quot;

# construct a query and now search our index
q = &quot;content:\&quot;#{query}\&quot;&quot;
found = 0
index.search_each(q) do |doc_num,score|
   doc = index[doc_num]
   uri = doc.field(&apos;uri&apos;).data
   found+=1
   puts &quot;Document id: #{doc_num} found with a score of #{score} at #{uri}&quot;
end

if found==0
  puts &apos;Search query did not produce any matches&apos;
else
  puts &quot;Search found #{found} documents&quot;
end
</pre>
<p>Voil&#224;! Now, execute:</p>
<pre style='background-color:#ffffcc;border:1px solid brown;border-left:4px solid brown;border-right:4px solid brown;'>
mycomputer:~ jhaynie$ ruby lucene.rb http://news.com.com/ microsoft
Indexing: http://news.com.com/
Index document count: 1, now searching: microsoft
Document id: 0 found with a score of 0.0139269828796387 at http://news.com.com/
Search found 1 documents
</pre>
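<p>The demo script does not follow redirects. As a minimal sketch, assuming only Ruby&#8217;s standard net/http and uri libraries, a helper along these lines could follow them before the body is indexed. The name fetch_following_redirects, its limit argument and the injectable fetcher are all hypothetical; none of them come from Ferret or from the script above:</p>
<pre style='background-color:#ffffcc;border:1px solid brown;border-left:4px solid brown;border-right:4px solid brown;'>
```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (an assumption, not part of Ferret or the original
# script): fetch a URL, following at most `limit` redirects.
# `fetcher` is injectable so the logic can be tried without network access;
# by default it is Net::HTTP.get_response.
def fetch_following_redirects(site, limit = 5, fetcher = Net::HTTP.method(:get_response))
  url = URI.parse(site)
  (limit + 1).times do
    res = fetcher.call(url)
    # Only 3xx responses that carry a Location header are followed
    location = res.is_a?(Net::HTTPRedirection) ? res['location'] : nil
    return [url, res] if location.nil?
    # Location may be relative, so resolve it against the current URL
    url = URI.join(url.to_s, location)
  end
  raise "Too many redirects for #{site}"
end
```
</pre>
<p>With something like this in place, the manual request in the script could become url, res = fetch_following_redirects(site), and the stored uri field would then hold the final address rather than the one originally requested.</p>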
<p>I love Ruby.</p>
<hr />
<p><b>Technorati Tags:</b> <a href="http://www.technorati.com/tags/lucene" rel="tag">lucene</a>, <a href="http://www.technorati.com/tags/nutch" rel="tag">nutch</a>, <a href="http://www.technorati.com/tags/ruby" rel="tag">ruby</a>, <a href="http://www.technorati.com/tags/ferret" rel="tag">ferret</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jeffhaynie.us/why-i-love-ruby.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

