Ferret

Information about Ferret

Published on February 6, 2008

Author: bsbodden

Source: slideshare.net

Description

Introduction to Ferret, the Ruby Full-Text Search Engine

Ferret: A Ruby Search Engine • Brian Sam-Bodden

Agenda • What is Ferret? • Concepts • Fields • Indexing • Installing Ferret • The Recipe • Documents • Ferret::Index::Index • FQL • Ferret in your App • Ferret in Rails • Resources

What is Ferret? • Information Retrieval (IR) Library • Full-featured Text Search Engine • Inspired by the Apache Lucene Search Engine • Ported to Ruby by David Balmain

What is Ferret? • Initially a 100% pure Ruby port • Since 0.9 many core functions are implemented in C • Fast! Now Faster than Lucene ;-)

Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms • Term : A text string, keyed by field name
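The four concepts nest inside one another, which a few lines of plain Ruby can make concrete. This is only a toy model: the names Term, field_terms, and build_document are invented for illustration and say nothing about how Ferret stores data internally.

```ruby
# Toy model of the four concepts; plain Ruby data, not Ferret's storage.

# A term is a text string keyed by the name of the field it came from.
Term = Struct.new(:field, :text)

# A field is a named sequence of terms.
def field_terms(name, text)
  text.downcase.scan(/\w+/).map { |word| Term.new(name, word) }
end

# A document is a sequence of fields (here, a hash of field name => terms).
def build_document(attributes)
  attributes.each_with_object({}) do |(name, text), doc|
    doc[name] = field_terms(name, text)
  end
end

# An index is a sequence of documents.
index = []
index << build_document({ title: "Ferret basics", body: "Ferret is a search engine" })
index << build_document({ title: "Lucene", body: "Lucene inspired Ferret" })

first_title_terms = index[0][:title].map(&:text)
puts first_title_terms.inspect # => ["ferret", "basics"]
```

Because a term carries its field name, "ferret" in the title and "ferret" in the body are distinct terms in this model.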

Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed • Vectored: Frequency and location of Terms are stored
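The four properties can be illustrated with a small inverted index in plain Ruby. ToyField is a hypothetical class written for this sketch; it is not Ferret's API or file format, and the whitespace-and-lowercase tokenizer is an assumption.

```ruby
# Toy illustration of stored / indexed / tokenized / vectored.
class ToyField
  attr_reader :stored_text, :postings, :term_vectors

  def initialize
    @stored_text  = {}                             # doc id => original text (stored)
    @postings     = Hash.new { |h, k| h[k] = [] }  # term => doc ids (indexed)
    @term_vectors = {}                             # doc id => term => positions (vectored)
  end

  def add(doc_id, text)
    @stored_text[doc_id] = text
    tokens = text.downcase.scan(/\w+/)             # tokenized
    vector = Hash.new { |h, k| h[k] = [] }
    tokens.each_with_index { |term, pos| vector[term] << pos }
    @term_vectors[doc_id] = vector
    vector.each_key { |term| @postings[term] << doc_id }
  end

  # The inverted lookup: which documents contain this term?
  def docs_containing(term)
    @postings.fetch(term, [])
  end
end

title = ToyField.new
title.add(0, "Ferret is fast")
title.add(1, "Fast search with a fast engine")

puts title.docs_containing("fast").inspect   # => [0, 1]
puts title.term_vectors[1]["fast"].inspect   # => [0, 4]
```

The length of a position list is the term frequency, so "fast" occurs twice in document 1.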

It’s all about Indexing • Indexing is the processing of a source document into plain text tokens that Ferret can manipulate • For any non-plaintext sources such as PDF, Word, Excel you need to: • Extract • Analyze
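The extract and analyze steps can be sketched as a two-stage pipeline. Real PDF, Word, or Excel extraction needs a format-specific tool, so this sketch substitutes a trivial HTML tag stripper. The stop-word list and both function names are assumptions made for illustration.

```ruby
# Hypothetical extract/analyze pipeline using only the standard library.
STOP_WORDS = %w[a an and is of the to].freeze

# Extract: turn the source format into plain text.
def extract(html)
  html.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
end

# Analyze: turn plain text into normalized tokens.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

source = "<h1>Ferret</h1><p>The Ruby search engine is fast.</p>"
plain  = extract(source)
tokens = analyze(plain)

puts plain          # => Ferret The Ruby search engine is fast.
puts tokens.inspect # => ["ferret", "ruby", "search", "engine", "fast"]
```

Only the analyzed tokens would be handed to the index; the extracted plain text is what a stored field would keep.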

Installing Ferret • gem install ferret • Pick the latest version for your platform

The Recipe 1. Create some Documents 2. Create an Index 3. Adding Documents to the Index 4. Perform some Queries

Example Documents Create some Documents • "Any String is a Document" • ["This", "is also", "a document"]

Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath') • Or, completely in Memory ➡ index = Ferret::I.new()

Ferret::Index::Index Adding Documents to the Index • Index provides the add_document method • It also provides the << alias • Adding documents is then as easy as: ➡ index << "This is a document" ➡ index << {:first => "Bob", :last => "Smith"}

Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • The search method takes a query and an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block ➡ search_each(query, options = {}) {|doc, score| ... }
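The four recipe steps can be walked through with a stand-in that needs no gem. TinyIndex below is a hypothetical class that only mimics the shape of the calls named on the slides (add_document, <<, search_each). It is not Ferret, does not parse FQL, and scores by naive word counting.

```ruby
# TinyIndex: a stdlib-only stand-in for trying the recipe, not Ferret itself.
class TinyIndex
  def initialize
    @docs = []
  end

  # Accepts a String or a Hash of fields, like the documents on the slides.
  def add_document(doc)
    @docs << (doc.is_a?(Hash) ? doc : { content: doc })
    self
  end
  alias_method :<<, :add_document

  # Yields each matching document id with a naive score: how many times
  # the single-word query occurs across the document's fields.
  def search_each(query, options = {})
    limit = options.fetch(:limit, 10)
    term  = query.downcase
    hits = @docs.each_with_index.map do |doc, id|
      words = doc.values.join(" ").downcase.scan(/\w+/)
      [id, words.count(term)]
    end
    hits = hits.select { |_, score| score > 0 }.sort_by { |id, score| [-score, id] }
    hits.first(limit).each { |id, score| yield id, score }
    hits.size
  end

  def [](id)
    @docs[id]
  end
end

index = TinyIndex.new                        # 2. create an index
index << "This is a document"                # 1 + 3. create and add documents
index << { first: "Bob", last: "Smith" }
index << "A document about a document"

results = []
index.search_each("document") { |id, score| results << [id, score] }  # 4. query
puts results.inspect # => [[2, 2], [0, 1]]
```

With the real gem, the same three lines of index << ... and a search_each block would go against Ferret::I.new instead; the scores would come from Ferret's own ranking rather than a word count.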

Playing with Ferret in irb

Ferret Query Language • Ferret's own query language, FQL, is a powerful way to specify search queries • FQL supports many query types, including: • Term • Range • Phrase • Wildcard • Field • Fuzzy • Boolean
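To give a feel for the query types, here is one sample query string per type. The strings follow the Lucene-style syntax that FQL is modeled on; the field names title and created_on are invented, and the exact forms should be checked against Ferret's query parser documentation before relying on them.

```ruby
# One illustrative FQL string per query type; field names are hypothetical.
FQL_EXAMPLES = {
  term:     'ferret',
  phrase:   '"full text search"',
  field:    'title:ferret',
  range:    'created_on:[20070101 20080206]',
  wildcard: 'fer*t',
  fuzzy:    'ferrit~',
  boolean:  '+ruby -java'
}.freeze

FQL_EXAMPLES.each { |type, query| puts format("%-8s %s", type, query) }
```

Each string would be passed as the query argument of search or search_each.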

Index.explain • The explain method of Index describes how a document scores against a query • Very useful for debugging • and for learning how Ferret works

Ferret in your App • [Diagram: the Application gathers data from the Database, Web, File System, and Manual User Input and indexes documents into the Ferret Index; it gets the user's query, searches the index, and presents the search results]

Ferret in Rails • Acts As Ferret is an ActiveRecord extension • Available as a plugin • Provides a simplified interface to Ferret • Maintained by Jens Krämer

Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:

Ferret in Rails • A simple model has two searchable fields, title and body:

Ferret in Rails • After a quick rake db:migrate we now have some data to play with • Fire up the Rails Console and let’s see what acts_as_ferret can do for our models

Want more? • Ferret is improving constantly • Acts As Ferret seems to catch up quickly • Real-life usage seems to require some good engineering on your part • Background indexing • Hot swap of indexes?

Want more? • We only covered the simplest constructs in Ferret • Ferret’s API provides enough flexibility for the most demanding searching needs

Online Resources • http://ferret.davebalmain.com • http://lucene.apache.org • http://lucenebook.com • http://projects.jkraemer.net/acts_as_ferret

In-Print Resources

Thanks!
