
MongoDB, one of the leading NoSQL databases,
is well known for its fast performance, flexible schema, scalability and
great indexing capabilities. At the core of this fast performance lie MongoDB's
indexes, which support efficient execution of queries by avoiding full-collection
scans and hence limiting the number of documents MongoDB has to examine.

Starting with version 2.4, MongoDB introduced an experimental feature supporting
full-text search using text indexes. This feature has since
become an integral part of the product (and is no longer experimental).
In this article we are going to explore the full-text search
functionality of MongoDB, starting from the fundamentals.

If you are new to MongoDB, I recommend that you read the
following articles on Envato Tuts+ that will help you understand the basic concepts
of MongoDB:

The Basics

Before we get into any details, let us look at some background.
Full-text search refers to the technique of searching a full-text database against the search criteria specified by the user.
It is something similar to how we search any content on Google (or in fact any
other search application) by entering certain string keywords/phrases and
getting back the relevant results sorted by their ranking.

Here
are some more scenarios where we would see a full-text search happening:

Before we move on, there are certain general terms related
to full-text search which you should know. These terms are applicable to any
full-text search implementation (and not MongoDB-specific).

Stop Words

Stop words are the irrelevant words that should be filtered
out from a text. For example: a, an, the, is, at, which, etc.

Stemming

Stemming is the process of reducing the words to their stem.
For example: words like standing, stands, stood, etc. have a common base stand.

Scoring

A relative ranking to measure which of the search results is
most relevant.  
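To make these three terms concrete, here is a toy sketch in plain JavaScript. It is not how MongoDB (or any real search engine) implements them: the stop-word list is just the examples given above, the stemmer is a hand-written suffix stripper, and the function names `stem`, `analyze` and `score` are made up for this illustration.

```javascript
// Toy illustration only: real engines use language-specific stop-word lists
// and proper stemming algorithms, not this hand-written suffix stripper.
const STOP_WORDS = new Set(["a", "an", "the", "is", "at", "which"]);

// Naive stemmer: strips a trailing "ing" or "s" and nothing else.
function stem(word) {
  return word.replace(/(ing|s)$/, "");
}

// Lowercase the text, split on non-letters, drop stop words, then stem.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word))
    .map(stem);
}

// Toy score: the number of document terms whose stem appears in the query.
function score(query, documentText) {
  const wanted = new Set(analyze(query));
  return analyze(documentText).filter((term) => wanted.has(term)).length;
}
```

With these definitions, `analyze("The dog is standing at the gate")` yields `["dog", "stand", "gate"]`. An irregular form such as stood is left untouched by this sketch, which is one reason real stemmers are more elaborate.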

Alternatives to Full-Text Search in MongoDB

Before MongoDB came up with the concept of text indexes, we
would either model our data to support keyword searches or use regular expressions to implement such search
functionality. However, each of these approaches had its own limitations:

Apart from these approaches, for more advanced and complex
search-centric applications, there are alternative solutions like Elasticsearch
or Solr. But
using any of these solutions increases the architectural complexity of the
application, since MongoDB now has to talk to an additional external database.

Note
that MongoDB’s full-text search is not intended as a complete replacement for search
engine databases like Elasticsearch, Solr, etc. However, it can be effectively used
for the majority of applications that are built with MongoDB today.

Introducing MongoDB Text Search

Using MongoDB full-text search, you can define a text index
on any field in the document whose value is a string or an array of strings. When we create a text
index on a field, MongoDB tokenizes and stems the indexed field’s text content,
and sets up the indexes accordingly.  

To understand things further, let us now dive into some practical
examples. I want you to follow the tutorial with me by trying out the
examples in the mongo shell. We will first create some sample data which we will be
using throughout the article, and then we’ll move on to discuss key concepts.

For the purpose of this article, consider a collection messages which stores documents of the
following structure: 

Let us insert some sample documents using the insert command to create our test data:
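As a stand-in for the test data, here is a minimal sketch. Only the field names subject, content and year are taken from this article; every value below is hypothetical, so the results and scores you see may differ from the ones described later. The helper name `insertSampleMessages` is made up, and `db` (the mongo shell's database handle) is passed in explicitly so the snippet can also be loaded outside the shell. `insertMany` is used here as the multi-document counterpart of the insert command.

```javascript
// Hypothetical sample data: only the field names come from the article.
const sampleMessages = [
  { subject: "Joe owns a dog", content: "Dogs are man's best friend", year: 2015 },
  { subject: "Dogs eat cats and dog eats pigeons too", content: "Cats are not evil", year: 2015 },
  { subject: "Cats eat rats", content: "Rats do not cook food", year: 2014 },
  { subject: "Rats eat Joe", content: "Joe ate a rat", year: 2014 },
];

// Inserts all sample documents into the `messages` collection.
function insertSampleMessages(db) {
  return db.messages.insertMany(sampleMessages);
}
```

In the mongo shell, you would run `insertSampleMessages(db)` once before trying the other examples.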

Creating a Text Index

A text index is created much like a
regular index, except that it specifies the text
keyword in place of an ascending/descending sort order.

Indexing a Single Field

Create a text index on the subject
field of our document using the following query:
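A minimal sketch of such a command, assuming the `messages` collection from this article. The wrapper function `createSubjectTextIndex` is a hypothetical name, and `db` is the mongo shell's database handle passed in explicitly.

```javascript
// Creates a text index on `subject`: the string "text" takes the place
// of the usual 1 (ascending) or -1 (descending) in the index specification.
function createSubjectTextIndex(db) {
  return db.messages.createIndex({ subject: "text" });
}
```

In the shell this is simply `createSubjectTextIndex(db)`, which is equivalent to typing the `createIndex` call directly.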

To test this newly created text index on the subject field, we will search documents using the $text operator. We will be looking for
all the documents that have the keyword dogs
in their subject field. 

Since we are running a text search, we are also interested in how relevant
the resulting documents are. For this purpose, we
will use the { $meta: "textScore" } expression, which provides information
on the processing of the $text operator. We will also sort
the documents by their textScore using the sort command. A higher textScore indicates a more relevant
match.
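Put together, such a query might look like the sketch below. The function name `searchMessages` and the projected field name `score` are illustrative choices, and `db` is again the shell's database handle passed in explicitly.

```javascript
// Runs a $text search, projects each document's textScore into a `score`
// field, and sorts the results by that score, most relevant first.
function searchMessages(db, keywords) {
  return db.messages
    .find(
      { $text: { $search: keywords } },
      { score: { $meta: "textScore" } }
    )
    .sort({ score: { $meta: "textScore" } });
}
```

For the search described here, the call would be `searchMessages(db, "dogs")`.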

The above query returns the following documents containing
the keyword dogs in their subject field. 

As you can see, the first document has a score of 1 (since
the keyword dog appears twice in its subject) as opposed to the second document
with a score of 0.66. The query has also sorted the returned documents in descending
order of their score.

One question that might
arise in your mind is: if we are searching for the keyword dogs, why is the search engine taking
the keyword dog (without the ‘s’) into
consideration? Remember our discussion on stemming, where any search keywords
are reduced to their base? This is the reason why the keyword dogs is reduced to dog.

Indexing Multiple Fields (Compound Indexing)

More often than not, you will be using text search on
multiple fields of a document. In our example, we will enable compound text
indexing on the subject and content fields. Go ahead and execute
the following command in mongo shell:  

Did this work? No! Creating a second text index will give
you an error message saying that a full-text search index already exists. Why is that? Text
indexes come with a limitation: a collection can have only one text index. Hence, if
you would like to create another text index, you will have to drop the existing
one and then create the new one.
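One way to do the drop and the re-creation is sketched below. It uses `dropIndexes()`, which removes every index on the collection except the one on `_id`; that is convenient for a test collection like ours, but on a real collection you would more likely drop the text index by name. The function name `recreateTextIndex` is hypothetical.

```javascript
// Drops the existing indexes (including the single allowed text index),
// then creates one text index that covers both `subject` and `content`.
function recreateTextIndex(db) {
  db.messages.dropIndexes();
  return db.messages.createIndex({ subject: "text", content: "text" });
}
```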

After executing the above index creation queries, try
searching for all documents with keyword cat.

The above query would output the following documents:

You can see that the score of the first document, which contains
the keyword cat in both subject
and content fields, is higher. 

Indexing the Entire Document (Wildcard Indexing)

In the last example, we put a combined index on the subject and content fields. But there can be scenarios where you want any text
content in your documents to be searchable. 

For example, consider storing
emails in MongoDB documents. In the case of emails, all the fields, including
Sender, Recipient, Subject and Body, need to be searchable. In such scenarios you
can index all the string fields of your document using the $** wildcard specifier.

The query would go something like this (make sure you
delete the existing index before creating a new one):
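A minimal sketch, assuming the `messages` collection used so far. The function name `createWildcardTextIndex` is made up, and `dropIndexes()` is used as a blunt way to remove the existing text index (it removes every index except the one on `_id`).

```javascript
// Drops the existing indexes, then creates a text index on the "$**"
// wildcard so that every string field in a document becomes searchable.
function createWildcardTextIndex(db) {
  db.messages.dropIndexes();
  return db.messages.createIndex({ "$**": "text" });
}
```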

This query would automatically set up text indexes on any
string fields in our documents. To test this out, insert a new document with a
new field location in it:

Now if you try text searching with keyword chicago (query below), it will return
the document which we just inserted.
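Both steps, the insert and the search, might look like the sketch below. The document's values are hypothetical (only the field name location comes from the text above), as is the function name `insertAndFindByLocation`.

```javascript
// Inserts a document carrying an extra `location` field, then runs a text
// search for "chicago". With a wildcard text index in place, the new field
// is indexed without any further setup.
function insertAndFindByLocation(db) {
  db.messages.insertOne({
    subject: "Birds can cook",
    content: "Birds do not eat rats",
    year: 2016,
    location: "Chicago",
  });
  return db.messages.find({ $text: { $search: "chicago" } });
}
```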

A few things I would like to focus on here:

Advanced Searching

Phrase Search

You can search for phrases like “smart birds who love cooking” using text indexes. By default, the
phrase search makes an OR search on
all the specified keywords, i.e. it will look for documents which contain
any of the keywords smart, bird, love or cook.
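As a sketch of this default behaviour, the hypothetical helper below joins a list of terms with spaces, which is the form in which `$text` treats them as a logical OR. `db` is the shell's database handle passed in explicitly.

```javascript
// A space-separated search string is matched as an OR of its terms:
// a document qualifies if it contains at least one of them.
function findAnyOf(db, terms) {
  return db.messages.find({ $text: { $search: terms.join(" ") } });
}
```

For the phrase above, the call would be `findAnyOf(db, ["smart", "birds", "who", "love", "cooking"])`.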

This query would output the following documents:

In case you would like to perform an exact phrase search
(logical AND), you can do so by
specifying double quotes in the search text. 
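The quoting can be sketched as follows. Both function names are hypothetical, and the quotes are added by a small helper so that no manual escaping is needed inside the search string.

```javascript
// Wraps a phrase in double quotes so that $text matches it as an exact phrase.
function exactPhrase(phrase) {
  return `"${phrase}"`;
}

// Searches the `messages` collection for documents containing the exact phrase.
function findExactPhrase(db, phrase) {
  return db.messages.find({ $text: { $search: exactPhrase(phrase) } });
}
```

Calling `findExactPhrase(db, "cook food")` sends the search string `"cook food"`, double quotes included.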

This query would result in the following document, which
contains the phrase “cook food” together:

Negation Search

Prefixing a search keyword with - (minus sign) excludes all the documents that contain the negated
term. For example, try searching for any document which contains the
keyword rat but does not contain birds using the following query:
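A minimal sketch of such a query is shown below. The helper names `buildNegatedSearch` and `findWithout` are made up for this illustration.

```javascript
// Builds a search string in which every excluded term gets a "-" prefix.
function buildNegatedSearch(include, exclude) {
  return [...include, ...exclude.map((term) => `-${term}`)].join(" ");
}

// Finds documents that match the included terms but none of the excluded ones.
function findWithout(db, include, exclude) {
  return db.messages.find({
    $text: { $search: buildNegatedSearch(include, exclude) },
  });
}
```

Here `findWithout(db, ["rat"], ["birds"])` searches with the string `rat -birds`.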

Looking Behind the Scenes

One important capability I have not covered so far is
how to look behind the scenes and see how your search keywords are stemmed,
how stop words are removed, how terms are negated, and so on. The explain()
method comes to the rescue: call it on a query with true as its parameter, and it will give you detailed stats on the
query execution.
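Such a call can be sketched as follows. The function name `explainTextSearch` is hypothetical and `db` is the shell's database handle, passed in explicitly.

```javascript
// Returns the detailed explain output for a $text search instead of
// the matching documents themselves.
function explainTextSearch(db, keywords) {
  return db.messages.find({ $text: { $search: keywords } }).explain(true);
}
```

For example, `explainTextSearch(db, "dogs who cats")` uses a made-up search string that contains both a stop word and plural forms.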

If you look at the queryPlanner object
returned by the explain command, you will be able to see how MongoDB parsed the
given search string. Observe that it ignored stop words like who, and stemmed dogs to dog.

You can also see the terms which we excluded from our search
and the phrases we used in the parsedTextQuery
section.

The explain query will be highly useful as we perform more
complex search queries and want to analyze them.

Weighted Text Search

When we have indexes on more than one field in our document,
most of the time one field will be more important (i.e. carry more weight) than
the others. For example, when you are searching across a blog, the title of the
blog should be of highest weight, followed by the blog content.

The default weight for every indexed field is 1. To assign
relative weights for the indexed fields, you can include the weights option while using the createIndex
command.

Let’s understand this with an example. If you try searching for the cook keyword with our current
indexes, it will result in two documents, both of which have the same
score.   

Now let us modify our indexes to include weights; with the subject field having a weight of 3
against the content field having a weight
of 1.
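A minimal sketch of such an index definition follows. It assumes the text index is declared explicitly on subject and content, which may differ from the index you currently have; the function name `createWeightedTextIndex` is hypothetical, and `dropIndexes()` removes every index except the one on `_id`.

```javascript
// Replaces the existing text index with one in which a match in `subject`
// counts three times as much as a match in `content`.
function createWeightedTextIndex(db) {
  db.messages.dropIndexes();
  return db.messages.createIndex(
    { subject: "text", content: "text" },
    { weights: { subject: 3, content: 1 } }
  );
}
```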

Try searching for keyword cook now, and you will see that the document which contains this keyword
in the subject field has a greater score
(of 2) than the other (which has 0.66).

Partitioning Text Indexes

As the data stored in your application grows, the size of your text indexes keeps on growing too. With this increase in size of text indexes,
MongoDB has to search against all the indexed entries whenever a text search is
made. 

As a technique to keep your text search efficient with growing indexes, you
can limit the number of scanned index entries by using equality conditions with a regular $text search. A very common
example of this would be searching all the posts made during a certain
year/month, or searching all the posts with a certain category/tag.

If you observe the documents which we are working upon, we
have a year field in them which we
have not used yet. A common scenario would be to search messages by year, along
with the full-text search that we have been learning about. 

For this, we can
create a compound index that specifies an ascending/descending index key on year followed by a text index on the subject field. By doing this, we are
doing two important things:

Drop the indexes that you already have and create a new
compound index on (year, subject):
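A sketch of these two steps, assuming the `messages` collection from this article. The function name `createYearPartitionedIndex` is made up, and `dropIndexes()` removes every index except the one on `_id`.

```javascript
// Drops the existing indexes, then creates a compound index: an ascending
// key on `year` followed by a text index on `subject`.
function createYearPartitionedIndex(db) {
  db.messages.dropIndexes();
  return db.messages.createIndex({ year: 1, subject: "text" });
}
```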

Now execute the following query to search all the messages
that were created in 2015 and contain the cats keyword:
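Such a query combines an equality condition on year with the `$text` operator, as in the sketch below. The function name `findMessagesByYear` is hypothetical.

```javascript
// Combines an equality match on `year` with a text search, so that only
// index entries for the given year need to be considered.
function findMessagesByYear(db, year, keywords) {
  return db.messages.find({ year: year, $text: { $search: keywords } });
}
```

For the search described here, the call would be `findMessagesByYear(db, 2015, "cats")`.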

The query would return only one matched document, as expected.
If you explain this query and look
at the executionStats, you will find
that totalDocsExamined for this
query was 1, which confirms that our new index was used correctly and
MongoDB had to scan only a single document while safely ignoring all other
documents which did not fall under 2015.
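To read that number yourself, you could use a sketch like the one below. The function name `countExaminedDocs` is hypothetical, and the value you get depends on your own data and indexes.

```javascript
// Runs the year + text query in "executionStats" mode and returns how many
// documents MongoDB had to examine to answer it.
function countExaminedDocs(db, year, keywords) {
  const plan = db.messages
    .find({ year: year, $text: { $search: keywords } })
    .explain("executionStats");
  return plan.executionStats.totalDocsExamined;
}
```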

Text Indexes: Benefits

What More Can Text Indexes Do?

We have come a long way in this article learning about text
indexes. There are many other concepts that you can experiment with when using text
indexes, but owing to the scope of this article, we will not be able to discuss
them in detail today. Nevertheless, let’s have a brief look at what these
functionalities are:

MongoDB Text Indexing vs. External Search Databases

Keeping in mind the fact that MongoDB full-text search is not
a complete replacement for traditional search engine databases used with
MongoDB, using the native MongoDB functionality is recommended for the
following reasons:

Text Indexes: Drawbacks

Full-text search being a relatively new feature in MongoDB,
there are certain functionalities which it currently lacks. I would divide them into three categories. Let’s have a look.

Functionalities Missing From Text Search

Restrictions in Existing Functionalities

Performance Downsides

Wrapping Up 

Full-text search has always been one of the most requested
features of MongoDB. In this article, we started with an introduction to what full-text search is, before moving on to the basics of creating text indexes.

We then explored
compound indexing, wildcard indexing, phrase searches and negation searches. Further,
we explored some important concepts like analyzing text indexes, weighted
search, and logically partitioning your indexes. We can expect some major updates to this functionality in the
upcoming releases of MongoDB. 

I recommend that you give text-search a try and share your thoughts. If you have already implemented it in your application, kindly share your experience here. Finally, feel free to post your questions, thoughts and
suggestions on this article in the comment section.  


Source: Envato Tuts+ CodeNew feed