Basic Operations in Elasticsearch

2021-01-30

What is Elasticsearch?

Elasticsearch is an open-source, highly scalable, full-text search and analytics engine. It is a NoSQL document store, and for search-style lookups it usually retrieves data much faster than a traditional relational database. In this tutorial we will learn some basic operations in Elasticsearch, but let us go through a few basic concepts first:

Index: An index is a collection of similar documents that we want to store. An index name must be lowercase. For example, say we have data about the employees of a company. We can create an index named “employees” to store that data.

Type: [DEPRECATED] An index can be further sub-categorised into types. There is generally little advantage in having many types inside an index. The feature is deprecated in current versions and will be removed in Elasticsearch 8.0.0, but we should know about it in order to work with older versions of Elasticsearch. The requests in this tutorial use the typed syntax, as accepted by Elasticsearch 6.x. In our example, we will create a single type “developers” in our “employees” index.

Document: A document is the basic unit of data that we want to index (save). It is expressed in JSON. In our example, we will index documents containing the name, email-id and age of employees in the “employees” index.

Now let us learn the basic operations we can perform in Elasticsearch:

  • Indexing
  • Searching
  • Updating

Indexing

Create index

The following PUT request creates the “employees” index with a “developers” type. It has name, email-id and age as its properties. The field datatypes used in our example are text and long, but many more are available, such as boolean, date, integer and short.

PUT employees
{
    "mappings" : {
        "developers" : {
            "properties" : {
                "name" : { "type" : "text" },
                "email-id" : { "type" : "text" },
                "age" : { "type" : "long" }
            }
        }
    }
}
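
The request above uses the typed mapping syntax. As a hedged sketch of the typeless form that Elasticsearch 7.x and later expect, the Python snippet below builds the equivalent request body with the “developers” level left out. The variable names are illustrative, and nothing is sent to a cluster; the snippet only prints the JSON that would follow PUT employees.

```python
import json

# Field definitions taken from the typed example above.
properties = {
    "name": {"type": "text"},
    "email-id": {"type": "text"},
    "age": {"type": "long"},
}

# Typeless form: "properties" sits directly under "mappings",
# with no "developers" level in between.
typeless_body = {"mappings": {"properties": properties}}

print("PUT employees")
print(json.dumps(typeless_body, indent=4))
```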

Put data into index

The following PUT request inserts a JSON document into the “employees” index under the “developers” type with an id of 1. To have an id generated automatically, omit the id and use POST instead (POST employees/developers).

PUT employees/developers/1
{
    "name" : "Rajat",
    "email-id" : "rajat@gmail.com",
    "age" : 26
}

When a document is inserted, its text fields are analyzed, i.e. several operations are performed on the field values before they are added to the index, depending on which analyzer is used. We can create and apply custom analyzers or choose from the built-in ones. If no analyzer is specified, the “Standard Analyzer” is applied by default. It splits text into terms, removes most punctuation and lowercases the terms. Hence the “name” field of this document is indexed as the term “rajat”, not “Rajat”. The stored JSON source of the document is not changed; it still contains “Rajat”.
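
To make the effect of analysis concrete, here is a rough Python stand-in for the Standard Analyzer. It is only an approximation: the real analyzer follows Unicode text segmentation rules and treats some inputs (email addresses, for instance) differently. The function name simple_analyze is hypothetical and not part of Elasticsearch.

```python
import re

def simple_analyze(text):
    """Approximate the Standard Analyzer: split into word tokens, then lowercase."""
    tokens = re.findall(r"\w+", text)    # drops punctuation and whitespace
    return [token.lower() for token in tokens]

print(simple_analyze("Rajat"))          # ['rajat']
print(simple_analyze("Hello, World!"))  # ['hello', 'world']
```

The terms returned by the analyzer are what end up in the index, which is why a later search has to produce the term “rajat” to find this document.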

Searching

We have learned how to store data in an index. Let us now learn how to retrieve it.

Retrieve all the documents from index

The following POST request retrieves all documents stored in the “employees” index under the “developers” type.

POST employees/developers/_search
{
    "query" : {
        "match_all" : {}
    }
}

Search with analyzed input

POST employees/developers/_search
{
    "query" : {
        "match" : {
            "name" : "RAJAT"
        }
    }
}

If we use “match” in a search query, the query input is analyzed before the search is performed (just as fields are analyzed during the insert operation). By default, the search analyzer is the same analyzer that was used to index the field, which is the “Standard Analyzer” unless configured otherwise. We can specify different analyzers for search and insert operations.

The POST request above will fetch the document we just indexed, because the query input becomes “rajat” after analysis, which matches the name field of the indexed document (also indexed as “rajat”).

Search with non-analyzed input

POST employees/developers/_search
{
    "query" : {
        "term" : {
            "name" : "RAJAT"
        }
    }
}

If we use “term” instead of “match”, the query input is not analyzed.

The POST request above will not fetch any document, because the query input is not analyzed and remains “RAJAT”. It does not match the name field of the indexed document, which is indexed as “rajat”.

NOTE: Only the search query is left unanalyzed in this case. The indexed document's fields are analyzed in both cases.
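
The difference between the two query types can be sketched with a toy inverted index in Python. This is a simplified model, not how Elasticsearch is implemented: simple_analyze is a rough stand-in for the Standard Analyzer, and match_query and term_query are hypothetical helpers that only mimic whether the input is analyzed before the lookup.

```python
import re

def simple_analyze(text):
    """Rough stand-in for the Standard Analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# Toy inverted index for the "name" field: term -> set of document ids.
inverted_index = {}

def index_name(doc_id, name):
    """Analyze the field value and record each term for the document."""
    for term in simple_analyze(name):
        inverted_index.setdefault(term, set()).add(doc_id)

def match_query(text):
    """Analyze the input first, then look up every resulting term."""
    hits = set()
    for term in simple_analyze(text):
        hits |= inverted_index.get(term, set())
    return hits

def term_query(text):
    """Look up the input exactly as given, without analysis."""
    return set(inverted_index.get(text, set()))

index_name(1, "Rajat")

print(match_query("RAJAT"))  # {1}
print(term_query("RAJAT"))   # set()
print(term_query("rajat"))   # {1}
```

The last line shows that a term query does find the document when the input already equals the indexed term.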

Boolean query with AND condition

POST employees/developers/_search
{
    "query" : {
        "bool" : {
            "must" : [
                { "match" : { "name" : "rajat" } },
                { "match" : { "age" : "35" } }
            ]
        }
    }
}

It will not fetch any document, because both conditions must be true for a document to be returned, and the indexed document has an age of 26, not 35.

Boolean query with OR condition

POST employees/developers/_search
{
    "query" : {
        "bool" : {
            "should" : [
                { "match" : { "name" : "rajat" } },
                { "match" : { "age" : "35" } }
            ]
        }
    }
}

It will fetch the indexed document, because a single true condition is sufficient for a document to be returned: the name matches “rajat” even though the age does not match 35.
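
The AND/OR behaviour of “must” and “should” can be mimicked in a few lines of Python. This is only a sketch of the boolean logic for a query that contains nothing but “must” or nothing but “should” clauses; it ignores scoring, and the helper names are hypothetical.

```python
# The document indexed earlier in this tutorial.
doc = {"name": "Rajat", "email-id": "rajat@gmail.com", "age": 26}

def name_matches(d, value):
    """Case-insensitive comparison, standing in for an analyzed match on "name"."""
    return d["name"].lower() == value.lower()

def age_matches(d, value):
    """Numeric comparison; accepts the value as a string or a number."""
    return int(d["age"]) == int(value)

def bool_must(d, clauses):
    """AND: every clause has to be true."""
    return all(clause(d) for clause in clauses)

def bool_should(d, clauses):
    """OR: at least one clause has to be true."""
    return any(clause(d) for clause in clauses)

clauses = [
    lambda d: name_matches(d, "rajat"),  # true for our document
    lambda d: age_matches(d, "35"),      # false, the age is 26
]

print(bool_must(doc, clauses))    # False
print(bool_should(doc, clauses))  # True
```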

Updating

Now let us learn how to modify an index and its documents.

Add a new property in index

The following PUT request adds a new property, “phone”, to the existing properties of the “employees” index under the “developers” type.

PUT employees/_mapping/developers
{
    "properties" : {
        "phone" : { "type" : "long" }
    }
}

Update a particular field of a document

POST employees/developers/_update_by_query
{
    "query" : {
        "match" : { "name" : "rajat" }
    },
    "script" : {
        "source" : "ctx._source.age = 30",
        "lang" : "painless"
    }
}

This sets the age to 30 in every document whose name matches “rajat”, which here is only Rajat's document.
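
The request above updates every document that matches the query. When the id is already known, the Update API can target a single document instead. The sketch below assumes document id 1 from the indexing example; build_update_request is a hypothetical helper that only formats the request text and does not contact a cluster. The typed path is the 6.x form, and the typeless path is the form used from 7.x onwards.

```python
import json

def build_update_request(index, doc_id, fields, doc_type=None):
    """Format a partial-document update as a Console-style request string."""
    if doc_type is not None:
        path = f"{index}/{doc_type}/{doc_id}/_update"  # typed form (6.x)
    else:
        path = f"{index}/_update/{doc_id}"             # typeless form (7.x and later)
    body = {"doc": fields}  # only the listed fields are changed
    return f"POST {path}\n{json.dumps(body, indent=4)}"

print(build_update_request("employees", 1, {"age": 30}, doc_type="developers"))
print(build_update_request("employees", 1, {"age": 30}))
```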

Deleting an index

DELETE employees

This will remove the “employees” index along with all its documents.