Working with Elasticsearch for search and analytics

Elasticsearch is a distributed search and analytics engine used to index, search, and analyze large volumes of data in near real time. It is built on Apache Lucene, a high-performance indexing and search library. Elasticsearch exposes a REST API that you use to work with your data through search queries, aggregations, and other requests.

In this tutorial, we cover the basics of working with Elasticsearch step by step: setting it up on your own machine, indexing data, and running basic search and aggregation queries.

Prerequisites

Before getting started, you’ll need the following:

  • A basic understanding of REST APIs
  • A machine running Ubuntu 18.04 (or any other recent Linux distribution)
  • Java 8 or later (optional, because the Elasticsearch 7.x Linux archive bundles its own JDK)
  • A user account with sudo privileges (Elasticsearch itself must run as a non-root user)

For the purpose of this tutorial, we’ll be installing Elasticsearch on Ubuntu 18.04.

Step 1: Install Elasticsearch

The first step to getting started with Elasticsearch is to install it on your machine. Here are the steps for installing Elasticsearch on Ubuntu 18.04:

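1.1: Install Java 8 or Later (Optional)

Elasticsearch runs on the Java Virtual Machine. The Linux archive of Elasticsearch 7.x ships with a bundled OpenJDK, which Elasticsearch uses unless the JAVA_HOME environment variable points to another Java installation, so this step is optional. If you want a system-wide Java, here’s how to install OpenJDK 8: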

sudo apt-get update
sudo apt-get install openjdk-8-jdk

Once you’ve installed Java, you can verify the installation by running the following command:

java -version

1.2: Download and Install Elasticsearch

Next, download the Elasticsearch 7.9.3 archive for Linux, extract it, and start Elasticsearch from the extracted directory. Run these commands as a regular user, not as root, because Elasticsearch refuses to start as root:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.3-linux-x86_64.tar.gz
tar -xzf elasticsearch-7.9.3-linux-x86_64.tar.gz
cd elasticsearch-7.9.3
./bin/elasticsearch

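This starts a single Elasticsearch node in the foreground (press Ctrl+C to stop it). By default, it listens on http://localhost:9200. To confirm that it is running, open a second terminal and run curl http://localhost:9200, which returns a JSON document containing the cluster name and version.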
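If you want to check the node from a script rather than with curl, the following minimal Python sketch sends GET / and reads the version from the JSON reply. It assumes the default address http://localhost:9200 with security disabled (the default in 7.9.3). The helper name node_info is hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def node_info(base_url: str = "http://localhost:9200", timeout: float = 2.0) -> Optional[dict]:
    """Hypothetical helper: return the parsed JSON from GET /, or None if unreachable."""
    try:
        with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return None


if __name__ == "__main__":
    info = node_info()
    if info is None:
        print("Elasticsearch is not reachable on localhost:9200")
    else:
        # The root endpoint reports the cluster name and version number.
        print(info["cluster_name"], info["version"]["number"])
```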

Step 2: Index Data

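Once Elasticsearch is running, you can start indexing data. Elasticsearch stores data as JSON documents grouped into indices, which are roughly analogous to tables in a relational database. The requests in this tutorial use the Console format from Kibana Dev Tools and the Elasticsearch documentation: an HTTP method and path, optionally followed by a JSON body. Without Kibana, you can send them to http://localhost:9200 with curl or any other HTTP client by setting the header Content-Type: application/json. Here’s how to create an index and add data to it: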

2.1: Create an Index Mapping

Elasticsearch can infer field types automatically when you index a document (dynamic mapping). However, defining an explicit mapping first gives you control over how each field is indexed. A mapping defines the fields that your index will contain and their data types. Here’s an example mapping for a blog post index:

PUT /blog_post
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "content": {
        "type": "text"
      },
      "tags": {
        "type": "keyword"
      },
      "date": {
        "type": "date"
      }
    }
  }
}

This mapping defines four fields: title, content, tags, and date. The title and content fields are of type text, so they are analyzed: each value is broken into individual, lowercased terms for full-text search. The tags field is of type keyword, so each tag is indexed as a single exact value, which suits exact-match filtering, sorting, and aggregations. The date field is of type date, which supports range queries and date-based aggregations.
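To make the difference between text and keyword concrete, here is a simplified Python approximation of what happens at index time. It assumes the default standard analyzer and approximates it by lowercasing and splitting on non-word characters. The real analyzer follows Unicode word-segmentation rules and handles many more cases. A keyword field keeps each value as one unchanged term.

```python
import re
from typing import List


def analyze_text(value: str) -> List[str]:
    """Rough stand-in for the standard analyzer: lowercase, then split into word tokens."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value: str) -> List[str]:
    """A keyword field indexes the whole value as one exact term."""
    return [value]


title = "Getting started with Elasticsearch"
print(analyze_text(title))     # ['getting', 'started', 'with', 'elasticsearch']
print(analyze_keyword(title))  # ['Getting started with Elasticsearch']
```

This is why a full-text search for "Elasticsearch" can find the title, while an exact lookup on a keyword field matches only the stored value, including its case.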

2.2: Index Data

Now that you have a mapping for your index, you can start indexing data. Here’s how to index a blog post:

PUT /blog_post/_doc/1
{
  "title": "Getting started with Elasticsearch",
  "content": "Elasticsearch is a distributed search and analytics engine that is used to index, search, and analyze large volumes of data.",
  "tags": ["elasticsearch", "tutorial", "search"],
  "date": "2020-11-18"
}

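This creates a document with an ID of 1 in the blog_post index, or replaces it if a document with that ID already exists. New documents become visible to search after the next index refresh, which Elasticsearch performs roughly once per second by default.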
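Sending one request per document works well for a tutorial. For larger data sets, Elasticsearch provides the _bulk API, which accepts newline-delimited JSON (NDJSON): an action line followed by the document source, with a trailing newline at the end of the payload. The following Python sketch builds such a request without sending it. The helper names bulk_body and bulk_request, the second post, and the node address http://localhost:9200 are illustrative assumptions.

```python
import json
import urllib.request
from typing import Dict, Iterable, Tuple


def bulk_body(index: str, docs: Iterable[Tuple[str, Dict]]) -> bytes:
    """Hypothetical helper: build an NDJSON _bulk payload from (id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the payload to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def bulk_request(body: bytes, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Prepare (but do not send) a POST /_bulk request."""
    return urllib.request.Request(
        base_url + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


posts = [
    ("1", {"title": "Getting started with Elasticsearch", "tags": ["elasticsearch", "tutorial"], "date": "2020-11-18"}),
    ("2", {"title": "Hypothetical second post", "tags": ["search"], "date": "2020-11-19"}),
]
req = bulk_request(bulk_body("blog_post", posts))
# To send it (requires a running node): urllib.request.urlopen(req)
```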

Step 3: Search and Analyze Data

Now that you’ve indexed some data, you can start querying it. Searches and aggregations also go through the REST API, usually as requests to an index’s _search endpoint.
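To send these Console-style snippets from a script instead of Kibana, you can use the following minimal Python sketch, which relies only on the standard library. It splits a snippet into its method, path, and JSON body, and then builds a urllib request for a node at http://localhost:9200 (an assumption). The helper names console_to_request and send are hypothetical. The sketch handles one request per snippet and does not support Console comments.

```python
import json
import urllib.request


def console_to_request(snippet: str, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Hypothetical helper: turn 'METHOD /path' plus an optional JSON body into a Request."""
    first_line, _, rest = snippet.strip().partition("\n")
    method, path = first_line.split(maxsplit=1)
    body = rest.strip()
    data = None
    headers = {}
    if body:
        # Parse and re-serialize so malformed JSON fails before anything is sent.
        data = json.dumps(json.loads(body)).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path.strip(), data=data, headers=headers, method=method.upper())


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


example = """GET /blog_post/_search
{"query": {"match": {"title": "elasticsearch"}}}"""
req = console_to_request(example)
print(req.get_method(), req.full_url)  # GET http://localhost:9200/blog_post/_search
```

Calling send(req) performs the request and returns the decoded response, which requires a running node.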

3.1: Simple Search Query

Here’s an example of a simple search query that finds blog posts whose title field contains the word elasticsearch:

GET /blog_post/_search
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}

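This request uses a match query on the title field. The match query analyzes the search text with the field’s analyzer, so for a text field like title the search is case-insensitive. Elasticsearch returns the matching documents sorted by relevance score (_score), highest first. To search the title and content fields together, use a multi_match query instead, for example {"multi_match": {"query": "elasticsearch", "fields": ["title", "content"]}}.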
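By default, the match query joins the analyzed search terms with the operator or, so a document matches if any term appears in the field. Setting operator to and requires every term to appear. The following Python sketch models only this matching rule, not relevance scoring. It uses a simplified lowercase-and-split tokenizer as a stand-in for the standard analyzer (an assumption), and the function names are hypothetical.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Simplified stand-in for the standard analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def match(field_value: str, query_text: str, operator: str = "or") -> bool:
    """Hypothetical model of the match query's matching rule (ignores scoring)."""
    field_terms = tokens(field_value)
    query_terms = tokens(query_text)
    if not query_terms:
        # Like the default zero_terms_query of "none": an empty analyzed query matches nothing.
        return False
    if operator == "and":
        return query_terms <= field_terms
    return bool(query_terms & field_terms)


title = "Getting started with Elasticsearch"
print(match(title, "ELASTICSEARCH"))                # True: both sides are lowercased
print(match(title, "elasticsearch kibana"))         # True: one term is enough with "or"
print(match(title, "elasticsearch kibana", "and"))  # False: "kibana" is missing
```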

3.2: Aggregations

Elasticsearch also supports aggregations, which allow you to summarize and analyze data. Here’s an example of a simple aggregation that counts the number of blog posts for each tag:

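GET /blog_post/_search
{
  "size": 0,
  "aggs": {
    "tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}

This request defines a terms aggregation named tags that groups documents by the values of their tags field. Setting "size": 0 omits the matching documents from the response, so only the aggregation results are returned. Each bucket in the response contains one tag and the number of documents (doc_count) that have it. By default, the terms aggregation returns only the 10 tags with the highest document counts; set the aggregation’s own size parameter to return more. This works because tags is a keyword field; by default, you cannot aggregate on a text field.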
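The following Python sketch approximates the terms aggregation in memory to show what the bucket counts mean. Each document counts once per distinct tag, and the top buckets are ordered by document count. The sample posts and the function name terms_agg are hypothetical, and breaking ties alphabetically is an assumption of this sketch. On a real multi-shard index, the counts can also be approximate.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def terms_agg(docs: Iterable[Dict], field: str, size: int = 10) -> List[Tuple[str, int]]:
    """Hypothetical in-memory approximation of a terms aggregation."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        # A document contributes at most 1 to each distinct value's doc_count.
        counts.update(set(values))
    # Highest doc_count first; ties broken alphabetically (sketch assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size]


posts = [
    {"tags": ["elasticsearch", "tutorial", "search"]},
    {"tags": ["elasticsearch", "kibana"]},
    {"tags": ["search", "search"]},
]
for key, doc_count in terms_agg(posts, "tags"):
    print(key, doc_count)
```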

Conclusion

In this tutorial, we’ve covered the basics of working with Elasticsearch: installing it, indexing data, and running basic search and aggregation queries. With these building blocks, you can search, filter, and aggregate large volumes of data in near real time, and then build on them with more advanced queries, mappings, and aggregations.
