A bit of history. I first learned about search and indexing in 2012. We were using Lucene for indexing content (specifically journals) for two CMS projects for a Fortune 500 scientific publisher.

I started using Elasticsearch seriously a few years back. Prior to that, I had only played with it casually. I was building a mini search “engine” at my startup, where we needed to index and search a large volume of high-dimensional, high-resolution (think 10K pixels) medical images and their metadata in soft real time. It was one of the two core features of our product offering, where we strove for a quick turnaround time from diagnosis to the physician (doctor) getting the results back to the patients. Back then, we were out to modernize a radiology field that was stuck in non-digital practices. I knew it was a hard challenge. Although my startup is now defunct, I’m proud of what I achieved.

Loosely speaking, I think there’s always more to learn about Elasticsearch (actually, that applies to every technology out there). In my previous jobs, I knew I could barely get away with learning just enough to start solving problems. I told myself that I could always learn the rest on the job. But in reality, at least in my personal experience so far, that never happened.

Since leaving my full-time role, I’ve taken the chance to properly study things that I promised myself I’d revisit one day. And here I am, learning Elasticsearch! ^_^

I love books. They’re underrated. So, I chose Elasticsearch in Action to start with. I hope I can get to an effective level by studying and practicing along with this book. :)

I’m a few chapters in now. This is a nice, well-written book. I think it has a good balance of theory and practical examples. I would recommend picking up this book if you don’t yet understand basic Elasticsearch terminology such as index, document, mapping, and query DSL. It will be part of my reference material.

Aside from that, I’ve also just started learning Apache Kafka through the Effective Kafka book. I’m quite a Kafka newbie. A Kafka and Elasticsearch combo, why not? Actually, I’m experimenting with a learning technique: I want to understand a connected learning strategy (in a similar fashion to the spaced learning method).

Just to share, I’ve written some quick notes below on Elasticsearch for my future self :P


This is from Part 1 “Core Elasticsearch Functionality”.

Chapter 2: Diving into the functionality

How Elasticsearch data is organized

To understand how data is organized in Elasticsearch, we’ll look at it from two angles:

  • Logical layout — What your search application needs to be aware of. The unit you’ll use for indexing and searching is a document, and you can think of it like a row in a relational database. Documents are grouped into types, which contain documents in a way similar to how tables contain rows. Finally, one or multiple types live in an index, the biggest container, similar to a database in the SQL world.
  • Physical layout — How Elasticsearch handles your data in the background. Elasticsearch divides each index into shards, which can migrate between servers that make up a cluster. Typically, applications don’t care about this because they work with Elasticsearch in much the same way, whether it’s one or more servers. But when you’re administering the cluster, you care because the way you configure the physical layout determines its performance, scalability, and availability.
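To make the physical layout a bit more concrete, here is a minimal Python sketch, not how Elasticsearch actually does it. The assumptions are loud: CRC32 stands in for the real routing hash, the round-robin placement of shards on nodes is a hypothetical policy, and the names `shard_for`, `place_shards`, and `locate` are made up for illustration. The point it tries to show is that the document-to-shard step does not change when servers are added; only the shard-to-server step does.

```python
import zlib
from typing import Dict, List, Tuple


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number.

    CRC32 is only a stand-in hash for this sketch (an assumption),
    not the hash function Elasticsearch itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def place_shards(num_shards: int, nodes: List[str]) -> Dict[int, str]:
    """Spread shards over nodes round-robin (a hypothetical placement policy)."""
    if not nodes:
        raise ValueError("need at least one node")
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def locate(doc_id: str, num_shards: int, nodes: List[str]) -> Tuple[int, str]:
    """Return (shard number, node name) for a document ID."""
    shard = shard_for(doc_id, num_shards)
    return shard, place_shards(num_shards, nodes)[shard]


# The shard for a document stays the same on one server or three;
# only the node that hosts that shard can differ.
single = locate("1", 5, ["node-1"])
cluster = locate("1", 5, ["node-1", "node-2", "node-3"])
print(single, cluster)
```

In this toy model the application only ever passes a document ID, which mirrors the idea that applications work the same way whether there is one server or several, while the administrator is the one who cares about `num_shards` and the node list.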

Figure 1 illustrates the two perspectives.

Figure 1: An Elasticsearch cluster from the application's and administrator's points of view

Understanding the logical layout: documents, types, and indices

When you index a document in Elasticsearch, you put it in a type within an index. You can see this idea in figure 2, where the get-together index contains two types: event and group. Those types contain documents, such as the one labeled 1. The label 1 is that document’s ID.

Figure 2: Logical layout of data in Elasticsearch: how an application sees data
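The index, type, and ID nesting can be sketched as a plain Python data structure. This is only a mental model under stated assumptions: the `DocumentAddress` class, the `get_document` helper, and the `title` field are hypothetical names for illustration, and the `/index/type/id` path follows the type-based layout described in these notes (later Elasticsearch releases removed mapping types, so this does not carry over to them as-is).

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DocumentAddress:
    """Where a document lives logically: index -> type -> ID."""
    index: str
    doc_type: str
    doc_id: str

    def path(self) -> str:
        # Path shape for the type-based layout in these notes (illustrative).
        return f"/{self.index}/{self.doc_type}/{self.doc_id}"


# In-memory stand-in for the logical layout: the get-together index holds
# two types, event and group. The document body is a made-up placeholder.
layout: Dict[str, Dict[str, Dict[str, Dict[str, Any]]]] = {
    "get-together": {
        "group": {"1": {"title": "placeholder group document"}},
        "event": {},
    }
}


def get_document(data: Dict[str, Any], address: DocumentAddress) -> Optional[Dict[str, Any]]:
    """Look up a document by walking index -> type -> ID; None if missing."""
    return data.get(address.index, {}).get(address.doc_type, {}).get(address.doc_id)


address = DocumentAddress("get-together", "group", "1")
print(address.path(), get_document(layout, address))
```

The same ID can exist under a different type or index without clashing, because the full address is the combination of all three parts.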

Learning Resources