Elasticsearch For Dummies Part 2: Datatypes

Tim Estes
6 min read · Oct 14, 2022

In part one, we looked at the basics: an introduction to documents, indexes, and querying.

Today we’ll be delving a bit deeper into the world of Elastic by examining its different datatypes and how they fit into the beautiful ecosystem of search.

Datatypes

When you index a document, each field in that document requires a datatype.

For example, a “string” is a data type that is typically used to classify text, while “integer” is a data type used to classify whole numbers. Data types are critically important to an index because they unlock powerful capabilities if used properly (and can cause great headaches if ignored).

If no data types are declared ahead of time, Elastic will make an educated guess about what data type we would want to use at the time of document creation. This is called dynamic mapping.

Let’s revisit our hello_world index and the document that we indexed earlier, and check which datatypes Elastic’s dynamic mapping inferred for our fields.

  1. If you haven’t already, create the index hello_world by running the following command against your cluster. I’m using VS Code to run the commands (see Part 1 for instructions on how to set up a local instance of Elastic and connect to it).
PUT hello_world

2. Add a document to the index if you haven’t already. You should see the following result:

3. Now run a GET hello_world/_mappings command to view the assigned mappings of the hello_world index:

Take a look at the “type” field in the mapping properties and observe how each field in the document has been assigned a data type.

In this case, two types have been chosen for our different fields: “text” and “long.” This makes sense because our data consists of numbers and letters, with the number fields being assigned “long” and the text fields being assigned “text.”
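To get a feel for how dynamic mapping arrives at those types, here is a minimal Python sketch that mimics a simplified subset of the default rules. The sample document values are made up for illustration, the date check only covers the two most common ISO 8601 shapes, and arrays are ignored, so treat it as an approximation rather than Elastic's actual logic. (Real dynamic mapping also adds a keyword sub-field to each inferred text field, which this sketch leaves out.)

```python
import json
import re

# Only recognises yyyy-MM-dd and yyyy-MM-ddTHH:mm... ; real date detection is broader.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}.*)?$")

def infer_type(value):
    """Guess a field type the way default dynamic mapping roughly would."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text"
    raise TypeError(f"unsupported value in this sketch: {value!r}")

# Hypothetical sample document; only the field names come from this series.
sample_doc = {
    "speaker": "HAMLET",
    "speech_number": 19,
    "text_entry": "To be, or not to be: that is the question",
}

inferred = {field: infer_type(value) for field, value in sample_doc.items()}
print(json.dumps(inferred, indent=2))
```

Running it reports “long” for speech_number and “text” for the two string fields, which lines up with the mapping described above.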

Numeric Datatypes

Numeric datatypes are found in virtually every database. They store numbers efficiently and let users perform numeric operations on them.

For example, with these fields, we can run range queries. The following query searches for all documents with a speech_number greater than 5:

This next query lets us find all documents that have a speech_number greater than 5, but less than 10. Pretty straightforward so far, right?
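As a rough sketch of what these two range queries can look like, the Python below builds the request bodies and prints them as JSON that could be sent with GET hello_world/_search. The helper name range_query is made up for this example; the gt and lt keys are the range query's exclusive bounds.

```python
import json

def range_query(field, gt=None, lt=None):
    """Build a range query body; bounds left as None are simply omitted."""
    bounds = {}
    if gt is not None:
        bounds["gt"] = gt  # strictly greater than
    if lt is not None:
        bounds["lt"] = lt  # strictly less than
    return {"query": {"range": {field: bounds}}}

# speech_number greater than 5
greater_than_5 = range_query("speech_number", gt=5)

# speech_number greater than 5, but less than 10
between_5_and_10 = range_query("speech_number", gt=5, lt=10)

# Each body would be sent as: GET hello_world/_search
print(json.dumps(greater_than_5, indent=2))
print(json.dumps(between_5_and_10, indent=2))
```

If you want the bounds to be inclusive, the range query also accepts gte and lte.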

And if you are wondering, yes, we can perform addition with these fields. Later on in the series, we’ll learn how to run aggregations that calculate sums of fields from a large quantity of documents (which would be useful in a financial workspace, as you can imagine).

Text Datatypes

The text datatype is incredibly flexible and powerful. All text fields are passed through an analyzer to break up the string into individual pieces. This allows us to search for individual words in a sentence, for example.
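To make the analyzer idea concrete, here is a small Python sketch. The function rough_standard_tokens is a made-up approximation of the default standard analyzer (lowercase, then split on non-word characters); the real analyzer follows Unicode text segmentation rules and will differ on things like apostrophes. The second half builds a request body for the _analyze API, which is how you would ask the cluster for the real tokens.

```python
import json
import re

def rough_standard_tokens(text):
    """Approximate the standard analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())

sentence = "To be, or not to be"
tokens = rough_standard_tokens(sentence)
print(tokens)

# Request body for the _analyze API, which returns the tokens Elastic actually produces.
# It would be sent as: POST _analyze
analyze_body = {"analyzer": "standard", "text": sentence}
print(json.dumps(analyze_body, indent=2))
```

Each of those lowercase tokens is what a search is matched against, which is why a query for a single word can find the whole sentence.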

The text field is best suited for human-readable strings, such as sentences and tweets. If you are indexing email addresses or machine-generated log messages, Elastic recommends using keyword or wildcard data types. For further reading, see the docs about deciding how to map unstructured content.

In our document, the text field is perfectly suitable for our text_entry and speaker fields! However, for the other string fields, I would use a keyword datatype. Keywords are best suited for structured content such as tags, static labels, and email addresses. Keywords are primarily used in aggregations, sorting and term queries.
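The practical difference shows up in how you query each type. The sketch below builds one query of each kind in Python: a match query for an analyzed text field and a term query for a keyword field. The field name play_name and the search values are assumptions made for illustration; they are not taken from the document indexed earlier.

```python
import json

# Full-text search on a text field: the query string goes through the analyzer too,
# so it is compared against the individual tokens of text_entry.
match_query = {"query": {"match": {"text_entry": "question"}}}

# Exact-value lookup on a keyword field: the value is not analyzed, so it must
# match the stored value exactly, including case. "play_name" is an assumed field.
term_query = {"query": {"term": {"play_name": "Hamlet"}}}

# Either body would be sent as: GET hello_world/_search
print(json.dumps(match_query, indent=2))
print(json.dumps(term_query, indent=2))
```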

Explicit Mappings

Let’s try our hand at adding some explicit mappings to our index. We can do this using the following commands:

  • First, delete the existing index by running DELETE hello_world. (It’s hard to change the mappings of an index after it has been created, so for now we will just delete and recreate the index.)
  • Then we will create the index and use a PUT hello_world/_mappings command to add our own mappings:
  • To confirm our new mappings, run the GET hello_world/_mappings command:
Beautiful!

Now when we index a document, it will be mapped explicitly instead of having guesswork involved.
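As a hedged sketch of what such a mappings body could look like, the Python below builds one and prints it as JSON for a PUT hello_world/_mappings request. The types for speaker, text_entry and speech_number follow the discussion above; play_name is an assumed keyword field added for illustration, and integer is just one possible choice of numeric type.

```python
import json

mappings = {
    "properties": {
        "speaker": {"type": "text"},
        "text_entry": {"type": "text"},
        # A narrower numeric type than the "long" that dynamic mapping picked.
        "speech_number": {"type": "integer"},
        # Assumed field: a structured label, so keyword rather than text.
        "play_name": {"type": "keyword"},
    }
}

# Quick sanity check before sending: every field must declare a type.
missing = [name for name, spec in mappings["properties"].items() if "type" not in spec]

# The body would be sent as: PUT hello_world/_mappings
print(json.dumps(mappings, indent=2))
print("fields without a type:", missing)
```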

Date Datatypes

Lastly, let’s take a look at another important type of data: time-based data. The date field type holds date and time values and is very helpful when dealing with time-sensitive data, such as logs and chat messages.

Imagine you are in charge of indexing the works of Shakespeare into a database. But your boss wants you to attach a timestamp to each document marking the exact time the document was put into the index. How would you go about doing this?

One approach would be to use another programming language (like Python) to create a script that would bulk index the documents and tag each doc with a timestamp at index time. But what if there was an easier way to do this natively in Elastic?
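For comparison, the script-based approach might look roughly like this. It is a minimal sketch using only the Python standard library: it builds the newline-delimited payload that the _bulk API expects and stamps every document with the current UTC time. The function name bulk_body and the two sample documents are made up, and actually sending the payload to the cluster is left out.

```python
import json
from datetime import datetime, timezone

def bulk_body(index, docs):
    """Build an NDJSON payload for the _bulk API, adding @timestamp to every doc."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    lines = []
    for doc in docs:
        # Action line first, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({**doc, "@timestamp": now}))
    # The _bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample documents.
docs = [
    {"speaker": "HAMLET", "speech_number": 19, "text_entry": "To be, or not to be"},
    {"speaker": "HAMLET", "speech_number": 19, "text_entry": "that is the question"},
]

payload = bulk_body("hello_world", docs)
# The payload would be sent as the body of: POST _bulk
print(payload)
```

It works, but every client that writes to the index has to remember to add the field, which is exactly the chore we would like to hand off to Elastic.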

Enter ingest pipelines. From the docs:

Ingest pipelines let you perform common transformations on your data before indexing. For example, you can use pipelines to remove fields, extract values from text, and enrich your data.

In this case, we’ll want to enrich our data by adding a timestamp value to all docs. Any time we make an index request against our target index, the document will first be processed by the ingest pipeline:

To set up this ingest pipeline, we’ll need to run the following command:

This will create a processor that adds a new field called @timestamp, containing a timestamp value, to each document.
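A pipeline along these lines could be defined with a single set processor. The Python sketch below builds the request body and prints it in the form you would send to the cluster. The pipeline name add_timestamp and the description are assumptions for illustration; _ingest.timestamp is the ingest metadata field holding the time the document was received.

```python
import json

PIPELINE_NAME = "add_timestamp"  # hypothetical name, pick your own

pipeline = {
    "description": "Stamp each document with the time it was ingested",
    "processors": [
        {
            "set": {
                "field": "@timestamp",
                # Mustache template resolved by Elastic at ingest time.
                "value": "{{{_ingest.timestamp}}}",
            }
        }
    ],
}

print(f"PUT _ingest/pipeline/{PIPELINE_NAME}")
print(json.dumps(pipeline, indent=2))
```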

Then we need to attach the pipeline to our index:
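One way to do the attaching is the index.default_pipeline setting, which tells Elastic to run a pipeline on every index request that doesn't name one itself. The sketch below builds that settings body in Python; the pipeline name add_timestamp is a placeholder and should be whatever name you gave your pipeline.

```python
import json

# Placeholder: use the name your ingest pipeline was created with.
pipeline_name = "add_timestamp"

settings = {"index": {"default_pipeline": pipeline_name}}

# The body would be sent as: PUT hello_world/_settings
print("PUT hello_world/_settings")
print(json.dumps(settings, indent=2))
```

Alternatively, a single index request can opt in by adding ?pipeline= followed by the pipeline name to its URL, which is handy for trying a pipeline out before making it the default.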

To test the pipeline, let’s add another document and then look at it:

Beautiful! It seems to be working perfectly.

Now for the finale, let’s run a date range query to find all documents that were indexed in the last hour:
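A date range query of this kind can be sketched as follows. The Python below builds the request body using Elastic's date math (now-1h means one hour before the moment the search runs), then shows the equivalent check done locally on two made-up timestamps so you can see what the range is doing.

```python
import json
from datetime import datetime, timedelta, timezone

# Range query on the @timestamp field; gte makes the lower bound inclusive.
last_hour_query = {"query": {"range": {"@timestamp": {"gte": "now-1h"}}}}

# The body would be sent as: GET hello_world/_search
print(json.dumps(last_hour_query, indent=2))

def within_last_hour(timestamp, now):
    """Local equivalent of the range check, for one ISO 8601 timestamp."""
    return datetime.fromisoformat(timestamp) >= now - timedelta(hours=1)

# Fixed "now" and sample timestamps, made up for illustration.
now = datetime(2022, 10, 14, 12, 0, tzinfo=timezone.utc)
recent = within_last_hour("2022-10-14T11:30:00+00:00", now)
stale = within_last_hour("2022-10-14T09:00:00+00:00", now)
print(recent, stale)
```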

Timestamp fields are some of the most useful fields, practically speaking, and come in clutch when trying to set up index lifecycle policies and data streams. Index lifecycle policies help users prevent indexes from growing too large and eating up storage space over time. We shall touch upon this subject at a later time.

Well, that’s enough information for one article! Thanks for making it this far. If you are enjoying the series, leave a like and a comment. The encouragement really helps with my motivation for writing these.

Also, if you have any topics that you’d want to see me cover in future articles, leave me a message!
