How it works...
The index API is one of the most used APIs in Elasticsearch. Internally, indexing a JSON document consists of the following steps:
- Routing the call to the correct shard based on the ID, the routing value, or the parent metadata. If the client does not supply an ID, Elasticsearch generates a new one (see the Managing your data recipe in Chapter 1, Getting Started, for details).
- Validating the sent JSON.
- Processing the JSON according to the mapping. If the document contains new fields (and the mapping can be updated), they are added to the mapping.
- Indexing the document in the shard. If a document with the same ID already exists, it is updated.
- If the document contains nested documents, extracting them and processing them separately.
- Returning information about the saved document (ID and versioning).
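The routing step in the list above can be pictured with a short sketch. It assumes the simplified rule of hashing the routing value and taking the remainder by the number of primary shards, and it substitutes zlib.crc32 for the Murmur3 hash that Elasticsearch uses internally, so the shard numbers it produces will not match those of a real cluster. The pick_shard function and the sample IDs are hypothetical names used only for illustration.

```python
import zlib
from typing import Optional


def pick_shard(doc_id: str, num_primary_shards: int,
               routing: Optional[str] = None) -> int:
    """Return an illustrative shard number for a document."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # When no explicit routing value is given, the document ID is used.
    routing_value = routing if routing is not None else doc_id
    # Stand-in hash (assumption): crc32 instead of Elasticsearch's Murmur3.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# Documents routed by their own ID may land on different shards.
for doc_id in ("order-0001", "order-0002", "order-0003"):
    print(doc_id, "-> shard", pick_shard(doc_id, 5))

# Documents that share a routing value always land on the same shard.
same = (pick_shard("order-0001", 5, routing="customer-42")
        == pick_shard("order-0002", 5, routing="customer-42"))
print("same shard with shared routing:", same)
```

The point of the sketch is that the target shard is a pure function of the routing value, which is why documents that share a routing value end up on the same shard.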
It's important to choose the correct ID for indexing your data. If you don't provide an ID, Elasticsearch automatically assigns a new one to your document during the indexing phase. For better performance, IDs should generally all have the same character length, which improves the balancing of the data tree that stores them.
Because the call goes through REST, take care when an ID contains non-ASCII characters: they are subject to URL encoding and decoding, so make sure that the client framework you use escapes them correctly.
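As a minimal sketch of the escaping involved, the following uses Python's standard urllib.parse.quote to percent-encode an ID before it is placed in the request path. The document_path helper, the myindex index name, and the /_doc/ path layout are assumptions made for illustration; adapt them to the endpoint your Elasticsearch version expects.

```python
from urllib.parse import quote


def document_path(index: str, doc_id: str) -> str:
    """Build a request path with the index name and ID percent-encoded."""
    # safe="" also escapes "/", so an ID containing a slash is not
    # mistaken for additional path segments.
    return "/{}/_doc/{}".format(quote(index, safe=""), quote(doc_id, safe=""))


# A plain ASCII ID passes through unchanged.
print(document_path("myindex", "abc-123"))
# A non-ASCII character is encoded as its UTF-8 bytes, and "/" as %2F.
print(document_path("myindex", "caf\u00e9/1"))
```

Most client libraries perform this escaping for you; the sketch only shows what has to happen when the URL is built by hand.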
Depending on the mappings, other actions take place during the indexing phase: propagation to the replicas, nested processing, and the percolator.
The document becomes available to standard search calls after a refresh, which is either forced with an API call or happens automatically after the time slice of 1 second (near real time). GET API calls that fetch the document by ID do not require a refresh, so the document is instantly available to them.
The refresh can also be forced by specifying the refresh parameter during indexing.
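The following sketch shows one way the refresh parameter could be attached to an index call. It only builds the request with Python's standard library and does not send it. The build_index_request helper, the http://localhost:9200 address, the myindex index name, and the /_doc/ path layout are assumptions for illustration, not part of the recipe.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_index_request(base_url, index, doc_id, document, refresh=None):
    """Build (but do not send) a PUT request that indexes one document."""
    url = "{}/{}/_doc/{}".format(
        base_url.rstrip("/"), quote(index, safe=""), quote(doc_id, safe=""))
    if refresh is not None:
        # Append the refresh parameter as a query string, e.g. ?refresh=true
        url += "?" + urlencode({"refresh": refresh})
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="PUT")


req = build_index_request("http://localhost:9200", "myindex", "1",
                          {"name": "test"}, refresh="true")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it, which requires a running node at the assumed address. Forcing a refresh on every indexing call costs performance, so it is usually reserved for cases where the document must be searchable immediately.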