7 Reasons Why Open-Source Elassandra (Cassandra + Elasticsearch) Is Worth a Look
For organizations that rely on the Cassandra NoSQL database but require more efficient search capabilities, Elassandra offers a compelling open-source solution. Elassandra combines the strengths of Elasticsearch and Cassandra by using Elasticsearch as a Cassandra secondary index. While companies may run Elasticsearch and Cassandra separately (and unite them by developing their own custom integration or synchronization code), Elassandra removes the burden of building that integration and managing the two systems separately.
By closely integrating Elasticsearch with Cassandra, Elassandra provides search latencies that approach real-time responsiveness. Better yet, it achieves this while also delivering access to all the advantages of Elasticsearch’s established ecosystem of REST APIs, plugins, and other solutions. Through these tools — such as the powerful Kibana UI that allows users to search, analyze, and visualize data quickly and easily — database ops can be carried out with much more efficiency than is possible using Cassandra and Elasticsearch independently.
Some specific advantages of Elassandra include the following.
1. Masterless Elasticsearch Node Architecture and No Need to Learn CQL
As Elasticsearch is designed, a master node manages and broadcasts the cluster state, while writes go to primary shards and reads can be served by replica shards. Elassandra alters this by embedding Elasticsearch within Cassandra nodes, so that documents are stored as rows in Cassandra tables and the Elasticsearch indices, acting as a Cassandra secondary index, are updated synchronously on every write to a Cassandra table. As a result, the data in Cassandra and the Elasticsearch indices do not drift out of sync.
Importantly, in this regard, Elassandra provides bi-directional mapping; inserting a document using Elasticsearch APIs will automatically create or update the CQL schema necessary to communicate with Cassandra, and Elasticsearch mapping can also be automatically discovered from existing CQL schema. In this way, Elassandra preserves the dynamic mapping of Elasticsearch while eliminating any need for developers to learn CQL in order to leverage Elassandra. This replication and synchronicity in how mapping is stored effectively makes Elasticsearch masterless, offering superior consistency in the aftermath of a node failure.
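To make the two directions of that mapping concrete, here is a minimal Python sketch. It is not Elassandra's implementation: the type tables, the function names (`sketch_cql_schema`, `sketch_es_mapping`), and the `twitter`/`tweet` names are all hypothetical, and Elassandra's real mapping rules are richer than this. It only shows the idea of deriving a CQL table from a document, and an Elasticsearch mapping from CQL columns.

```python
# Illustrative type tables only (assumptions, not Elassandra's actual rules).
PY_TO_CQL = {bool: "boolean", int: "bigint", float: "double", str: "text"}
CQL_TO_ES = {"boolean": "boolean", "bigint": "long", "double": "double", "text": "keyword"}


def sketch_cql_schema(keyspace, table, doc, key="_id"):
    """Derive a CREATE TABLE statement from a flat, JSON-like document."""
    columns = ['"%s" text PRIMARY KEY' % key]
    for name, value in doc.items():
        # Exact type lookup, so True/False map to boolean rather than bigint.
        columns.append('"%s" %s' % (name, PY_TO_CQL[type(value)]))
    joined = ", ".join(columns)
    return 'CREATE TABLE IF NOT EXISTS "%s"."%s" (%s);' % (keyspace, table, joined)


def sketch_es_mapping(cql_columns):
    """Derive an Elasticsearch-style mapping from {column name: CQL type}."""
    return {
        "properties": {
            name: {"type": CQL_TO_ES[cql_type]}
            for name, cql_type in cql_columns.items()
        }
    }


# Document -> CQL direction.
print(sketch_cql_schema("twitter", "tweet", {"user": "alice", "retweets": 3}))
# CQL -> mapping direction.
print(sketch_es_mapping({"user": "text", "retweets": "bigint"}))
```

In Elassandra itself this translation happens inside the node when a document is indexed or an existing table is discovered, so developers do not write either function by hand.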
2. Nested Documents
Elassandra also allows developers to make effective use of nested documents, storing them in a Cassandra User Defined Type that is generated dynamically based on the Elasticsearch mapping. Because this generation is recursive, documents can be nested many layers deep, which is particularly valuable for businesses with deeply nested data.
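The recursion can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Elassandra's generator: the function name `sketch_udts`, the `parent_field` naming scheme for generated types, and the scalar type table are hypothetical. The one detail taken from Cassandra itself is that a User Defined Type nested inside another must be declared `frozen`.

```python
# Illustrative scalar types only (an assumption, not Elassandra's rules).
SCALARS = {bool: "boolean", int: "bigint", float: "double", str: "text"}


def sketch_udts(type_name, doc, statements=None):
    """Recursively emit CREATE TYPE statements for a nested dict.

    Inner types are appended before the types that use them, so the
    statements can be executed in the order returned.
    """
    if statements is None:
        statements = []
    fields = []
    for name, value in doc.items():
        if isinstance(value, dict):
            child = "%s_%s" % (type_name, name)  # hypothetical naming scheme
            sketch_udts(child, value, statements)
            fields.append("%s frozen<%s>" % (name, child))
        else:
            fields.append("%s %s" % (name, SCALARS[type(value)]))
    statements.append("CREATE TYPE %s (%s);" % (type_name, ", ".join(fields)))
    return statements


nested = {"name": "alice", "address": {"city": "Paris", "geo": {"lat": 48.85, "lon": 2.35}}}
for statement in sketch_udts("user", nested):
    print(statement)
```

Each extra level of nesting in the document simply triggers one more recursive call, which is why the depth of nesting is not limited by the approach.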
3. Multiple Index Mappings and No Downtime
Using Elassandra, it’s possible to index the same keyspace in many different Elasticsearch indices with a variety of mappings. The advantage here is in being able to change and introduce new index mappings without any downtime.
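As a rough sketch of what this looks like in practice, the Python below builds (but does not send) two index-creation requests that both point at the same keyspace with different mappings. Several things here are assumptions to verify against the Elassandra documentation for your version: that an index is bound to a keyspace through a `keyspace` index setting, that mappings are keyed by a type name matching the table, and the hypothetical names `tweets_v1`, `tweets_v2`, `twitter`, and `tweet`.

```python
import json


def index_request(index_name, keyspace, table, properties):
    """Build (method, path, body) for creating an index over a keyspace.

    The "keyspace" setting and the type-keyed mapping layout are assumptions.
    """
    body = {
        "settings": {"keyspace": keyspace},
        "mappings": {table: {"properties": properties}},
    }
    return "PUT", "/" + index_name, json.dumps(body, sort_keys=True)


# Existing index: "user" is searchable only as an exact keyword.
v1 = index_request("tweets_v1", "twitter", "tweet", {"user": {"type": "keyword"}})
# New index over the same rows: "user" is analyzed as full text instead.
v2 = index_request("tweets_v2", "twitter", "tweet", {"user": {"type": "text"}})

for method, path, body in (v1, v2):
    print(method, path, body)
```

Because the rows live in Cassandra rather than in either index, the new index can be created and populated while the old one keeps serving queries, and clients can switch over once it is ready.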
4. Powerful Log Analysis
With the help of Kibana as a search and data visualization interface, Elassandra allows for the use of a partition function to enable analysis of logs. This is helpful for easily storing and visually charting web application logs — especially in situations where logs must be carefully maintained and accessible for auditing or compliance purposes.
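The idea behind a partition function is that each log row is routed to an index chosen from one of its fields, typically a timestamp, so that old logs can be searched or retired one index at a time. The Python below is only a conceptual sketch of that routing. In Elassandra the partition function is configured on the index rather than written as application code, and the names `yearly_index`, `partition_logs`, the `ts` field, and the `logs` prefix are hypothetical.

```python
from datetime import datetime


def yearly_index(prefix, timestamp):
    """Name of the index that should hold a row with this timestamp."""
    return "%s_%d" % (prefix, timestamp.year)


def partition_logs(prefix, rows):
    """Group log rows by target index, keyed on each row's 'ts' field."""
    buckets = {}
    for row in rows:
        buckets.setdefault(yearly_index(prefix, row["ts"]), []).append(row)
    return buckets


rows = [
    {"ts": datetime(2023, 12, 31, 23, 59), "msg": "GET /login 200"},
    {"ts": datetime(2024, 1, 1, 0, 1), "msg": "GET /home 200"},
    {"ts": datetime(2024, 6, 15, 9, 30), "msg": "POST /order 500"},
]
for index_name, members in sorted(partition_logs("logs", rows).items()):
    print(index_name, len(members))
```

A Kibana index pattern covering all of the per-period indices can then chart the logs as a single series.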
5. Manage Time Series
Elassandra is valuable for managing time series as well. By storing time series in Cassandra and indexing only the metric names and metadata for those time series in Elasticsearch, it's possible to enrich that metadata with information from other sources, such as data centers or applications. This enables developers to gain insights (e.g., visualizing all machines running a certain application) and to apply this knowledge to diagnosis or strategic planning.
6. Scalability
Elassandra is also easy to scale. If it becomes necessary to increase write throughput, Elassandra's automatic resharding capabilities allow new nodes to be bootstrapped as needed. And because Elasticsearch functionality is embedded within Cassandra, search in Elassandra has the same high availability as the database.
7. Performance
Benchmarks of write throughput on Elassandra versus Cassandra with Elasticsearch deployed separately show that, for most use cases, throughput is about equal on both when nodes are not overloaded. However, Elassandra uses only half the CPU power.
Conclusion
Given these factors, Elassandra offers a solution that is greater than the sum of its parts. By delivering greater reliability and efficiency than its two components deployed separately, Elassandra presents a new opportunity for organizations positioned to reap its benefits.
Ben Slater is Chief Product Officer at Instaclustr, which provides a managed service platform of open source technologies such as Apache Cassandra, Apache Spark, Elasticsearch and Apache Kafka.