Real-time processing using Data Grids
What are Data Grids
The simplest way to make sure everyone knows where their data lives is to store all of it in a single location, on a single server. But is there a better way? In many cases a single server is sufficient, but more complex cases demand a more sophisticated solution.
A Data Grid is an architecture, or a system of services, which ensures that users can access and modify their data, transfer large quantities of data, and reach data stored in geographically remote locations. All of this relies on authentication and authorization that are transparent to the end user. Data from many administrative domains (portions of the Data Grid) is presented to clients in response to their real-time requests.
Why use a Data Grid
- Low cost - an open source framework running on consumer-grade hardware, with no hidden expenses.
- Stability - cloud deployment ensures a stable environment for your system. Your data and their processing are both protected from hardware failure. Should one node fail, all of its workload will be routed to the remaining nodes.
- Scalability - the underlying technologies keep the system running smoothly as the load grows. The system is also simple to expand: start with a modest setup and grow later by adding nodes.
- Computational performance - the Data Grid distributed computation model is able to quickly process even large quantities of data (add more nodes for better performance).
- Flexibility - there is no need to preprocess data as in a traditional relational database setup. Store as much data as you want and leave the processing decisions for later.
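The stability and scalability points above (work from a failed node moves to the remaining nodes, and capacity grows by adding nodes) can be illustrated with a small sketch. It is not taken from any particular Data Grid product: it assumes rendezvous hashing as the way to assign keys to nodes, and the node names and key names are hypothetical.

```python
import hashlib


def owner(key, nodes):
    """Return the node with the highest hash score for this key (rendezvous hashing)."""
    if not nodes:
        raise ValueError("the grid has no live nodes")
    return max(nodes, key=lambda node: hashlib.sha256(f"{node}:{key}".encode()).hexdigest())


def assign(keys, nodes):
    """Map every key to its owning node."""
    return {key: owner(key, nodes) for key in keys}


keys = [f"order-{i}" for i in range(1000)]      # hypothetical keys
nodes = ["node-a", "node-b", "node-c"]          # hypothetical node names

before = assign(keys, nodes)

# node-b fails: only the keys it owned are rerouted to the surviving nodes.
after_failure = assign(keys, ["node-a", "node-c"])
rerouted = [k for k in keys if before[k] != after_failure[k]]

# node-d is added: the only keys that move are the ones node-d takes over.
after_growth = assign(keys, nodes + ["node-d"])
migrated = [k for k in keys if before[k] != after_growth[k]]
```

In this sketch every key in `rerouted` was owned by `node-b`, and every key in `migrated` ends up on `node-d`, so a failure or an expansion touches only a fraction of the data. Real grids typically also keep replicas of each entry on other nodes so that the data itself survives the failure, which this sketch does not model.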
That's not all. High uptime, the capability to store data in the cloud and the possibility of working with complex data structures are but a few of the Data Grid advantages. You can keep using one portion of the grid for data storage while the rest of it undergoes an upgrade or an expansion, without any outage or slow-down of the grid. A Data Grid may also be used as a cache in front of a conventional database, thus improving the database's performance. Consider a Data Grid for its performance, or for its capabilities as a data store.
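To make the caching use case concrete, here is a minimal sketch of the cache-aside pattern, one common way to put a cache in front of a database. It is an illustration under stated assumptions rather than a description of a specific product: `SlowDatabase` and `CachedStore` are hypothetical names, and a plain dictionary stands in for the Data Grid.

```python
class SlowDatabase:
    """Stand-in for a conventional database; counts how often it is read."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)

    def put(self, key, value):
        self._rows[key] = value


class CachedStore:
    """Cache-aside wrapper: read from the cache first, fall back to the database."""

    def __init__(self, database):
        self._db = database
        self._cache = {}  # assumption: a dict stands in for the Data Grid

    def get(self, key):
        if key in self._cache:
            return self._cache[key]          # cache hit, database untouched
        value = self._db.get(key)            # cache miss, ask the database
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        self._db.put(key, value)
        self._cache.pop(key, None)           # invalidate so the next read is fresh


db = SlowDatabase({"customer:1": "Alice"})
store = CachedStore(db)
first = store.get("customer:1")    # goes to the database
second = store.get("customer:1")   # served from the cache
```

After the two reads, `db.reads` is 1, because the second lookup never reaches the database. Writes go to the database and invalidate the cached entry, which keeps the sketch simple at the cost of one extra database read after each update.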
Our solution
Every Data Grid project is unique. Our experience implementing systems of this type allows us to focus on the technologies which bring the greatest added value to the customer. That means not just reduced expenses, but also support for the future development of the client's software. The components we use are highly dependable, scale well as system utilization changes, and provide high data availability. We typically use the following technologies (often in combination with Big Data):
- Infinispan - a distributed cache and NoSQL data store, ideal as a data layer for enterprise applications. It supports, for example, global message delivery across the Data Grid, and it integrates well with Java applications.
- Elasticsearch - an ideal tool for searching through data, specifically for real-time full-text search. Elasticsearch excels at storing application logs, which you can then analyze and search through. It also supports fuzzy search and much more.
- OrientDB - a second-generation distributed graph database. Its main advantages are the ability to maintain relationships between data within the grid and support for relations using “constraints”, similar to a traditional relational database.