A Novel Framework for Querying Multiparadigm Databases
Abstract
Relational databases evolved in accordance with the technological requirements
and constraints that prevailed at the time. Those requirements have since changed.
To alleviate the problems that relational databases face in handling present-day
big data, which is predominantly unstructured, a new class of databases has emerged,
known as NoSQL databases. Whenever a new class of data storage emerges and becomes
popular, researchers start working towards its integration with the existing
databases. Likewise, with the widespread use of NoSQL data-stores, integrating
them with existing database technology has become a challenge. The goal is to
select the most appropriate data storage technology that meets the specific
requirements of each module of the application.
The amalgamation of different databases within an application is known as the
multiparadigm approach or polyglot persistence. The persistence needs of applications
are progressing from mostly relational to a mixture of data-stores. For example, the
various modules of a Health-care Information System (HIS) use different data-stores
to model data closer to their semantic usage. The researcher has showcased the
applicability of the multiparadigm framework in HIS, considering the variety of data
and the diverse categories of NoSQL data-stores with which they may be managed.
However, the concept is equally applicable to any other application area where
different parts of the application deal with distinct data formats.
The researcher has implemented a healthcare information system, PolyglotHIS,
which makes use of one relational and two NoSQL data-stores. This coalition of
data-stores is not arbitrary; it was chosen on the basis of a careful analysis
of alternative data-stores. Each data-store involved has its own specific advantage.
Relational data-stores are preferred for data pertaining to financial transactions,
since they support transactional properties. Employees’ payroll data, patients’ billing
information and the financial component of the pharmacy department are handled by the
relational database PostgreSQL. NoSQL data-stores support BASE (Basically
Available, Soft-state, Eventually-consistent) properties, which contrast with ACID
and are therefore not suitable for transactions.
The two NoSQL data-stores used in PolyglotHIS are MongoDB and Neo4j.
MongoDB, the most widely used document database today, is schema-less and best
suited for storing unstructured or semi-structured data, such as laboratory reports,
laboratory images, instrument manuals, and photos of patients and doctors.
Data containing in-built relationships are stored in the graph database Neo4j.
One example is the blood relations between patients, which help doctors trace the
presence of hereditary diseases by examining these relationships. Interlinking
between symptoms is also stored in Neo4j to assist the doctor in visualizing
the links between symptoms and diseases, leading to quicker diagnosis.
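
The idea of tracing a hereditary disease through blood relations can be illustrated
with a small breadth-first traversal. This is only a sketch in plain Python, not the
Neo4j queries used in PolyglotHIS; the patient identifiers, the sample diagnoses and
the function names are all hypothetical.

```python
from collections import deque

# Hypothetical sample data: undirected blood-relation edges between patient IDs.
BLOOD_RELATIONS = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]

# Hypothetical diagnoses recorded per patient.
DIAGNOSES = {
    "p1": {"asthma"},
    "p3": {"haemophilia"},
    "p5": {"haemophilia"},
}


def build_adjacency(edges):
    """Turn a list of undirected edges into an adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def relatives_with_condition(patient, condition, edges, diagnoses, max_hops=3):
    """Return (relative, hops) pairs for blood relatives within max_hops
    of the patient who have the given condition recorded."""
    adjacency = build_adjacency(edges)
    seen = {patient}
    queue = deque([(patient, 0)])
    matches = []
    while queue:
        current, hops = queue.popleft()
        if current != patient and condition in diagnoses.get(current, set()):
            matches.append((current, hops))
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return matches
```

A graph database expresses the same traversal declaratively, which is the reason such
relationship-heavy data is placed in Neo4j rather than in the relational store.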
Integration of multiple data-stores is facilitated through multiple co-operative
agents, which make up the mediation layer of the system. Knowledge about the schemas
of the constituent data-stores is represented in a unified scheme with the help of
Datalog facts and rules. Datalog, a declarative logic programming language, is used
to store sets of facts and rules, and helps in storing and inferring the capabilities
of the data-stores used in PolyglotHIS. Homogenization of results obtained from
heterogeneous data-stores is made possible by the support for the JSON format in all
the involved data-stores.
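
A minimal sketch of how Datalog-style facts and a rule might describe data-store
capabilities, and how per-store results might be homogenized as JSON, is given below.
It is written in Python rather than Datalog and is not the thesis implementation: the
predicate names (stores, supports, can_answer), the category and capability labels and
the homogenize helper are assumptions made for illustration. Only the assignment of
data categories to PostgreSQL, MongoDB and Neo4j follows the description above.

```python
import json

# Facts in the spirit of Datalog: stores(data_store, data_category).
# Category names are hypothetical labels for the data described in the abstract.
STORES = {
    ("postgresql", "billing"),
    ("postgresql", "payroll"),
    ("mongodb", "lab_report"),
    ("neo4j", "blood_relation"),
    ("neo4j", "symptom_link"),
}

# Facts: supports(data_store, capability). Capability names are hypothetical.
SUPPORTS = {
    ("postgresql", "acid_transactions"),
    ("mongodb", "schemaless_documents"),
    ("neo4j", "graph_traversal"),
}


def can_answer(category, capability):
    """Rule: can_answer(S, C, K) :- stores(S, C), supports(S, K).
    Returns the sorted list of stores S satisfying the rule."""
    return sorted(
        store
        for (store, cat) in STORES
        if cat == category and (store, capability) in SUPPORTS
    )


def homogenize(results_by_store):
    """Merge rows returned by different stores into one JSON array,
    tagging every row with the store it came from."""
    merged = [
        {"source": store, "record": row}
        for store in sorted(results_by_store)
        for row in results_by_store[store]
    ]
    return json.dumps(merged, sort_keys=True)
```

For instance, can_answer("billing", "acid_transactions") evaluates to ["postgresql"],
while the same capability requested for "lab_report" yields an empty list, which a
mediation agent could treat as "no suitable store".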
The proposed approach is novel in the way various data-stores are integrated,
making use of NoSQL data-stores, which represent modern data storage technology,
together with multiple co-operative agents.
In terms of performance, the latency caused by the mediation layer in PolyglotHIS
is negligible and becomes insignificant as the dataset size increases. The overall
complexity of the system does increase, as there is an impedance mismatch between
the data-stores in terms of data modeling and query languages; however, the proposed
solution is still advantageous because of its ability to store data according to its
usage, thereby simplifying the overall programming model. Decentralized data
processing is also made possible by the use of multiple data-stores.
Description
Doctor of Philosophy-Computer Science-Thesis
