|Title:||A Novel Framework for Querying Multiparadigm Databases|
|Keywords:||PostgreSQL;MongoDB;Datalog;NoSQL databases;Redis;Framework;computer science|
|Abstract:||Relational databases evolved in accordance with the technological requirements and constraints that prevailed at the time. Those requirements have since changed. To alleviate the problems relational databases face in handling present-day big data, which is predominantly unstructured, a new class of databases has emerged, known as NoSQL databases. Whenever a new class of data storage emerges and becomes popular, researchers start working towards its integration with existing databases. Likewise, with the widespread use of NoSQL data-stores, integrating them with existing database technology has become a challenge. The goal is to select the most appropriate data storage technology for the specific requirements of each module of an application. The amalgamation of different databases within an application is known as the Multiparadigm approach or Polyglot-persistence. Persistence needs of applications are progressing from mostly relational to a mixture of data-stores. For example, the various modules of a Health-care Information System (HIS) use different data-stores to model data closer to its semantic usage. The researcher has showcased the applicability of the multiparadigm framework in HIS, considering the variety of data and the diverse categories of NoSQL data-stores with which it may be managed. The concept is, however, equally applicable to any other application area where different parts of the application deal with distinct data formats. The researcher has implemented a healthcare information system, PolyglotHIS, which makes use of one relational and two NoSQL data-stores. This combination of data-stores is not arbitrary; it was chosen on the basis of a careful analysis of alternative data-stores. Each data-store involved has its own specific advantage. 
Relational data-stores are preferred for data pertaining to financial transactions, since they support transactional properties. Employees’ payroll data, patients’ billing information and the financial component of the pharmacy department are handled by the relational database, PostgreSQL. NoSQL data-stores support BASE (Basically Available, Soft-state, Eventually-consistent) properties, which are the opposite of ACID and therefore not suitable for transactions. The two NoSQL data-stores used in PolyglotHIS are MongoDB and Neo4j. MongoDB, the most widely used document database today, is schema-less and best suited for storing unstructured or semi-structured data, such as laboratory reports, laboratory images, instrument manuals, and photos of patients and doctors. Data with inherent relationships is stored in the graph database, Neo4j. One example is the blood relations between patients, which doctors can examine to trace the presence of any hereditary disease. The interlinking between symptoms is also stored in Neo4j, to help the doctor visualize the links between symptoms and disease, leading to quicker diagnosis. Integration of the multiple data-stores is facilitated by multiple cooperative agents, which make up the mediation layer of the system. Knowledge about the schemas of the constituent data-stores is represented in a unified scheme with the help of Datalog facts and rules. Datalog, a declarative logic programming language, is used to store sets of facts and rules, which help in storing and inferring the capabilities of the data-stores used in PolyglotHIS. Homogenization of the results obtained from the heterogeneous data-stores is made possible by the support for the JSON format in all the data-stores involved. 
The proposed approach is novel in the way the various data-stores are integrated, making use of NoSQL data-stores, which represent modern data storage technology. Apart from NoSQL technologies, multiple cooperative agents are used. In terms of performance, the latency caused by the mediation layer in PolyglotHIS is negligible and becomes insignificant as the dataset size increases. Undoubtedly, the overall complexity of the system increases, as there is an impedance mismatch between the various data-stores in terms of data modeling and query languages; however, the proposed solution is still advantageous because of its ability to store data according to its usage, thereby simplifying the overall programming model. Decentralized data processing is also made possible by the use of multiple data-stores.|
|Description:||Doctor of Philosophy-Computer Science-Thesis|
|Appears in Collections:||Doctoral Theses@CSED|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.