A novel algorithm for transforming row-oriented databases into column-oriented databases
Abstract
With the development of cloud computing and distributed systems, more and more
applications have started to migrate to the cloud to use its computing power and
scalability. These systems require interaction between heterogeneous data and
applications, involving databases that store data from a common domain under
different representations. These interactions result from schema mappings and
transformations, a process also called data migration. This interaction and
integration of different systems is known as information integration.
Two well-known approaches to information integration are data exchange and
data integration. In the first approach, data is extracted from multiple heterogeneous
sources, restructured into a common format, and finally transformed into a target
schema. In the second approach, users query several local databases through a
single, integrated global schema. Mappings determine how the queries that users pose
on the global schema are reformulated in terms of the local schemas.
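To make the contrast between the two approaches concrete, here is a minimal Python sketch. The sources, field names, and functions (`SOURCE_A`, `SOURCE_B`, `from_a`, `from_b`, `exchange`, `global_customers`, `query_by_city`) are hypothetical, and the global schema is written in a global-as-view style, which is only one way such mappings can be expressed (local-as-view is another).

```python
# Hypothetical local sources describing the same domain (customers)
# under different representations.
SOURCE_A = [{"id": 1, "full_name": "Ada Lovelace", "city": "London"}]
SOURCE_B = [("2", "Alan", "Turing", "Wilmslow")]  # (id, first, last, city)


def from_a(row):
    """Restructure a SOURCE_A record into the common format."""
    return {"id": int(row["id"]), "name": row["full_name"], "city": row["city"]}


def from_b(row):
    """Restructure a SOURCE_B tuple into the common format."""
    cid, first, last, city = row
    return {"id": int(cid), "name": f"{first} {last}", "city": city}


def exchange():
    """Data exchange: extract, restructure, and materialize a target instance once."""
    return [from_a(r) for r in SOURCE_A] + [from_b(r) for r in SOURCE_B]


def global_customers():
    """Data integration (global-as-view, assumed): the global relation is a
    view over the local sources and is evaluated only when queried."""
    yield from (from_a(r) for r in SOURCE_A)
    yield from (from_b(r) for r in SOURCE_B)


def query_by_city(city):
    """A user query on the global schema, answered by evaluating the view
    over the local sources at query time."""
    return [c["name"] for c in global_customers() if c["city"] == city]


print(query_by_city("London"))  # ['Ada Lovelace']
```

The design difference is where the work happens: data exchange pays the restructuring cost once and stores the target instance, while data integration leaves the data in the sources and re-evaluates the mapping on every query.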
In this thesis, a novel approach is proposed that transforms a relational
database into a column-oriented database. Of the available column-oriented
databases, such as HBase and Cassandra, we chose HBase for the study; it is an
open-source distributed database similar to BigTable, the database proposed and
used by Google. For the relational database, we chose PostgreSQL. The
transformation procedure is divided into two phases. In the first phase, the
relational schema is transformed into an HBase schema based on the HBase data model:
four rules are used to extract an HBase schema from the PostgreSQL schema, which can
then be used to develop an HBase application. In the second phase, relationships
between elements of the two schemas are expressed as a set of nested schema mappings,
which can then be used to create a set of nested queries that automatically transform
the PostgreSQL database into its HBase representation.
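The two phases can be illustrated with a small Python sketch. It is not the thesis's algorithm: the four rules are not listed in this abstract, so the rules in `translate_schema` are hypothetical stand-ins (primary key as row key, non-key columns in a column family named `d`, and one level of nesting of a child table into its parent's row as an extra column family). Likewise, `transform_rows` is a procedural stand-in for the nested queries the mappings would generate; it only builds HBase-style cells (row key → `family:qualifier` → bytes) in memory instead of writing to a live HBase cluster. The `dept`/`emp` tables are made-up example data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Minimal description of a relational (e.g. PostgreSQL) table."""
    name: str
    primary_key: str
    columns: list[str]
    foreign_key: tuple[str, str] | None = None  # (column, referenced table)


@dataclass
class HBaseTable:
    name: str
    row_key: str
    families: dict[str, list[str]] = field(default_factory=dict)


def translate_schema(tables):
    """Phase 1 sketch with hypothetical rules (not the thesis's four rules):
    tables without a foreign key become HBase tables keyed by their primary
    key, with non-key columns in family 'd'; a child table is nested into its
    parent as an extra column family (one level of nesting only)."""
    hbase = {}
    for t in tables:
        if t.foreign_key is None:
            cols = [c for c in t.columns if c != t.primary_key]
            hbase[t.name] = HBaseTable(t.name, t.primary_key, {"d": cols})
    for t in tables:
        if t.foreign_key is not None:
            fk_col, parent = t.foreign_key
            cols = [c for c in t.columns if c not in (t.primary_key, fk_col)]
            hbase[parent].families[t.name] = cols
    return hbase


def transform_rows(tables, schema, rows):
    """Phase 2 sketch: turn relational rows into HBase-style cells,
    {hbase_table: {row_key: {b"family:qualifier": value_bytes}}}."""
    out = {name: {} for name in schema}
    for t in tables:
        for row in rows.get(t.name, []):
            if t.foreign_key is None:
                key = str(row[t.primary_key]).encode()
                cells = out[t.name].setdefault(key, {})
                for col in schema[t.name].families["d"]:
                    cells[f"d:{col}".encode()] = str(row[col]).encode()
            else:
                fk_col, parent = t.foreign_key
                # Child rows are stored in the parent's row, keyed by the foreign key.
                cells = out[parent].setdefault(str(row[fk_col]).encode(), {})
                for col in schema[parent].families[t.name]:
                    # Prefix the qualifier with the child key so that several
                    # child rows fit in one parent row.
                    qualifier = f"{t.name}:{row[t.primary_key]}_{col}"
                    cells[qualifier.encode()] = str(row[col]).encode()
    return out


tables = [
    Table("dept", "dept_id", ["dept_id", "dname"]),
    Table("emp", "emp_id", ["emp_id", "ename", "dept_id"], ("dept_id", "dept")),
]
rows = {
    "dept": [{"dept_id": 10, "dname": "Research"}],
    "emp": [{"emp_id": 7, "ename": "Smith", "dept_id": 10}],
}
schema = translate_schema(tables)
print(schema["dept"].families)  # {'d': ['dname'], 'emp': ['ename']}
print(transform_rows(tables, schema, rows)["dept"][b"10"])
# {b'd:dname': b'Research', b'emp:7_ename': b'Smith'}
```

Nesting the child rows under the parent's row key keeps a department and its employees in one HBase row, so they can be read with a single row lookup instead of a join; the trade-off is that the child data is no longer addressable by its own key.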
Description
ME, CSED
