Performance Analysis of NoSQL Databases with Hadoop Integration
Abstract
According to a 2012 study by IBM, 90% of internet data had been produced in the
previous two years, and 2.5 exabytes of data were being created every day. This growth
drove the development of a new class of databases called NoSQL. NoSQL databases are
designed to handle very large volumes of data, also called “Big Data”. Relational
databases are inefficient at processing big data. NoSQL databases, by contrast, address
the specific needs of the current big data era, such as storing unstructured data,
scaling out, and reading and writing efficiently. Relational databases have a fixed schema,
so they cannot readily store unstructured information, and the internet is exploding
with such information from numerous sources. Because the data comes from a variety of
sources, its storage format cannot be controlled. This has given rise to unstructured
storage, and handling these tasks with a relational database has become inefficient.
Besides fulfilling storage requirements, NoSQL databases have also entered real-time
analytics. E-commerce and social networking websites now use real-time analytics to
predict customer needs and make business decisions accordingly.
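The fixed-schema point can be made concrete with a short example. The following minimal Python sketch uses the standard-library sqlite3 module as a stand-in for a relational database. A plain list of dictionaries serves as a hypothetical stand-in for a document store. It does not use the MongoDB or Cassandra APIs and is only an illustration of the contrast, not the setup used in this thesis.

```python
import sqlite3

# Relational side: the table's columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))


def try_insert_with_extra_field():
    """Attempt to store a record carrying a field the schema does not define."""
    try:
        conn.execute(
            "INSERT INTO users (id, name, tags) VALUES (?, ?, ?)",
            (2, "bob", "nosql,hadoop"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement because column 'tags' does not exist.
        return False


relational_ok = try_insert_with_extra_field()

# Document-style side (hypothetical stand-in for a document store):
# each record is a free-form dict, so records may differ in shape.
documents = []
documents.append({"_id": 1, "name": "alice"})
documents.append({"_id": 2, "name": "bob", "tags": ["nosql", "hadoop"]})
documents.append({"_id": 3, "source": "sensor-7", "reading": 21.5})

print(relational_ok)   # False
print(len(documents))  # 3
```

The relational insert fails until the schema is altered. The document-style collection accepts records whose fields vary from one record to the next, which is the flexibility the paragraph above attributes to NoSQL stores.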
This thesis explains the different types of NoSQL databases and integrates two of them,
“MongoDB” and “Cassandra”, with the analytics tool “Hadoop”.
Hadoop is not a database. It is a framework for handling large amounts of data by
distributing the data among the nodes of a cluster and processing it in parallel. We evaluated both integrated setups on parameters such as read and write performance, scalability and fault tolerance.
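Hadoop distributes data across nodes and processes it in parallel using the map, shuffle and reduce pattern of MapReduce. That pattern can be sketched in a few lines of Python. This is an in-process simulation under stated assumptions. The "nodes" are hypothetical partitions of a list, and worker threads stand in for cluster machines. None of Hadoop's actual APIs (HDFS, the Java MapReduce classes) are used. The job counts words, the customary introductory MapReduce example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n):
    """Distribute records round-robin across n hypothetical nodes."""
    return [records[i::n] for i in range(n)]


def map_phase(partition):
    """Map: emit (word, 1) for every word in this partition's lines."""
    pairs = []
    for line in partition:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_outputs):
    """Shuffle: group emitted values by key across all partitions."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(item):
    """Reduce: sum the counts collected for one key."""
    key, values = item
    return key, sum(values)


def word_count(lines, nodes=3):
    partitions = split_into_partitions(lines, nodes)
    # Threads simulate cluster nodes working on their own partitions.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        mapped = list(pool.map(map_phase, partitions))
        reduced = list(pool.map(reduce_phase, shuffle(mapped).items()))
    return dict(reduced)


lines = [
    "NoSQL stores big data",
    "Hadoop processes big data",
    "MongoDB and Cassandra are NoSQL databases",
]
counts = word_count(lines)
print(counts["big"], counts["nosql"], counts["hadoop"])  # 2 2 1
```

In a real Hadoop cluster, each map task runs on the node that holds its block of input data. The framework performs the shuffle over the network. The sketch keeps only the data flow so that the three phases are visible.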
Description
ME, CSED
