Investigation and Analysis of Data Duplication in Distributed Database

Document Type: Original Article

Abstract
The growing popularity of big data analysis and cloud computing has
created new big data management standards. Sometimes, programmers
may interact with a number of heterogeneous data repositories, depending
on the information they are responsible for. Until recently, relational
databases were the default choice for most organizations. However, with the
continuous growth of stored and analyzed data, relational databases show
several limitations: limited scalability and storage capacity, degraded
query performance on large volumes of data, and difficulty storing and
managing very large databases. To overcome
these limitations, a new database model with a new set of features, called
NoSQL databases, was developed. Non-relational databases emerged as
an advanced technology and can be used alone or as a complement to
relational databases. One of the important challenges in this field, which
has a significant impact on the efficiency of such systems, is the
replication of data. The replication factor affects data availability, which
in turn affects read time and ultimately execution time. There are many
types of NoSQL databases with different functionality, so it is important to
compare them in terms of performance and to examine how performance relates
to the type of database. In this article, we evaluate three popular NoSQL
databases (Cassandra, HBase, and MongoDB) and review replication
algorithms for various distributed storage and content management
systems, including distributed database management systems. We then
analyze how data is replicated in MongoDB, HBase, and Cassandra. The
study attempts to analyze the performance of these distributed systems
under different replication factors, evaluate them on different data, and
identify possible improvements through comparison. The replication factor
is used in distributed database systems to increase performance
and availability. Replication also increases scalability in database
systems. To this end, each distributed NoSQL database system applies
its own policies for selecting the type and number of replicas.
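
The link between the replication factor and availability can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes that each replica fails independently with the same probability, which real clusters only approximate, and the function names `availability` and `majority_quorum` are hypothetical names chosen for illustration.

```python
def availability(replication_factor: int, node_failure_prob: float) -> float:
    """Probability that at least one replica of a data item is reachable.

    Assumption (not from the article): replicas fail independently,
    each with probability `node_failure_prob`.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("node_failure_prob must be between 0 and 1")
    # The item is unavailable only when every replica is down at once.
    return 1.0 - node_failure_prob ** replication_factor


def majority_quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


if __name__ == "__main__":
    # Illustrative failure probability only; not a measured value.
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: availability={availability(rf, 0.1):.4f}, "
              f"majority quorum={majority_quorum(rf)}")
```

Under these assumptions, each additional replica reduces the chance that a data item is unreachable, but it also raises the number of replicas that a majority-based read or write must contact, which is one reason the choice of replication factor can influence read time.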