Please use this identifier to cite or link to this item:
http://hdl.handle.net/10266/6797
Title: | Scalable Metadata Storage and Retrieval Techniques for Very Large Distributed Storage Systems |
Authors: | Singh, Harcharan Jit |
Supervisor: | Bawa, Seema |
Keywords: | Distributed Computing;Big Data Storage Systems;Distributed file system;Information Storage;Information Retrieval;Data-intensive Computing;Metadata Management;Locality;Namespace;Stacked AutoEncoder;LaMeta;Machine Learning;High-performance computing;Storage and retrieval systems;Storage architectures;Cloud Storage;Recommendation System;Matrix factorization techniques |
Issue Date: | 9-Aug-2024 |
Abstract: | The rapid proliferation of global digital data has reached unprecedented levels: the International Data Corporation (IDC) projects that it will grow to 175 ZB by 2025. This exponential growth presents immense challenges for storage and processing capabilities. Over the past three decades, parallel and distributed techniques have emerged as viable solutions to the storage and retrieval demands of this ever-expanding data landscape. As digital data surges, large-scale storage systems generate massive volumes of metadata. Metadata is a crucial component of storage and file systems: it records the organizational structure, block identification numbers, physical block locations, and access permissions. Effectively managing the metadata of storage systems with exabyte- or zettabyte-scale capacities has therefore become critical for parallel and distributed storage. This research examines the challenges posed by the continuous growth of global digital data and explores the development of parallel and distributed metadata management techniques for efficient storage and retrieval. It investigates the significance of metadata in storage systems and underscores the importance of scalable distributed metadata management for handling these colossal data volumes. Provisioning an efficient, scalable, ultra-large distributed storage system for expanding cloud applications has been a challenging endeavour for both academia and industry. In such a system, data are distributed across multiple storage nodes to enhance performance, scalability, and availability. Access to this distributed data is mediated by metadata maintained by multiple metadata servers; the metadata carries crucial information about the physical address of data and access privileges. The efficiency of a storage system therefore relies heavily on effective metadata management. 
This research presents a comprehensive systematic literature analysis of metadata management techniques in storage systems. It aims to help researchers understand the significance of metadata management and the key parameters of metadata management techniques for storage systems. It systematically examines metadata management techniques developed by various industry and research groups and identifies taxonomies based on the metadata distribution techniques they use. It further analyses these techniques in terms of their distribution structures and the critical parameters of metadata management, presenting a balanced view of the strengths and weaknesses of each to help researchers select the most appropriate approach for a specific application. It also identifies open challenges and significant research directions in metadata management. Building on this analysis, the research proposes novel locality-aware metadata management techniques (LaMeta) to enhance the performance of ultra-large distributed storage systems. LaMeta leverages the locality of Metadata Servers (MDSs) in a globally distributed ultra-large storage system, optimizing metadata operations by exploiting the proximity of MDSs within the system. LaMeta variants of the static subtree, dynamic subtree, hash-based, and DROP-based metadata distribution techniques, tailored for ultra-large distributed storage systems, outperform their baseline counterparts. The proposed techniques are evaluated through extensive experiments and performance comparisons using real metadata datasets. In summary, LaMeta enables efficient routing of metadata operations in distributed storage systems. 
The research presented in this thesis contributes to the field of metadata management for ultra-large distributed storage systems. The novel LaMeta technique opens new avenues for optimizing metadata operations, improving storage performance, and meeting the demands of next-generation distributed storage technologies. We evaluated the performance of LaMeta-based metadata distribution techniques against contemporary methods, measuring operations per second at scales of 5 to 50 Metadata Servers (MDSs). Performance gains were assessed on the aggregate throughput of metadata operations. LaMeta-based static subtree metadata distribution shows gains over baseline static subtree partitioning of 2% to 5% at 5 MDSs, 9% to 12% at 10-20 MDSs, and 12% to 14% at 25-50 MDSs. For LaMeta-based dynamic subtree partitioning, the average throughput gains range from 4% to 9% at 5 MDSs, 10% to 20% at 10-30 MDSs, and exceed 20% at 40-50 MDSs. For LaMeta-based hash partitioning, the gains are 1% to 9% at 5-20 MDSs, 9% to 19% at 25-40 MDSs, and more than 24% at 50 MDSs. The proposed locality-aware metadata management (LaMeta) for large distributed storage systems is therefore well suited to latency-sensitive data storage and retrieval applications such as High-Frequency Trading (HFT), online gaming, Content Delivery Networks (CDNs), cloud computing, the Internet of Things (IoT), and autonomous vehicles. |
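For readers unfamiliar with the baseline distribution schemes that LaMeta extends, the following Python sketch contrasts hash-based and static subtree placement of metadata, then adds a generic locality-aware step that routes a request to the lowest-latency MDS among those holding the entry. It is an illustration under stated assumptions, not the LaMeta algorithm from the thesis: the `MDS` class, the `region` labels, the replica count `k`, the `rtt_ms` latency table, and all function names are hypothetical, and SHA-1 modulo placement stands in for whatever hash function a real system uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MDS:
    """A metadata server; `region` is a hypothetical locality label."""
    name: str
    region: str


def _path_hash(path: str) -> int:
    # Stable 64-bit hash of the full pathname (SHA-1 chosen only for illustration).
    return int.from_bytes(hashlib.sha1(path.encode("utf-8")).digest()[:8], "big")


def hash_partition(path: str, servers: list[MDS]) -> MDS:
    # Hash-based distribution: the pathname's hash alone selects the owning MDS,
    # which spreads load evenly but scatters the entries of one directory.
    return servers[_path_hash(path) % len(servers)]


def static_subtree_partition(path: str, subtree_map: dict[str, MDS]) -> MDS:
    # Static subtree partitioning: the longest configured directory prefix
    # owns the path, so a subtree's metadata stays together on one MDS.
    # Assumes subtree_map contains "/" so that every absolute path has an owner.
    best_prefix = max(
        (p for p in subtree_map
         if path == p or path.startswith(p.rstrip("/") + "/")),
        key=len,
    )
    return subtree_map[best_prefix]


def replica_set(path: str, servers: list[MDS], k: int) -> list[MDS]:
    # Hypothetical replication: the hash-selected MDS plus the next k-1 servers.
    start = _path_hash(path) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(min(k, len(servers)))]


def route_nearest(candidates: list[MDS], client_region: str,
                  rtt_ms: dict[tuple[str, str], float]) -> MDS:
    # Locality-aware step: send the metadata operation to the candidate MDS
    # with the lowest round-trip time from the client's region.
    return min(candidates, key=lambda s: rtt_ms[(client_region, s.region)])


if __name__ == "__main__":
    servers = [MDS("mds-0", "eu"), MDS("mds-1", "us"), MDS("mds-2", "asia")]
    subtrees = {"/": servers[0], "/home": servers[1], "/data": servers[2]}
    rtt = {("eu", "eu"): 2.0, ("eu", "us"): 80.0, ("eu", "asia"): 150.0}

    path = "/home/alice/report.txt"
    print("hash owner:     ", hash_partition(path, servers).name)
    print("subtree owner:  ", static_subtree_partition(path, subtrees).name)  # mds-1
    print("nearest from eu:", route_nearest(replica_set(path, servers, 2), "eu", rtt).name)
```

Hash placement spreads load evenly but ignores where the client is, while subtree placement keeps related entries together at the cost of possible hotspots. The `route_nearest` step shows, in deliberately simplified form, the kind of proximity-based choice the abstract attributes to LaMeta.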
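The reported gains are improvements in aggregate metadata throughput. Assuming each gain is the relative difference between LaMeta and baseline aggregate throughput (the abstract does not state the exact formula), the calculation is gain = (T_LaMeta - T_baseline) / T_baseline x 100%, where each T sums operations per second over all MDSs. The Python sketch below uses hypothetical function names and made-up per-MDS numbers, not results from the thesis.

```python
import math


def aggregate_throughput(per_mds_ops_per_s: list[float]) -> float:
    # Aggregate throughput: total metadata operations per second across all MDSs.
    return sum(per_mds_ops_per_s)


def throughput_gain_pct(lameta_ops_per_s: float, baseline_ops_per_s: float) -> float:
    # Relative gain of LaMeta over the baseline, in percent (assumed definition).
    if baseline_ops_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return (lameta_ops_per_s - baseline_ops_per_s) / baseline_ops_per_s * 100.0


if __name__ == "__main__":
    # Illustrative numbers for a 10-MDS run; NOT measurements from the thesis.
    baseline = aggregate_throughput([1000.0] * 10)   # 10,000 ops/s
    lameta = aggregate_throughput([1100.0] * 10)     # 11,000 ops/s
    gain = throughput_gain_pct(lameta, baseline)
    print(f"gain: {gain:.1f}%")                      # gain: 10.0%
    assert math.isclose(gain, 10.0)
```

Because the denominator is the baseline, a gain of more than 24% at 50 MDSs would mean, under this assumed definition, that LaMeta sustained over 1.24 times the baseline's aggregate operations per second.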
URI: | http://hdl.handle.net/10266/6797 |
Appears in Collections: | Doctoral Theses@CSED |
Files in This Item:
File | Description | Size | Format
---|---|---|---
HJS_Thesis_08-Aug-2024-Final.pdf | | 5.7 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.