Distributed Optimization Over Large-Scale Systems for Big Data Analytics

Shahbazian, Reza; Leone, Nicola; Grandinetti, Lucio; Guerriero, Francesca

URI

https://hdl.handle.net/10955/5566

Description

Format

Università della Calabria, Dipartimento di Matematica e Informatica, Dottorato di Ricerca in Matematica e Informatica, Ciclo XXXII

A large-scale system is defined as one that supports multiple, simultaneous users who access the core functionality through some network. Nowadays, enormous amounts of data are continually generated at unprecedented and ever-increasing scales. Large-scale data sets are collected and studied in numerous domains, from engineering sciences to social networks, commerce, biomolecular research, and security. Big Data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with acceptable latency. Big Data usually has one or more of the following characteristics: high volume, high velocity, or high variety. Big Data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. Generally, Big Data comes from sensors, devices, video or audio, networks, log files, transactional applications, the web, and social media, at very large scale. Big Data cannot be analyzed with traditional centralized methods, so new distributed models and algorithms are needed to process the data. In this thesis, we focus on optimization algorithms for Big Data applications. We review some of the recent machine learning, convex and non-convex, heuristic, and stochastic optimization techniques and the available tools applied to Big Data. We also propose a new distributed and decentralized stochastic algorithm for Big Data analytics. The proposed algorithm is fully distributed, so that decisions can be made over large-scale networks and data sets. The proposed method is scalable to any network configuration, is near real-time (in each iteration, a solution is provided, although it might not be the optimal one) and, more importantly, is robust to missing data and communication failures.
We evaluate the proposed method with a practical example and simulations on cognitive radio networks. The simulation results confirm that the proposed method is efficient in terms of accuracy and robustness. We assume that the distributed data sources can process their own data and communicate with neighboring sources to reach the network objective as an optimal decision. New technologies such as 5G and high-speed wireless data transfer introduce additional challenges, including imperfect communications that corrupt the data. We propose an algorithm that uses optimal weighting to combine the shared data coming from neighbors. This optimal weight improves the performance of the decision-making algorithm in terms of error and convergence rate. We analyze the performance of the proposed algorithm mathematically and derive the step-size conditions that guarantee its convergence. We use computer simulations to evaluate the network error. We show that, in a network with ten data sources, the proposed algorithm outperforms some well-known combination rules such as Metropolis and adaptive combination.

Keywords: Optimization, Big Data, Large-Scale, Distributed, Optimal Weight.
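
To make the idea of combining neighbors' data with weights more concrete, the following minimal Python sketch builds the standard Metropolis combination weights, which the abstract names as a comparison baseline, and runs a plain averaging iteration over them. It is not the optimal weighting proposed in the thesis, whose formula is not given in this record. The ring topology, the initial values, the iteration count, and the function names `metropolis_weights` and `combine` are all assumptions chosen for illustration.

```python
import random


def metropolis_weights(neighbors):
    """Build combination weights with the Metropolis rule.

    neighbors: dict mapping each node to the set of its neighbors
    (undirected graph, no self-loops). Returns weights[k][l], the weight
    node k gives to the value received from node l.
    """
    degree = {k: len(nbrs) for k, nbrs in neighbors.items()}
    weights = {k: {} for k in neighbors}
    for k, nbrs in neighbors.items():
        for l in nbrs:
            # Symmetric weight that depends only on the two node degrees.
            weights[k][l] = 1.0 / (1 + max(degree[k], degree[l]))
        # The self-weight takes the remainder so that each row sums to one.
        weights[k][k] = 1.0 - sum(weights[k].values())
    return weights


def combine(estimates, weights):
    """One combination step: every node forms a weighted average of the
    values held by itself and by its neighbors."""
    return {
        k: sum(w * estimates[l] for l, w in weights[k].items())
        for k in weights
    }


# Hypothetical network: ten data sources connected in a ring.
n = 10
ring = {k: {(k - 1) % n, (k + 1) % n} for k in range(n)}
A = metropolis_weights(ring)

# Hypothetical local measurements, one scalar per data source.
rng = random.Random(0)
x = {k: rng.gauss(5.0, 1.0) for k in range(n)}
target = sum(x.values()) / n  # the network-wide average

# Repeated local combination drives every node toward the average.
for _ in range(200):
    x = combine(x, A)

max_deviation = max(abs(v - target) for v in x.values())
```

Because the Metropolis weights are symmetric and each row sums to one, the network-wide average is preserved at every step, and on a connected graph such as this ring every node's value approaches that average using only neighbor-to-neighbor exchanges.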

Subject

Big Data; Optimization; Distributed; Large-scale; Optimal weight

Relation

INF/01;