Distributed Optimization Over Large-Scale Systems for Big Data Analytics
Created by
Shahbazian, Reza
Leone, Nicola
Grandinetti, Lucio
Guerriero, Francesca
Description
UNIVERSITÀ DELLA CALABRIA
Department of Mathematics and Computer Science
PhD Program in Mathematics and Computer Science
Cycle XXXII
A large-scale system is defined as one that supports multiple, simultaneous users who access the
core functionality through some network. Nowadays, an enormous amount of data is continually
generated at unprecedented and ever-increasing scales. Large-scale data sets are collected and
studied in numerous domains, from engineering sciences to social networks, commerce,
biomolecular research, and security. Big Data is a term applied to data sets whose size or type is
beyond the ability of traditional relational databases to capture, manage, and process with
acceptable latency. Big Data usually has one or more of the following characteristics: high volume,
high velocity, or high variety. Big Data challenges include data capture, storage,
analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data
source. Generally, Big Data comes from sensors, devices, video or audio, networks, log files,
transactional applications, the web, and social media, at a very large scale. Big Data cannot be
analyzed with traditional centralized methods; therefore, new distributed models and algorithms
are needed to process the data.
In this thesis, we focus on optimization algorithms for Big Data applications. We review some of
the recent machine learning, convex and non-convex, heuristic, and stochastic optimization
techniques and the available tools applied to Big Data. We also propose a new distributed and
decentralized stochastic algorithm for Big Data analytics. The proposed algorithm is fully
distributed and designed for decision-making over large-scale networks and data sets. It is scalable
to any network configuration, is near real-time (each iteration provides a solution, although it might
not be the optimal one) and, more importantly, is robust to missing data and communication failures.
We evaluate the proposed method with a practical example and simulations on cognitive radio
networks. Simulation results confirm that the proposed method is efficient in terms of accuracy
and robustness.
We assume that the distributed data sources are capable of processing their own data and
communicating with neighboring sources to reach the network objective as an optimal decision. New
technologies such as 5G and high-speed wireless data transfer introduce additional challenges,
including imperfect communications that damage the data. We propose an algorithm that
uses optimal weights to combine the data shared by neighbors. This optimal weighting
improves the performance of the decision-making algorithm in terms of error and convergence
rate. We analyze the performance of the proposed algorithm mathematically and derive the
step-size conditions that guarantee its convergence. We use computer
simulations to evaluate the network error. We show that, in a network with ten data sources,
the proposed algorithm outperforms some known
solutions such as the Metropolis and adaptive combination rules.
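For readers unfamiliar with the Metropolis baseline named above, the following is a minimal sketch of how neighbor data can be combined with Metropolis weights until every node agrees on the network average. It illustrates the baseline only, not the optimal weighting proposed in the thesis. The ring topology, the random local values, and the helper names (metropolis_weights, ring_adjacency, consensus) are assumptions made for this example.

```python
import numpy as np

def metropolis_weights(adj):
    """Build the Metropolis combination matrix for an undirected graph.

    adj is a symmetric 0/1 adjacency matrix without self-loops.
    Neighbors i and j get weight 1 / (1 + max(deg_i, deg_j));
    the self-weight makes every row sum to one.
    """
    adj = np.asarray(adj)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def ring_adjacency(n):
    """Hypothetical topology: each node talks to its two ring neighbors."""
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        adj[i, (i + 1) % n] = 1
        adj[(i + 1) % n, i] = 1
    return adj

def consensus(W, x0, iterations):
    """Repeatedly replace each node's value by the weighted sum over its neighborhood."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        x = W @ x
    return x

rng = np.random.default_rng(0)
adj = ring_adjacency(10)        # ten data sources, as in the abstract
W = metropolis_weights(adj)
local = rng.normal(size=10)     # assumed local measurements, one per node
estimate = consensus(W, local, 200)
```

Because the Metropolis matrix is symmetric with rows summing to one, each node's value approaches the average of the initial local values using only neighbor-to-neighbor exchanges. A weighting scheme that accounts for link quality, such as the one the thesis proposes, would replace metropolis_weights in this sketch.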
Keywords: Optimization, Big Data, Large-Scale, Distributed, Optimal Weight.
Subject
Big Data; Optimization; Distributed; Large-scale; Optimal weight
Relation
INF/01;