    • Unical - institutional archive of doctoral theses
    • Doctoral Theses
    • Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica - Doctoral Theses

    Data mining techniques for large and complex data

    Created by
    Narvaez Vilema, Miryan Estela
    Crupi, Felice
    Angiulli, Fabrizio
    URI
    http://hdl.handle.net/10955/1875
    https://doi.org/10.13126/unical.it/dottorati/1875
    Description
    PhD programme (Dottorato di Ricerca) in Information and Communication Engineering For Pervasive Intelligent Environments, Cycle XXIX.

    During these three years of research I studied and designed data mining techniques for large quantities of data. Particular attention was devoted to training-set condensing techniques for the nearest-neighbor classification rule and to techniques for node anomaly detection in networks.

    The first part of this thesis focuses on the design and experimental evaluation of strategies to reduce the size of the subset extracted by condensing techniques. Training-set condensing techniques aim to determine a subset of the original training set that correctly classifies all the training set examples. The subset extracted by these techniques is known as a consistent subset. The result of the research was the development of various subset selection strategies, designed to determine during the training phase the most promising subset based on different methods of estimating test accuracy. Among them, the PACOPT strategy uses the Pessimistic Error Estimate (PEE) to estimate generalization as a trade-off between training set accuracy and model complexity. The experiments used the FCNN condensing technique as the reference. Among the condensing methods based on the nearest neighbor decision rule (NN rule), FCNN (Fast Condensed NN) is one of the most advantageous techniques, particularly in terms of time performance. We showed that the designed selection strategies preserve the accuracy of a consistent subset. We also demonstrated that the proposed selection strategies significantly reduce the size of the model. Comparison with notable training-set reduction techniques for the NN rule shows that the strategies introduced here achieve state-of-the-art performance.
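To make the notion of a consistent subset concrete, here is a minimal sketch in Python. It is not FCNN or PACOPT: the condensing loop follows the classic Hart-style rule (add every training example the current subset misclassifies, repeat until none is left). The `pessimistic_error` function is a generic score of the kind used in decision-tree pruning. The abstract does not give the PACOPT formula, so both the form of the score and the `penalty` value are assumptions, and all function names are hypothetical.

```python
import math

def nearest_label(point, prototypes):
    """Label of the prototype closest to `point` (Euclidean distance)."""
    closest = min(prototypes, key=lambda proto: math.dist(point, proto[0]))
    return closest[1]

def condense(training_set):
    """Hart-style condensing (assumption: stands in for FCNN, which is not shown).

    Starts from the first example and keeps adding examples that the
    current subset misclassifies, until a full pass adds nothing.
    """
    chosen = [0]
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(training_set):
            if i in chosen:
                continue
            subset = [training_set[j] for j in chosen]
            if nearest_label(x, subset) != y:
                chosen.append(i)
                changed = True
    return [training_set[j] for j in chosen]

def is_consistent(subset, training_set):
    """True if the NN rule over `subset` classifies every training example correctly."""
    return all(nearest_label(x, subset) == y for x, y in training_set)

def pessimistic_error(errors, model_size, n, penalty=0.5):
    """Generic pessimistic-style score: training errors plus a per-prototype penalty.

    Assumption: this form and the default penalty are illustrative only;
    they are not taken from the thesis.
    """
    return (errors + penalty * model_size) / n

# Toy data: two well-separated classes in the plane.
training_set = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]
subset = condense(training_set)
score = pessimistic_error(errors=0, model_size=len(subset), n=len(training_set))
print(len(subset), is_consistent(subset, training_set), round(score, 3))
```

On this toy data a single prototype per class is enough to classify all six examples, which is the size reduction that condensing aims for. A score like the one above penalizes larger subsets that have the same training accuracy.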
    The second part of the thesis addresses the design of analysis tools for network-structured data. Anomaly detection has received much attention in recent years and has a wide variety of applications, including fraud detection and network intrusion detection. Techniques for anomaly detection in static graphs assume that the network does not change, so they can represent only a single snapshot of the data. As real-world networks are constantly changing, the focus has shifted to dynamic graphs, which evolve over time. We present a technique for node anomaly detection in networks whose arcs are annotated with their time of creation. The technique singles out anomalies by simultaneously taking into account the structure of the network and the order in which connections were established. The latter information is obtained from the timestamps associated with arcs. A set of temporal structures is induced by checking certain conditions on the order of arc appearance, denoting different kinds of user behavior. The distribution of these structures is computed for each node and used to detect anomalies. We point out that the approach investigated here differs substantially from techniques dealing with dynamic networks. Indeed, our aim is not to determine the points in time at which a certain portion of the network (typically a community or a subgraph) exhibited a significant change, as dynamic-graph anomaly detection techniques usually do. Rather, our primary aim is to analyze each individual node by taking its temporal footprint into account.

    Università della Calabria
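The node-level idea in the second part of the abstract (count temporal structures around each node, then compare the per-node distributions) can be illustrated with a small Python sketch. The abstract does not define the temporal structures, so the two used here are assumptions chosen only for illustration: `in_then_out` (a node receives an arc before it creates one) and `out_then_in` (the reverse). The anomaly score, the L1 distance from the average profile, is likewise an assumption, and all names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical structure kinds; the thesis defines its own set.
KINDS = ("in_then_out", "out_then_in")

def temporal_profiles(arcs):
    """Per-node distribution over KINDS, from arcs given as (source, target, timestamp)."""
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for src, dst, t in arcs:
        outgoing[src].append(t)
        incoming[dst].append(t)

    profiles = {}
    for node in set(incoming) & set(outgoing):
        counts = Counter()
        # Compare the creation order of every (incoming, outgoing) arc pair.
        for t_in in incoming[node]:
            for t_out in outgoing[node]:
                if t_in < t_out:
                    counts["in_then_out"] += 1
                elif t_out < t_in:
                    counts["out_then_in"] += 1
        total = sum(counts.values())
        if total:
            profiles[node] = {k: counts[k] / total for k in KINDS}
    return profiles

def anomaly_scores(profiles):
    """L1 distance of each node's profile from the mean profile (illustrative score)."""
    if not profiles:
        return {}
    mean = {k: sum(p[k] for p in profiles.values()) / len(profiles) for k in KINDS}
    return {node: sum(abs(p[k] - mean[k]) for k in KINDS)
            for node, p in profiles.items()}

# Toy network: a, b and c receive an arc and then create one; d does the opposite.
arcs = [
    ("s", "a", 1), ("a", "x", 2),
    ("s", "b", 3), ("b", "x", 4),
    ("s", "c", 5), ("c", "x", 6),
    ("d", "x", 7), ("s", "d", 8),
]
scores = anomaly_scores(temporal_profiles(arcs))
print(max(scores, key=scores.get))
```

In this toy network node `d` gets the highest score because its arcs appear in the opposite order to those of the other nodes, even though the static structure around `a`, `b`, `c` and `d` is identical. This is the kind of signal that a single static snapshot cannot capture.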
    Subject
    Data mining
    Relation
    ING-INF/06

    Copyright © Università della Calabria - Sistema Bibliotecario di Ateneo - Servizio Automazione Biblioteche | DSpace 6.3