Posted 17-10-2013 14:17
by Balasi Panagiota
TECHNICAL UNIVERSITY OF CRETE
School of Electronic and Computer Engineering
Undergraduate Studies Program
MASTER'S THESIS PRESENTATION
by Evangelos Vazaios
titled
BePadoop: Large Scale Exact Inference on Hadoop
Friday, 18 October 2013, 13:30
ECE Meeting Room, Science Building, University Campus
Examination Committee
Professor Minos Garofalakis (supervisor)
Associate Professor Michail Lagoudakis
Assistant Professor Vasilis Samoladas
Abstract
The critical need for effective processing of inference queries on massive amounts of
uncertain/probabilistic data arises naturally in numerous modern application domains.
At the same time, the widespread use of large-scale parallel infrastructures (e.g.,
Hadoop-based clusters) has placed massive processing power at the fingertips of users
and applications around the globe, thus enabling fast data analytics over previously
unimaginable volumes of real-life data. Still, due to the inherent difficulty and
complexity of probabilistic inference, the effective parallelization of such large-scale
inference queries continues to pose several difficult research challenges.
In this paper, we present BePadoop, the first efficient, Hadoop-based exact inference
algorithm (based on Belief Propagation (BP)) for large-scale probabilistic data analysis.
BePadoop relies on smart pre-processing of the graphical model and takes advantage of
the crucial observation that, during BP over the model's junction tree, only a small
slice of the vertices is ready to send informative messages to its neighbors at any given
time; furthermore, these computations are independent of each other and can be
effectively parallelized. This also allows us to reduce the communication cost between
the Hadoop map and reduce phases. To further improve efficiency, we provide an alternate
representation for model cliques that has linear space requirements, thus drastically
reducing the size of each junction-tree vertex. Extensive experiments with BePadoop
over large probabilistic datasets have verified the effectiveness of our approach.
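
The observation about "ready" vertices can be illustrated with a small sketch. This is not BePadoop's implementation and involves no Hadoop code; it only applies the standard message-passing rule on a tree, under which a vertex may send to a neighbor once it has heard from all of its other neighbors. The function names (`ready_messages`, `schedule`) and the toy tree are assumptions made for illustration. Each round collects the messages that are currently ready; messages within a round do not depend on one another, so they could in principle be computed in parallel.

```python
from collections import defaultdict


def ready_messages(adj, sent):
    """Return the directed messages (u, v) that may be sent right now.

    Vertex u may send to neighbor v once it has received a message from
    every other neighbor w != v and has not already sent to v.
    """
    ready = []
    for u, nbrs in adj.items():
        for v in nbrs:
            if (u, v) in sent:
                continue
            if all((w, u) in sent for w in nbrs if w != v):
                ready.append((u, v))
    return sorted(ready)  # sorted only to make the output deterministic


def schedule(edges):
    """Group all messages on an undirected tree into rounds of ready messages."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    sent = set()
    rounds = []
    while True:
        batch = ready_messages(adj, sent)
        if not batch:
            break
        rounds.append(batch)   # messages in one batch are mutually independent
        sent.update(batch)
    return rounds


if __name__ == "__main__":
    # Hypothetical junction tree shaped as a path: A - B - C - D
    for i, batch in enumerate(schedule([("A", "B"), ("B", "C"), ("C", "D")]), 1):
        print(f"round {i}: {batch}")
```

On the four-vertex path used above, only two messages are ready in each round, which mirrors the point that a small part of the tree is active at any moment while the remaining vertices wait for incoming messages.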