Tutorials

Data Analytics in Multi-Engine Environments

Dr. Verena Kantere, University of Geneva
PhD student Maxim Filatov, University of Geneva

Abstract

The performance of analytics on Big Data collections is the focus of much research and is becoming a leading requirement in many business domains and scientific disciplines. Data analytics includes techniques, algorithms and tools for the inspection of data collections in order to extract patterns, generalizations and other useful information. The success and effectiveness of such analysis depend on addressing numerous challenges related to the data itself, the nature of the analytics tasks, and the computing environment in which the analysis is performed.

Data analytics may vary significantly in terms of the properties of the data, the type of analysis and the processing system. For example, data may be structured, semi-structured, or unstructured and residing in files, and may have different inter-dependencies, such as relational constraints or tree-like or graph dependencies. The type of analysis may include stream processing, information retrieval, query processing, mining, clustering, integration, and others. The underlying processing systems may be traditional relational DBMSs, but also RDF stores, NoSQL databases, graph databases, etc.

This diversity across data, processing and systems has recently spawned interest in the research community in creating transparent, all-inclusive solutions for data analytics in multi-engine environments. The main challenge in creating such solutions lies in consolidating the data processing capabilities of different engines, so that the integrated system adapts its operation to the input workload, the location and type of the data, and the processing load of the engines. Research to tackle this challenge includes the definition of versatile programming models, engine performance modeling and monitoring, planning and optimization techniques, parallel deployment and execution on multiple engines, and workflow management and visualization techniques.

The proposed tutorial will present the current state of research on the realization of multi-engine data analytics. To this end, the tutorial will start with a discussion of the modeling of analytics, including new models and languages to program, represent and execute complex tasks. It will then focus on the execution of analytics: planning, optimizing and executing complex or multiple workflows, especially in dynamic, elastic multi-engine environments. Finally, the tutorial will open a discussion on the development of tools for advanced analytics tasks, such as operators for regular and irregular computations.

Dr. Verena Kantere Bio

Verena Kantere is a Maître d’Enseignement et de Recherche (equivalent to Associate Professor) at the Centre Universitaire d’Informatique (CUI) of the University of Geneva (UniGe). She works towards the provision and exchange of data services in cloud environments, focusing on the management of Big Data and the performance of Big Data analytics, by developing methods, algorithms and fully fledged systems. Before coming to UniGe she was a tenure-track junior assistant professor at the Department of Electrical Engineering and Information Technology at the Cyprus University of Technology (CUT). She received a Diploma and a Ph.D. from the National Technical University of Athens (NTUA) and an M.Sc. from the Department of Computer Science at the University of Toronto (UofT), where she also started her PhD studies. After completing her PhD studies she worked as a postdoctoral researcher at the Ecole Polytechnique Fédérale de Lausanne (EPFL). During her graduate studies she developed methods, algorithms and fully fledged systems for data exchange and coordination in Peer-to-Peer (P2P) overlays with structured and unstructured data, focusing on problems of data heterogeneity, query processing and rewriting, multi-dimensionality and the management of continuous queries. She has also worked in the field of the Semantic Web, on the problems of semantic similarity, annotation, clustering and integration. She is currently leading the research on Adaptive Analytics in the ASAP (http://www.asap-fp7.eu/) EU project. She has created and co-chaired several workshops, the most recent being the Workshop on Multi-Engine Data Analytics (collocated with EDBT 2016). She has given 30 invited talks at international conferences, universities and research institutes and has served on the program committees of 40 international conferences and workshops.

Maxim Filatov Bio

Maxim Filatov is a PhD student at the Centre Universitaire d'Informatique (CUI) of the University of Geneva (UniGe), under the supervision of Prof. Verena Kantere. Prior to Geneva, Maxim obtained his Master's degree in Mechanics at the Computational Mechanics department of Lomonosov Moscow State University, where he also started his PhD studies. During this period he also worked as a researcher in R&D at Roxar and TimeZYX on the development of reservoir simulation software. He developed methods, algorithms and fully fledged systems for modeling multiphase flow, focusing on performance. His PhD research focuses on the management of Big Data and the performance of Big Data analytics. He is currently working on Adaptive Analytics in the ASAP (http://www.asap-fp7.eu/) FP7 EU project.



Important dates

Conference
Paper submission 05.06.2016
Tutorial application submission 31.05.2016
Notification of acceptance 11.07.2016
Camera-ready papers 05.08.2016
PhD Workshop
Paper submission 10.06.2016
Notification of acceptance 30.06.2016
Camera-ready papers 05.08.2016
Satellite Events
Satellite Event submission 01.04.2016
Notification of acceptance 15.04.2016