EBMIP 2013
Workshop on Event-based Media Integration and Processing
co-located with ACM Multimedia 2013 – October 21-22, Barcelona, Spain

Keynote speakers:

Ansgar Scherp, University of Mannheim, Germany

Alejandro Jaimes, Yahoo! Research-Barcelona, Spain

Benoit Huet, EURECOM, France

Cees G.M. Snoek, University of Amsterdam, Netherlands

Francesco De Natale, University of Trento, Italy

Ivan Tankoyeu, University of Trento, Italy

Jiebo Luo, University of Rochester, US

Lexing Xie, Australian National University, Australia

Opher Etzion, IBM Research Lab Haifa, Israel

Ramesh Jain, University of California, Irvine, US

Symeon Papadopoulos, Information Technologies Institute, Centre for Research and Technology Hellas, Greece

Events in Multimedia: Theory, Model, and Application

Ansgar Scherp (University of Mannheim, Germany).

Abstract. We introduce the notion of events and objects. While events are said to occur or happen (i.e., they extend over time), objects are said to exist (and unfold in space). Events and their objects allow for representing human experiences and can be related in manifold ways. Implementing this theory of events in a formal model for representing events and event relations enables better interoperability of distributed multimedia event-based systems. Our formal model provides comprehensive support for representing time and space, objects and persons, as well as mereological, causal, and correlative relationships between events. In addition, the model provides extensible means for event composition, modeling event causality and event correlation, and representing different interpretations of the same event. Selected features of the model will be presented and discussed. Finally, a mobile event-based application will be presented that implements an instance of an event model for social media data. The application allows for exploring events such as concerts, weekly markets, opening hours, etc., and at the same time exploring places such as sights, restaurants, organizations, and persons extracted from different sources. The mobile client retrieves the data through a proxy server, which applies an incremental matching algorithm that integrates complementary information and eliminates duplicates from the social media sources.

Ansgar Scherp has been Junior Professor for Media Informatics and New Media in Business Informatics at the Research Group on Data and Web Science of the University of Mannheim since August 2012. Since April 2013, he has also been an associated professor with the Institute for Enterprise Systems (InES) in Mannheim. Prior to that, he worked as Junior Professor for Semantic Web at the Institute for Information Systems Research of the University of Koblenz-Landau from April 2011, and led the focus group on Interactive and Multimedia Web at the Institute for Web Science and Technologies (WeST) at the same university from May 2008. He studied computer science at the University of Oldenburg, Germany, and received the Advancement Award for Outstanding Results in Studies from the Association for Electrical, Electronic & Information Technologies (VDE), Germany, in 1998. He finished his PhD, with the thesis title "A Component Framework for Personalized Multimedia Applications", at the University of Oldenburg, Germany, with distinction in 2006. Afterwards, Mr. Scherp was an EU Marie Curie Fellow with Prof. Ramesh Jain at the Donald Bren School of Information and Computer Sciences, University of California, Irvine, USA in Los Angeles from November 2006 to October 2007. He led the University of Koblenz-Landau's activities in the EU Integrated Project WeKnowIt from 2008 to 2011. Here, he led the work packages on knowledge management and mass intelligence and was a member of the project management board and steering board committee. Mr. Scherp is scientific leader of the EU project SocialSensor, where the University of Koblenz-Landau is leading the work package on user modeling and presentation. In December 2011, he received his Venia Legendi (Habilitation) with the thesis title "Semantic Media Management: Process Innovation along the Value Chain of Media Companies" (in German) from the University of Koblenz-Landau, Germany.
He has published over 60 peer-reviewed scientific publications including 12 journal articles, 21 conference papers, and 10 book chapters.

Insights from Big Data: Interaction, Design, and Innovation

Alejandro Jaimes (Yahoo! Research-Barcelona, Spain).

Abstract. In recent years, our ability to process large amounts of data has increased significantly, creating many opportunities for innovation. Having large quantities of data, however, does not necessarily turn into actionable insights that make a difference for users in consumer applications. In this talk I will give a quick overview of some ways in which “big data” can be used in industry, with a particular focus on Human-Centered approaches to innovation. In particular, I will discuss how the combination of qualitative and quantitative methods can be of benefit, giving examples around social media and giving an overview of some of the areas of research I am currently focusing on at Yahoo!. Within this context, I will outline a blueprint for a research framework as it applies to innovation, and discuss specific technical approaches within that framework. I will argue for the importance of taking a human-centered view and highlight what I consider the most fundamental problems in computer science today from that perspective.

Dr. Alejandro (Alex) Jaimes is Director of Research at Yahoo!, where he is in charge of the Social Media Engagement (SOMER) and Learning for Multimedia and Vision (LMV) groups in Barcelona and Bangalore. In the Spring of 2013 he was a visiting Professor at KAIST’s (South Korea) Web Science Department under the WCU program. His research focuses on Human-Centered Computing, particularly in the areas of social media and multimedia. The output of his teams’ research has been included in several products at Yahoo!, and he led the launch of Yahoo! Clues, a product created in 2010. Dr. Jaimes is general chair of ACM Multimedia 2013, Developers Track Chair for WWW 2014, Practice and Experience track chair for WWW 2013, the founder of the ACM Multimedia Interactive Art program, and Industry Track chair for ACM RecSys 2010 and UMAP 2013, among others. His work has led to over 80 technical publications in international conferences and journals. He has been an invited speaker at the Big Data & Analytics Innovation Summit (2013), Practitioner Web Analytics (2010), CIVR 2010, ECML-PKDD 2010 and KDD 2009 (Industry tracks), ACM Recommender Systems 2008 (panel), DAGM 2008 (keynote), and several others. Before joining Yahoo!, Dr. Jaimes was a visiting professor at U. Carlos III in Madrid and founded and managed the User Modeling and Data Mining group at Telefónica Research. Prior to that, Dr. Jaimes was Scientific Manager at IDIAP-EPFL (Switzerland), and was previously at Fuji Xerox (Japan), IBM TJ Watson (USA), IBM Tokyo Research Laboratory (Japan), Siemens Corporate Research (USA), and AT&T Bell Laboratories (USA). Dr. Jaimes received a Ph.D. in Electrical Engineering (2003) and an M.S. in Computer Science (1997) from Columbia U. in NYC.

Event-based Summarization for Media Hyperlinking

Benoit Huet (EURECOM, France).

Abstract. The exponential growth of media sharing and demand on the Web comes with a need for effective methods to explore this media. Hence, media hyperlinking, which consists in linking together videos based on their content and uncovering the relations between them, is becoming an important functionality for providing users with a way to navigate between video entities and satisfy their information needs. Thanks to such technology, multimedia search can often be replaced by recommendation. A particular usage of hyperlinking is to provide, through a second screen application, extra information or content about the video watched on a main screen (TV). In this talk, we will focus on media hyperlinking from the news: the task at hand consists in locating and identifying relevant media items and displaying them on the second screen. The related material is selected based on underlying events that will be detected in the news: events are seen as structuring elements, defined in terms of date, location, intent and attendance. Two approaches for event-based mining of such additional and related information will be presented, each of them satisfying a different user information need.

Dr. Benoit Huet is Assistant Professor in the multimedia information processing group of Eurecom (France). In 1993, he was awarded the MSc degree in Artificial Intelligence from the University of Westminster (UK) with distinction, where he then spent two years working as a research and teaching assistant. He received his DPhil degree in Computer Science from the University of York (UK) for his research on the topic of object recognition from large databases. He was awarded the HDR (Habilitation to Direct Research) from the University of Nice Sophia Antipolis, France, in October 2012 on the topic of Multimedia Content Understanding: Bringing Context to Content. He is associate editor for Multimedia Tools and Applications (Springer) and Multimedia Systems (Springer), and has been guest editor for a number of special issues (EURASIP Journal on Image and Video Processing, IEEE Multimedia). He regularly serves on the technical program committees of the top conferences in the field (ACM MM/ICMR, IEEE ICME). He is chairing the IEEE MMTC Interest Group on Visual Analysis, Interaction and Content Management (VAIG). He is vice-chair of the IAPR Technical Committee 14 Signal Analysis for Machine Intelligence.

Five Recommendations for Recognizing Video Events by Concept Vocabularies

Cees G.M. Snoek (University of Amsterdam, Netherlands).

Abstract. Representing videos using vocabularies composed of concept detectors appears promising for generic event recognition. While many have recently shown the benefits of concept vocabularies for recognition, the characteristics of a universal concept vocabulary suited for representing events have not been studied. In this talk, we present the findings of a study on how to create an effective vocabulary for arbitrary-event recognition in web video. We consider five research questions related to the number, the type, the specificity, the quality and the normalization of the detectors in concept vocabularies. From the analysis we derive a set of five recommendations for recognizing video events by concept vocabularies, which provide guidelines for future work.

Cees G. M. Snoek is currently an associate professor at the University of Amsterdam. He was a visiting scientist at Carnegie Mellon University, Pittsburgh, PA (2003) and at the University of California, Berkeley, CA (2010–2011). His research interest is video and image search. Dr. Snoek is the principal investigator of the MediaMill Semantic Video Search Engine, which is a consistent top performer in the yearly NIST TRECVID evaluations. He is a member of the editorial boards for IEEE Multimedia and IEEE Transactions on Multimedia. Cees is a recipient of an NWO Veni award (2008), an NWO Vidi award (2012) and the Netherlands Prize for ICT Research (2012). Several of his Ph.D. students have won best paper awards, including the IEEE Transactions on Multimedia Prize Paper Award.

Discovering Event Media Semantics using Games with a Hidden Purpose

Francesco De Natale (University of Trento, Italy).

Abstract. Automatic tools that allow discovering the semantics of a media object from its content show intrinsic limitations, due to the fact that current image content description and recognition approaches still suffer from rather limited accuracy. The possibility of outsourcing part of these tasks to user crowds, exploiting the power of human computation, has been explored by various researchers, either for directly handling the problem or to produce a ground truth for further elaboration by means of machine learning approaches. Although a well-designed crowdsourcing mechanism can provide good results, it is not time-effective and requires investments for every job launched. In this talk we will introduce a different approach to achieve human cooperation in complex media analysis tasks, with the introduction of specifically designed games with a hidden purpose. In detail, we will show how one can produce a competitive game in which the evident goal and reward is simply entertainment (playing and possibly winning matches against other players and gaining reputation), while the hidden purpose is to produce new knowledge on the media objects handled within these contests. A couple of examples will be presented, one conceived to detect event-related salient areas in event media, and the other designed to propagate the annotation across images with related contents. Tests will be presented for both games to demonstrate the viability of these approaches in solving complex tasks.

Prof. Francesco De Natale graduated in Electronic Engineering (M.Sc. level) in 1990 at the University of Genova (Italy) and received a Ph.D. in Telecommunications in 1994 at the same University. In 1996 he took up a position as Assistant Professor at the University of Cagliari and subsequently moved to the University of Trento, Italy, where he has been Full Professor of Telecommunications Engineering since 2003. He was the Head of the Department of Information Engineering and Computer Science (DISI) from 2006 to 2009, and is currently leading the Research Lab on Multimedia Communications at the same Department (mmlab.disi.unitn.it) as well as the MMSPI (Multidimensional Multimodal Signal Processing and Interpretation Lab) of the Italian branch of the European Institute of Technology (EIT-ICTLabs@Italy). His research interests are focused on multimedia communications, with particular attention to image and video processing, analysis, and retrieval. He has a publication record of more than 200 works published in major international peer-reviewed scientific journals and conferences. He was General Co-Chair of the Packet Video Workshop (PV-2000), Program Co-Chair of the IEEE Intl. Conf. on Image Processing (ICIP-2005), and General Chair of the ACM Intl. Conf. on Multimedia Retrieval (ICMR-2011). He has been Associate Editor of the IEEE Trans. on Multimedia (2010-2013) and of the IEEE Trans. on Circuits and Systems for Video Technology (2011-2013), and a member of the IEEE Signal Proc. Society Technical Committee on Multimedia Signal Processing (MMSP), chairing the Technical Directions Subcommittee. Prof. De Natale was appointed evaluator for several international bodies, including the European Commission and the NSFs of the US, Ireland and Qatar. Prof. De Natale is a Senior Member of IEEE and a member of ACM.

Event Duality: Exploitation of Personal and Social Dimensions for Photo Indexing

Ivan Tankoyeu (University of Trento, Italy).

Abstract. Recent approaches to media indexing use events as media aggregators, but do not fully consider the context in which the media asset has been produced and do not take the personal perspective of the user into account. To this end, we propose a new paradigm for the automated indexing of social media based on the notion of personal and social events. Within the scope of the talk I will introduce the distinction between social and personal events. Following this strategy I will describe a technique for mining personal events from photo collections. Further analysis of events allows us to compose social events out of personal events and then automatically reveal interpersonal ties. To tame the stream of big data in social networks, we rely solely on image metadata of time and space. The talk will consist of the following three parts: (i) personal event detection using individual, unsorted photo collections, in which we will describe the use of the spatio-temporal context embedded in digital photos to detect event boundaries within the collection; (ii) social event detection, where we will give insights on the use of a tailored similarity measurement between personal events of different users; and (iii) the description of an analysis of event co-participation to propagate social connections.
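As a rough illustration of part (i), the following Python sketch splits a photo collection into candidate personal events using only capture time and GPS position. It is not the speaker's method: the `Photo` type, the `split_into_events` function, and the 3-hour and 5 km thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Photo:
    """Hypothetical photo record holding only time and place metadata."""
    taken_at: datetime
    lat: float
    lon: float


def haversine_km(a: Photo, b: Photo) -> float:
    """Great-circle distance between two photos in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def split_into_events(photos, max_gap=timedelta(hours=3), max_km=5.0):
    """Start a new event whenever consecutive photos are far apart in time or space.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    events = []
    for photo in sorted(photos, key=lambda p: p.taken_at):
        if events:
            prev = events[-1][-1]
            close_in_time = photo.taken_at - prev.taken_at <= max_gap
            close_in_space = haversine_km(prev, photo) <= max_km
            if close_in_time and close_in_space:
                events[-1].append(photo)
                continue
        # Either the first photo or an event boundary was detected.
        events.append([photo])
    return events
```

Calling `split_into_events` on an unsorted list of `Photo` objects returns a list of photo lists, one per candidate event, ordered by capture time.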

Ivan Tankoyeu is a PostDoc at the University of Trento. He received his Ph.D. from the University of Trento, Italy, in 2013, having been awarded an M.Sc. in Computer Science from the Belorussian State University in 2007. His main research interests include data and knowledge management, event-based media indexing, and event mining and exploitation from spatio-temporal data.

Classifying Images and Videos by Learning from Web Data

Jiebo Luo (University of Rochester, US).

Abstract. Every day, increasingly rich and massive social multimedia data are being posted to the web. Such image and video data are generally accompanied by rich and valuable contextual information (e.g., tags, categories, and captions). Given any textual query (e.g., picnic), keyword-based (also called tag-based) search can be readily used to collect a large number of relevant Flickr images or YouTube videos for classifying new images and videos. In the first part of our talk, we will introduce a visual event recognition framework for consumer videos that leverages a large amount of loosely labeled web videos (e.g., from YouTube). At its core, we develop a new domain adaptation method, referred to as Adaptive Multiple Kernel Learning (A-MKL), in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web video domain and consumer video domain). Extensive experiments demonstrate the effectiveness of our proposed framework, which requires only a small number of labeled consumer videos by leveraging web data. In the second part of our talk, we will describe a new approach to learn a robust classifier for text-based image retrieval (TBIR) using relevant training web images (e.g., from Flickr), in which we explicitly handle noise in the loose labels of training images. Specifically, we first partition the relevant training web images and the randomly selected irrelevant training web images into clusters. By treating each cluster as a “bag” and the images in each bag as “instances”, we formulate this task as a multi-instance learning problem with constrained positive bags, where each positive bag contains at least a portion of positive instances. We present a new algorithm called MIL-CPB to effectively exploit such constraints on positive bags and predict the labels of test instances (images). Comprehensive experiments on two challenging real-world web image data sets demonstrate the effectiveness of our approach. Finally, we will discuss several future directions on how to effectively and efficiently exploit the freely available web data for visual recognition with minimal human supervision.

Jiebo Luo joined the University of Rochester in Fall 2011 after over fifteen years at Kodak Research Laboratories, where he was a Senior Principal Scientist leading research and advanced development. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010 and IEEE CVPR 2012. He is the Editor-in-Chief of the Journal of Multimedia, and has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, Pattern Recognition, Machine Vision and Applications, and Journal of Electronic Imaging. He has authored over 200 technical papers and 70 US patents. Dr. Luo is a Fellow of the SPIE, IEEE, and IAPR. His research spans image processing, computer vision, machine learning, data mining, medical imaging, and ubiquitous computing. He has been an advocate for contextual inference in semantic understanding of visual data, and continues to push the frontiers in this area by incorporating geo-location context and social context. A recent research thrust focuses on exploiting social media for machine learning, data mining, and human-computer interaction, for example, mining the wisdom of crowds for social, political, and economic prediction and forecasting.

Understanding Events and Message Popularity in Media-rich Social Networks

Lexing Xie (Australian National University, Australia).

Abstract. Multimedia is growing to take up more than 50% of internet traffic. Understanding this content and its social traces presents new research challenges and opportunities at the intersection of rich-media content understanding and mining the social web. Several recent works in my group focus on analyzing real-world event traces in social media, including: using hyperlink patterns to trace the diffusion flow of news events, tracking large-scale video remix on YouTube, analyzing rich-media microblogs with cross-media topic models, and predicting user preference with fine-grained social interactions. I will share current results in mapping the macro-structure of the event web, and predicting message popularity from social and content features.

Lexing Xie is Senior Lecturer of Computer Science at the Australian National University. She was a research staff member at IBM T.J. Watson Research Center in New York from 2005 to 2010. She received her B.S. from Tsinghua University, Beijing, China, and her M.S. and Ph.D. degrees from Columbia University. Her research interests are in multimedia, social media, and applied machine learning. Lexing's research has received five best student paper and best paper awards between 2002 and 2011, and a Grand Challenge Multimodal Prize at ACM Multimedia 2012. She is an associate editor of ACM Transactions on Multimedia Computing, Communications and Applications, and regularly serves on the program and organizing committees of major multimedia, machine learning, and web conferences.

Semantics and Modeling of Events and Contexts

Opher Etzion (IBM Research Lab Haifa, Israel).

Abstract. People are event-driven creatures: a lot of our daily behavior is a reaction to events we observe or infer. In contrast, computerized applications mainly follow the request-response paradigm (the computer responds to an explicit request by a human). The availability of real-time data based on the Internet of Things and mobile devices, and the pressure to increase business velocity and provide real-time analytics, decisions and actions, are the roots of a paradigm shift towards event-driven computing. In this talk we will concentrate on two main concepts: situation and context. A situation is a (possibly derived) event that requires a reaction, while a context is a (possibly multi-dimensional) condition that provides semantic partitions over the flowing events. The first part of the talk drills down to the modeling aspects of deriving situations from events using event patterns, and discusses the evolution of modeling schemes; the second part of the talk discusses the different dimensions of context (temporal, spatial, segmentation and states) and shows examples of hybrid event-state oriented contexts.
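The two concepts can be illustrated with a small Python sketch, which is not taken from the talk. Events are partitioned into contexts by a segmentation key and a fixed temporal window, and a situation is derived whenever a simple count pattern matches inside one context. The function name `detect_situations`, the tuple layout of an event, and the default window and threshold are hypothetical.

```python
from collections import defaultdict


def detect_situations(events, window_seconds=60, threshold=3, event_type="alarm"):
    """Derive situations from raw events using a count pattern per context.

    events: iterable of (timestamp_seconds, segment, kind) tuples (assumed layout).
    A context is the pair (segment, window index): a segmentation dimension
    combined with a temporal dimension.
    Returns the sorted list of contexts in which at least `threshold`
    events of `event_type` occurred.
    """
    counts = defaultdict(int)
    for timestamp, segment, kind in events:
        if kind != event_type:
            continue  # the pattern only looks at one event type
        context = (segment, int(timestamp // window_seconds))
        counts[context] += 1
    return sorted(ctx for ctx, n in counts.items() if n >= threshold)
```

Each returned pair names the segment and the window in which the pattern matched; a real event processing engine would typically evaluate such patterns incrementally over a stream rather than over a finished list.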

Opher Etzion is the chief scientist of event processing in IBM Haifa Research Lab. Previously he was lead architect of event processing technology in IBM WebSphere, and a Senior Manager in the IBM Research division, where he managed a department that performed one of the pioneering projects that shaped the area of "event processing". He is also the chair of EPTS (Event Processing Technical Society). In parallel he serves as a professor and academic advisor to the MIS department in the Yezreel Valley College, and adjunct professor at the Technion - Israel Institute of Technology and academic advisor to; over the years he supervised 6 PhD and 22 MSc theses. He has authored or co-authored more than 90 papers in refereed journals and conferences, on topics related to active databases, temporal databases, rule-based systems, event processing and autonomic computing, and has given several keynote addresses and tutorials. He is the co-author of Event Processing in Action (with Peter Niblett), a comprehensive technical book about event processing, and co-edited the book "Temporal Databases - Research and Practice", Springer-Verlag, 1998. Prior to joining IBM in 1997, he was a faculty member and Founding Head of the Information Systems Engineering department at the Technion, and held professional and managerial positions in industry and in the Israel Air-Force. He is a senior member of ACM, and has been general chair and program chair of various conferences such as COOPIS 2000 and ACM DEBS 2011. He has won several prestigious awards over the years, such as the Israel Air-Force highest award for the introduction of new technologies towards wide usage, the IBM Outstanding Innovation Award and the IBM Corporate Award (the highest IBM award) for the pioneering work on event processing. He was recognized as Distinguished Speaker by ACM.

Towards Smart Social Systems

Ramesh Jain (University of California, Irvine, US).

Abstract. The availability of enormous volumes of heterogeneous Cyber-Physical-Social (CPS) data streams may allow the design and implementation of networks that connect various data sources to detect situations with little latency. In fact, in many cases it may even be possible to predict situations well in advance. This opens up new opportunities in designing smart social systems for specific tasks. Such systems may be very useful for many important problems at local as well as regional and even global levels. We believe that such systems offer many novel challenges to researchers in multimedia, particularly in social and cross-modal media systems. We will present our ideas and early approach towards building smart social systems.

Ramesh Jain is an entrepreneur, researcher, and educator. Ramesh co-founded several companies, managed them in their initial stages, and then turned them over to professional management. These companies include PRAJA, Virage, and ImageWare. Currently he is involved in Stikco Studio. He has also been an advisor to several other companies, including some of the largest companies in the media and search space. He is a Donald Bren Professor in Information & Computer Sciences at the University of California, Irvine, where he is doing research in Event Web and experiential computing. Earlier he served on the faculty of Georgia Tech, University of California at San Diego, the University of Michigan, Ann Arbor, Wayne State University, and Indian Institute of Technology, Kharagpur. He is a Fellow of ACM, IEEE, AAAI, IAPR, and SPIE. His current research interests are in processing massive numbers of geo-spatial heterogeneous data streams for building Smart Social Systems. He is the recipient of several awards, including the ACM SIGMM Technical Achievement Award 2010.

Event Mining in Social Multimedia

Symeon Papadopoulos (Information Technologies Institute, Centre for Research and Technology Hellas, Greece).

Abstract. The presentation will discuss different approaches for social event detection on large collections of user-contributed multimedia content. Social events are defined as real-world events that are planned and attended by people and that are represented by media content captured by people attending them. Two main event detection settings will be presented: (a) a discovery scenario, where events of all types are of interest, and (b) an event detection scenario, where specific types (or classes) of events are sought. Approaches and insights will be presented for both settings, e.g. for the discovery of events in large media collections, as well as for the detection of events of given types. Supervised learning and clustering constitute the main components of these approaches. Several case studies and evaluation results will be presented using Flickr as the source of social media content. Notably, insights will be presented from the participation of the presenter in the two Social Event Detection contests (in the context of MediaEval ’11 and ’12), and an outline will be provided of pertinent research challenges and future work in this area.

Dr. Symeon Papadopoulos received the Diploma degree in Electrical and Computer Engineering from the Aristotle University of Thessaloniki (AUTH), Greece, in 2004. In 2006, he received the Professional Doctorate in Engineering (P.D.Eng.) from the Technical University of Eindhoven, the Netherlands. Since September 2006, he has been working as a researcher in the Informatics & Telematics Institute on a wide range of research areas such as information search and retrieval. In 2009, he completed a distance-learning MBA degree at the Blekinge Institute of Technology, Sweden. In 2012, he defended his PhD dissertation in the Informatics department of AUTH on the topic of large-scale knowledge discovery in social media content. His current research interests pertain to data mining and multimedia indexing on the Social Web.