John H.L. Hansen
THE UNIVERSITY OF TEXAS AT DALLAS, U.S.A.
Speech & Speaker Variability: Assessing Who, What, Where and How from Earth to the Moon
There is significant interest in the development of effective human-machine interactive systems for monitoring and assessing human communications, social interactions, education, and team-based collaborative task management. The availability of speech data for use in assessing human context is growing with a wider range of speech, language, and personal services (e.g., smartphones, child language development, security, human monitoring, search & retrieval). While speech and speaker recognition research has advanced significantly in recent years, performance in real environments remains a major challenge. In this talk, we consider recent research efforts in the field of speech, speaker, and environment modeling for extracting knowledge from “BIG DATA”. We provide a brief overview of CRSS research activities with emphasis on two main parts: (i) modeling and characterizing speech data in naturalistic settings using the Prof-Life-Log corpus; and (ii) characterizing massive parallel audio data from the entire NASA Mission Control operations during the Apollo-11 lunar mission (over 12,000 hours of data). Differences in speech production, including variability due to vocal effort (e.g., whisper, soft, neutral, loud, shout) and the Lombard effect (speech produced in noise), will also be explored. Finally, we explore advancements in robust speech and speaker recognition for monitoring human interaction in BIG DATA.
John H.L. Hansen received the Ph.D. and M.S. degrees in Electrical Engineering from the Georgia Institute of Technology, and the B.S.E.E. degree from Rutgers University, College of Engineering, N.J. He joined the University of Texas at Dallas (UTDallas), Erik Jonsson School of Engineering & Computer Science in 2005, where he is Associate Dean for Research, Professor of Electrical Engineering, and Distinguished University Chair in Telecommunications Engineering, and holds a joint appointment in the School of Behavioral & Brain Sciences (Speech & Hearing). At UTDallas, he established the Center for Robust Speech Systems (CRSS). He is an ISCA Fellow and an IEEE Fellow, and has served as Member and TC Chair of the IEEE Signal Processing Society Speech & Language Processing Technical Committee (SLTC) and as an ISCA Distinguished Lecturer, and previously served as Technical Advisor to the U.S. Delegate for NATO (IST/TG-01), Associate Editor for IEEE Trans. Speech & Audio Processing, Associate Editor for IEEE Signal Processing Letters, and Editorial Board Member for the IEEE Signal Processing Magazine. He is currently serving as ISCA Vice-President and a Member of the ISCA Board. He has supervised 73 Ph.D./M.S. thesis candidates, was the recipient of the 2005 University of Colorado Teacher Recognition Award, and is author/co-author of 581 journal and conference papers and 11 books in the field of speech processing and language technology. He served as General Chair and Organizer for Interspeech-2002 (Denver, CO), Co-Organizer and Technical Program Chair for IEEE ICASSP-2010 (Dallas, TX), and Co-General Chair and Organizer for the IEEE Workshop on Spoken Language Technology (SLT-2014) (Lake Tahoe, NV).
Lars Kai Hansen
TECHNICAL UNIVERSITY OF DENMARK
Sensing the Deep Structure of Signals
Deep learning has emerged as a powerful paradigm inspired by natural cognitive systems. While we still have many open questions related to representation, optimality, and maintenance, the three essential ingredients of neural networks (distributed processing, specialization, and learning) are all quite well understood. In this talk I will review work on understanding the interplay between distribution and specialization in information processing. In particular, I will focus on the cognitive component hypothesis, concerned with the mechanisms that allow the human brain to solve complex perceptual and cognitive tasks. The hypothesis is an example of the so-called ‘rational’ approach to cognitive modeling, which focuses on the statistical properties of the senses and the brain and the computational challenges that they face in a given environment. The cognitive component hypothesis emphasizes the important role of conditional independence, enabling the brain to simplify computations and focus attention only on relevant dimensions of a given environment. I will describe the supervised and unsupervised machine learning methods we have used to design protocols for testing the cognitive component hypothesis. The approach is illustrated by examples ranging from modeling of low-level properties of speech and music to understanding of high-level aspects of human cognition involved in social behaviors and multi-modal information retrieval.
Professor Hansen holds M.Sc. (’83) and Ph.D. (’86) degrees in physics from the University of Copenhagen. Since 1990 he has been with the Technical University of Denmark, where he currently heads the Section for Cognitive Systems. He has made more than 300 contributions to machine learning, signal processing, and applications in biomedicine and digital media. His research has been generously funded by the Danish Research Councils and private foundations, the European Union, and the US National Institutes of Health. He has made seminal contributions to machine learning, including the introduction of ensemble methods (’90), and to functional neuroimaging, including the first brain state decoding work based on PET (’94) and fMRI (’97). Recent work concerns the relations between the brain and its natural environment: the smartphone brain scanner (’11). In 2011 he was elected “Cátedra de Excelencia” at UC3M Madrid, Spain.
Roger K. Moore
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF SHEFFIELD, U.K.
Vocal Interaction with ‘Intelligent’ Machines: Are We There Yet?
Recent years have seen steady improvements in the quality and performance of voice-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. Spoken language processing has finally emerged from the research laboratory into the real world, and members of the general public now regularly encounter speech-enabled services and devices while going about their daily lives. Does this mean that our job as speech technology researchers is finally done? Apparently not – evidence from real users suggests that the capabilities of contemporary spoken language systems continue to fall short of what users expect and what the market needs. We still seem to be some way off creating Autonomous Social Agents that are truly capable of conversing effectively with their human counterparts in real-world situations. This talk will address these issues and will argue that we need to go far beyond our current capabilities and understanding if we are to move from developing machines that simply talk and listen to evolving ‘intelligent’ communicative devices that are capable of entering into productive cooperative interactive relationships with human beings.
Prof. Moore has over 40 years’ experience in Speech Technology R&D and, although an engineer by training, much of his research has been based on insights from human speech perception and production. As Head of the UK Government’s Speech Research Unit from 1985 to 1999, he was responsible for the development of the Aurix range of speech technology products and the subsequent formation of 20/20 Speech Ltd. Since 2004 he has been Professor of Spoken Language Processing at the University of Sheffield, and also holds Visiting Chairs at Bristol Robotics Laboratory and University College London Psychology & Language Sciences. Prof. Moore has authored and co-authored over 200 scientific publications in the general area of speech technology applications, algorithms and assessment. He is Editor-in-Chief of Computer Speech and Language and a member of the Editorial board for Speech Communication. Prof. Moore served as President of the European Speech Communication Association (ESCA) and the International Speech Communication Association (ISCA) from 1997 to 2001, and as President of the Permanent Council of the International Conferences on Spoken Language Processing (PC-ICSLP) from 1996 to 2001. In 1994 Prof. Moore was awarded the prestigious UK Institute of Acoustics Tyndall medal for “distinguished work in the field of speech research and technology” and in 1999 he was presented with the NATO RTO Scientific Achievement Award for “repeated contribution in scientific and technological cooperation”. In 2008 he was elected as one of the first ISCA Fellows “in recognition of his applications of human speech perception and production models to speech technologies and his service to ISCA as President”, and in 2014-15 he was selected to be one of ISCA’s Distinguished Lecturers. Prof. Moore was General Chair for INTERSPEECH 2009.
Slobodan Ribarić
FACULTY OF ELECTRICAL ENGINEERING AND COMPUTING
UNIVERSITY OF ZAGREB, CROATIA
De-identification for Privacy Protection – COST Action IC1206 Objectives and Achievements
Privacy is one of the most important social and political issues in our information society, characterized by a growing range of enabling and supporting technologies and services. Amongst these are communications, multimedia, biometrics, big data, cloud computing, data mining, the internet, social networks, and audio-video surveillance. Each of these can potentially provide the means for privacy intrusion. De-identification is one of the main approaches to privacy protection in multimedia content. It is a process for concealing or removing personal identifiers, or replacing them by surrogate personal identifiers, in personal information in order to prevent the disclosure and use of data for purposes unrelated to the purpose for which the information was originally obtained. The main objectives of COST Action IC1206 are: sharing of knowledge and technology among experts in the fields related to automated de-identification for privacy protection in multimedia content; defining a taxonomy of personally identifiable information; providing innovative solutions for the concealment or removal of identifiers while preserving data utility and naturalness; de-identification of non-biometric identifiers (text, hairstyle, dressing style, license plates); and de-identification of physiological (face, fingerprint, iris, ear), behavioural (voice, gait, and gesture), and soft-biometric (body silhouette, gender, age, race, tattoo) identifiers in multimedia documents. In this short presentation, some achievements of the Action during its three-year period of activities will be presented.
Slobodan Ribarić, Ph.D., is a Full Professor at the Department of Electronics, Microelectronics, Computer and Intelligent Systems, Faculty of Electrical Engineering and Computing (FER), University of Zagreb, where he heads the Laboratory of Pattern Recognition and Biometric Security Systems. He received the B.S. degree in electronics, the M.S. degree in automatics, and the Ph.D. degree in electrical engineering from the Faculty of Electrical Engineering, Ljubljana, Slovenia, in 1974, 1976, and 1982, respectively. His research interests include pattern recognition, biometrics, computer architecture, and robot vision. He has published more than one hundred and fifty papers on these topics, with articles appearing in leading scientific journals such as IEEE Transactions on Industrial Electronics, IEEE Transactions on Pattern Analysis and Machine Intelligence, Microprocessing and Microprogramming, Pattern Recognition, and Applied Artificial Intelligence. Ribarić is the author of five books: Microprocessor Architecture (1982), The Fifth Computer Generation Architecture (1986), Advanced Microprocessor Architectures (1990), CISC and RISC Computer Architecture (1996), and Computer Structures, Architecture and Organization of Computer Systems (2011), and a co-author of the book An Introduction to Pattern Recognition (1988). In 2013, he received the Gold medal “Josip Lončar”, awarded by FER for his outstanding contribution to the Faculty. Prof. Ribarić is the Chair of COST Action IC1206 “De-identification for privacy protection in multimedia content”. He is a member of the Editorial Board of the CIT Journal, and a member of the IEEE and MIPRO.
Lars Dalgaard
DANISH TECHNOLOGICAL INSTITUTE, DENMARK
Deploying Emerging Mobile Service Robots – Lots of Opportunities but Also Barriers
A new breed of smaller and affordable mobile service platforms is entering the market at an increasing pace, creating new opportunities for novel types of flexible automation solutions in both existing and previously non-automated domains. The talk will center on these opportunities from a systems perspective and exemplify how the combination of modular concepts and advanced HRI can be a critical enabler. The talk will also focus on several of the barriers that exist when designing and implementing real-world solutions using these technologies, and offer some insights into how to overcome them.
Lars Dalgaard has worked with robot technology for more than 19 years as an entrepreneur, teacher, lecturer, researcher, and consultant, and since 2014 as Head of Service Robotics at the Danish Technological Institute, Robot Technology. Technology-wise, his focus is on flexible mobile robots for logistics in industry and healthcare, and on civil deployment of autonomous drones for inspection and manipulation. Solution-wise, his focus is on system-level design, where the value proposition is created at the intersection of technical, organisational, human, and economic needs and demands – a domain in which Lars has strong research and commercial-level expertise. Through the years, Lars has worked with robot technology in all shapes and sizes, spanning from field robots and mobile robots for nursery gardens, through self-reconfigurable modular robots and modular robot technology for dynamic layout changes in pig stables, to butler robots and interactive room barriers.