Theme 1: Audio-Visual Content Extraction & Interaction
Theme Leader: Langis Gagnon, Ph.D.
Research Contributors: CRIM, Université Laval (Prof. Denis Laurendeau), Université de Montréal (Prof. James Turner)
Theme 1 aims to develop software tools that automatically extract audio-visual content from video documents, both to make that content accessible to blind and hard-of-hearing people and to support the indexing of audio-visual archives. The theme projects are:
-
Project 1.1: Smart Captioning
This project aims to provide deaf and hearing-impaired people with adaptive rendering of captions that makes them easier to read and understand. Adaptation will be based on the visual action and the presence of characters' faces. A study will be conducted with deaf and hearing-impaired people to measure ease of reading and the level of information retention when video and text are presented together. Viewer testing will use eye-movement tracking, followed by an evaluation of information retention. Software tools will be developed to automatically adapt the display of captions within a video without confusing viewers or causing them to lose important chunks of information. RECENT PRESENTATION
-
Project 1.2: Audio-visual content encoding and computer-assisted video description
This project aims to develop feature extraction and semantic interpretation tools for computer-assisted generation of video description for blind people. Tools to be developed will include scene segmentation, human activity and gesture recognition, local motion spotting and description, camera motion characterization, automatic cast summarization, etc. Audio-visual encoders will be developed as plug-ins for open-source video editing tools such as VirtualDub and AviSynth. This project also targets the creation of guidelines for those who produce described movies and television programs. RECENT PRESENTATION 1; RECENT PRESENTATION 2
-
Project 1.3: Enhanced talking Web browser
The goal of this project is to explore the possibility of enhancing a Web site that contains a large amount of visual content (images and video) with accessibility-oriented visual descriptors automatically integrated into a Web markup language. We want to develop an adapted version of a Web browser with self-voicing support, so that blind users can select the descriptors they want and browse the visual information according to the type (following the typology we developed) and level of video description. RECENT PRESENTATION; Experimental site for adaptive/accessible video description with synthetic voice (in French)
-
Project 1.4: Reading mobile camera
The goal of Project 1.4 is to adapt the Optical Character Recognition (OCR) algorithm designed in Project 1.2 for text detection in films to live mobile use, helping blind people process and understand textual information in their surroundings while walking down the street.
-
Project 1.5: Visual captioning at play
This project aims to provide an alternative to traditional subtitling for televised hockey games. We aim to develop a computer vision tool that detects the presence of a player based solely on recognition of his jersey number and displays the player's name directly on the screen.
Software tools under development:
- Synchronization software (beta version): For acquiring data from the eye tracker, video player and game pad for cognitive eye-tracking analysis
- Smart Captioning (proof of concept): For positioning captions based on face, text and motion detection
- Video Ground Truth maker (beta version): For labeling video content to measure the performance of automatic indexing tools
- Video Description manager (prototype; beta version underway): For coordinating all audio-visual content extraction modules (shots, faces, text, motion, places, etc.) and generating video description summaries
- Adaptive Video Description player (prototype; beta version underway): For selecting among different Video Description levels according to user preference
| Attachment | Size |
|---|---|
| Project map of Theme 1 | 51.06 KB |
| Screen captures of software tools | 191.43 KB |
