Version 6.04 (002)

Experimental system for visual search on broadcast archives

Active project

In today’s digital age, the ability to access, analyse and (re)use large amounts of data is a strategic asset for the broadcast and media industry. The challenge lies in searching, organising and accessing multimedia assets in a fast and semantically relevant way. Visual search technology is a promising way to achieve these objectives: it allows users to search for and match image and video content depicting the same objects, such as buildings, paintings and logos, based on visual similarity and without having to query manually generated metadata.

Visual analysis relies on indexing and matching image and video content based on visual characteristics, in addition to manually generated metadata. Many methods have been developed to achieve this goal, such as key-point feature detectors and descriptors. In 2010, the Moving Picture Experts Group (MPEG) started a standardisation initiative called Compact Descriptors for Visual Search (CDVS), which provides a robust and interoperable technology for building efficient visual search applications over image databases. More recently, interest has extended to the video domain with a new activity called Compact Descriptors for Video Analysis (CDVA). Video analysis is more challenging than still-image analysis because of the temporal and spatial redundancy in video, which increases the amount of data that needs to be processed.

Several use cases can be defined on top of visual search technology, including the following:

  1. Raw material identification. Given a query video scene (e.g. a piece of broadcast content), we are interested in identifying the raw master video clips from which the scene was edited;
  2. Rigid object retrieval. Given a query video scene or image depicting a rigid object (e.g. monuments, sculptures, paintings, buildings, logos) and a target video database, we are interested in detecting as many occurrences of the query object as possible across all videos in the target database;
  3. Deformable object retrieval. Given a query image depicting a deformable object (e.g. a face, stage clothes) and a target video database, we are interested in matching as many occurrences of the query object as possible across all videos in the target database.

RAI CRIT is developing a framework for visual content analysis and matching. The system is designed to strongly reduce video redundancy, thus significantly decreasing the computational effort. As a proof of concept, the current implementation integrates CDVS technology (ISO/IEC 15938-13 and ISO/IEC 15938-14) for visual search. The retrieval architecture consists of the following three main components:

  • The summariser splits the analysed video into shots and extracts the key-frames;
  • The selector extracts CDVS descriptors from key-frames, gathers similar shots in clusters and performs key-frame ranking and cluster ranking by relevance;
  • The database stores information and metadata about video structure, ranking lists and visual descriptors.
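
The interplay of the three components can be illustrated with a minimal Python sketch. It is not the RAI CRIT implementation: frames are reduced to small histograms that stand in for real CDVS descriptors, shot cuts are detected with a simple histogram-difference threshold, and "relevance" is assumed to mean the number of frames a cluster covers. All function names, thresholds and the dictionary layout are hypothetical.

```python
def distance(a, b):
    """L1 distance between two equally sized histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def summarise(frames, cut_threshold=0.5):
    """Summariser: split frames into shots and pick one key-frame per shot.

    Returns a list of (start, end, keyframe_index) tuples.
    A cut is assumed wherever consecutive frames differ by more than
    cut_threshold; the key-frame is simply the middle frame of the shot.
    """
    if not frames:
        return []
    bounds = []
    start = 0
    for i in range(1, len(frames)):
        if distance(frames[i - 1], frames[i]) > cut_threshold:
            bounds.append((start, i - 1))
            start = i
    bounds.append((start, len(frames) - 1))
    return [(s, e, (s + e) // 2) for s, e in bounds]


def select(frames, shots, cluster_threshold=0.3):
    """Selector: group similar shots and rank the resulting clusters.

    The key-frame histogram stands in for a CDVS descriptor. Clustering is
    greedy: a shot joins the first cluster whose descriptor is close enough.
    Clusters are ranked by the number of frames they cover (an assumption).
    """
    clusters = []
    for s, e, k in shots:
        descriptor = frames[k]
        for cluster in clusters:
            if distance(cluster["descriptor"], descriptor) <= cluster_threshold:
                cluster["shots"].append((s, e, k))
                break
        else:
            clusters.append({"descriptor": descriptor, "shots": [(s, e, k)]})
    clusters.sort(
        key=lambda c: sum(e - s + 1 for s, e, _ in c["shots"]),
        reverse=True,
    )
    return clusters


def index_video(video_id, frames, database):
    """Database: store video structure, ranking and descriptors under video_id."""
    shots = summarise(frames)
    clusters = select(frames, shots)
    database[video_id] = {"shots": shots, "clusters": clusters}
    return database[video_id]


# Toy video: three shots, the first and third showing the same scene.
database = {}
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2 + [[1.0, 0.0]] * 4
entry = index_video("demo", frames, database)
print(entry["shots"])          # shot boundaries with key-frame indices
print(len(entry["clusters"]))  # similar shots collapse into one cluster
```

In this toy example, nine frames are reduced to three key-frames and then to two clusters, which is the kind of redundancy reduction the architecture aims for before any descriptor matching takes place.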

The architecture works in two modes. In extraction mode, input videos are processed to extract CDVS descriptors, which are stored and used as the reference database for search and retrieval operations. In retrieval mode, CDVS descriptors are extracted from a query video and used to search for similar content in the reference database.
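
The two modes can be sketched as two methods on a single index. This is an illustrative sketch only: `describe` is a hypothetical stand-in that L2-normalises a plain feature vector instead of computing a CDVS descriptor, matching uses cosine similarity rather than the CDVS matching procedure, and the class name, video identifiers and `min_score` threshold are assumptions.

```python
import math


def describe(keyframe):
    """Hypothetical stand-in for descriptor extraction: L2-normalise a vector."""
    norm = math.sqrt(sum(x * x for x in keyframe))
    return [x / norm for x in keyframe] if norm else list(keyframe)


def similarity(a, b):
    """Cosine similarity of two already normalised descriptors."""
    return sum(x * y for x, y in zip(a, b))


class ReferenceIndex:
    def __init__(self):
        # Each entry is (video_id, keyframe_index, descriptor).
        self.entries = []

    def extract(self, video_id, keyframes):
        """Extraction mode: describe key-frames and store them as references."""
        for i, keyframe in enumerate(keyframes):
            self.entries.append((video_id, i, describe(keyframe)))

    def retrieve(self, query_keyframes, top_k=3, min_score=0.8):
        """Retrieval mode: describe the query and rank matching reference videos.

        Each reference video is scored by its best-matching key-frame;
        videos scoring below min_score are discarded.
        """
        best = {}
        for query in map(describe, query_keyframes):
            for video_id, _, reference in self.entries:
                score = similarity(query, reference)
                if score >= min_score and score > best.get(video_id, 0.0):
                    best[video_id] = score
        ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
        return ranked[:top_k]


index = ReferenceIndex()
index.extract("colosseum", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
index.extract("duomo", [[0.0, 1.0, 0.0]])
print(index.retrieve([[1.0, 0.05, 0.0]]))  # only "colosseum" passes the threshold
```

Sharing the same `describe` step between both modes matters: query and reference descriptors are only comparable if they are produced by the same extraction procedure.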

A first prototype of the system has been implemented and tested. We collected a dataset of more than 20 hours of video material showing monuments, paintings, sculptures and historic locations from the Italian artistic and cultural heritage. The accuracy achieved on this dataset demonstrated the system’s reliability and validity. This enables several practical applications, such as detecting points of interest (e.g. historical bridges, buildings, monuments) for intelligent shot framing and/or opportunistic shooting.


MPEG, “ISO/IEC 15938-13 – Information technology – Multimedia content description interface – Part 13: Compact descriptors for visual search,” August 2015.

MPEG, “ISO/IEC 15938-14 – Information technology – Multimedia content description interface – Part 14: Reference software, conformance and usage guidelines for compact descriptors for visual search,” October 2015.