Video Roadmap Part 4: Requirements

Part 1:  Introduction
Part 2:  Video Basics
Part 3:  Roadmap Description

The pace of improvement in collection technology today far outstrips our capability to handle, manage and interrogate the data rapidly and accurately.  Taking the video sources mentioned above one at a time, we can state a few requirements specific to each:

Broadcast News: The requirements here stem from both commercial and government intelligence use of Broadcast News, in many cases foreign.  Broadcast News provides a ready source of what is called Open Source Intelligence (OSINT).  In commercial applications such intelligence is useful to both US and international financial and commercial interests.  In the financial marketplace, understanding rapidly changing financial trends as they happen, or anticipating them by examining precursor information, is paramount to competitiveness.  The same is true in commercial markets, where a firm tries to understand and anticipate a competitor’s market position by gaining insight into what is going on in its R&D laboratory, piecing together various diffuse “tidbits” from the open literature and scientific conferences.  The US government and other governments’ intelligence apparatuses are equally interested in gleaning insight into other countries’ developments and situations by piecing together “tidbits” from diverse sources.  Another burgeoning source of video information is the myriad of internet-hosted social networks, MySpace and Facebook to name a few.  In the majority of these cases video is accompanied by audio, which makes this source ripe for multimedia processing and extraction.  The associated requirements center on collecting, managing and retrieving from large data stores.  Rapid processing matters more here than strictly real-time processing.  There is also a need for translation and transcription to render the collected data in the user’s native tongue, which relies on character recognition for the written word and automatic speech recognition for the spoken word.

Meeting room:  This collection source derives its requirements primarily from commercial institutions, with the government playing catch-up.  Official records of board meetings, stockholder meetings and an assortment of other transaction meetings are required as part of doing business or recognized as part of best practices.  As in Broadcast News, these collections typically have an audio track associated with the video.  This source shares all of the requirements of Broadcast News for processing, storage and translation.

Surveillance: As defined, surveillance can also be for commercial or government purposes.  In either case the interest is to detect intrusions or abnormal behavior and track the individuals involved, as well as to track normal activity.  There are instances of exterior and perimeter surveillance of buildings and/or compounds.  These may be industrial complexes, port facilities and airports, to name a few.  In such exterior applications, day/night and all-weather requirements drive the need for multimodal collection: panchromatic, infrared, microwave and radar sensors.  Thus the processing requirements include merging data from multiple multimodal sensors and processing in real time, so that one can be proactive about an intrusion or anomalous behavior pattern.  In addition, the collectors’ fields of view may overlap, which requires special processing to take that into account.  In some instances the level of detail desired may involve object/person detection and recognition.  Recognition may be only human versus animal but may also be as detailed as recognizing a person’s face.  Finally, in such surveillance scenarios the collected video and data are ultimately displayed in 3-D virtual world renderings.
The other category of surveillance falls within building interiors.  Here, for security purposes, one is interested in following the route of visitors, ensuring that visitors or unauthorized individuals do not gain access to rooms or areas by tailgating, and detecting abnormal behavior patterns of any individual within the building.  In other implementations found in large buildings such as airports (and, analogously, in large outdoor facilities), there are multiple overlapping or non-overlapping cameras that require processing to “tag” individuals and then follow and keep track of them.  In these cases detailed information about an individual is needed in order to place and maintain a tag, so high-quality or high-definition video imagery is necessary for the processing algorithms to succeed.  Identification, tagging and tracking are further complicated by the varying ambient lighting conditions found in airports and other facilities.  For indoor situations the requirement is therefore for processing high-definition imagery, again in real time, to tag and track for anomalous behavior patterns.  Results to date also make a case for 3-D face recognition in addition to 3-D world renderings.  Particularly in high-security applications such as counterterrorism, processing algorithms must display high precision and recall so as to avoid embarrassing or litigious situations arising from mistaken identity.
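The precision and recall requirement above can be made concrete with a small sketch.  It assumes a recognition system whose outputs have already been scored against ground truth; the function name and the counts in the example are hypothetical and serve only to illustrate the two measures.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) for a detector scored against ground truth.

    precision = TP / (TP + FP): of the identifications made, the fraction that were right.
    recall    = TP / (TP + FN): of the individuals who should have been identified,
                                the fraction that were actually found.
    A zero denominator is reported as 0.0 rather than raising an error.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, not measurements from any fielded system.
    p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In the mistaken-identity scenario described above, false positives are what lower precision, while missed individuals lower recall; a system tuned for one often gives ground on the other.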

Overhead aerial surveillance:  A few years ago this topic was limited to unmanned aerial vehicles but, because technology has responded to the military battlefield need for wide-field, high-resolution video surveillance, the topic has been expanded to include sensors aboard manned aircraft as well.  Nonetheless, most people have heard of the US UAV programs’ operations in Iraq and Afghanistan.  Their video collection capabilities and attendant processing needs pale when compared to the next-generation capabilities on the military horizon: Angel Fire and Constant Hawk.  These programs take battlefield surveillance to another level.  While capabilities are classified, hints that have reached the open literature speak of massive amounts of collected data.  For example, a 2007 report to Congress states, “Angel Fire will allow the warfighter to zoom in and observe more closely any area within the collected image cone, as well as allowing playback of significant events, essentially providing a ‘Google Earth, TIVO-like’ capability to monitor areas of interest.” [Department of the Air Force Presentation to the House Armed Services Committee, Subcommittee on Terrorism, Unconventional Threats and Capabilities, U.S. House of Representatives; Subject: Fiscal Year 2008 Air Force Science and Technology]  Another report to Congress described Angel Fire as “a tactical situational awareness system that provides real-time, high-resolution (.5m), city-sized images (66 mega pixels) of infrastructure, vehicles and people to hundreds of users.” [Department of Defense Annual Report to Congress on Defense Acquisition Challenge Program for FY 2006 Deputy Under Secretary of Defense  (Advanced Systems and Concepts) June 2007; Subject: Project Angel Fire – Situational Awareness of Large-Area Urban Operations]  You can do the math.  More recent publications provide better insight into current collection and the collection anticipated around the corner.
According to Wired, Angel Fire is currently flying missions, another sensor, Gorgon Stare, is being flight tested, and right behind these is the DARPA ARGUS-IS, which weighs in at 1.8 gigapixels on an airborne platform with a 20-hour duration. [Wired, Air Force to Unleash ‘Gorgon Stare’ on Squirting Insurgents & Special Forces’ Gigapixel Flying Spy Sees All, February 19, 2009]  In addition, the Department of Homeland Security has similar needs for domestic surveillance.  One common thread that all these systems share is a need for real-time processing of the data that yields results with a very high confidence level.  While most of these collection systems have a human observer in the loop, it is easy for a human observer to overlook something in his/her field of view because concentration is restricted to a particular portion of the complete collection.  To overcome this shortcoming there is a need for a software agent, trained to recognize a number of objects and/or activities, which can “look over the observer’s shoulder” and observe the total scene to ensure that certain things (objects, motion or activities) are not missed.
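The arithmetic implied by the figures quoted above can be sketched as follows.  Only the pixel counts, the 0.5 m resolution and the 20-hour duration come from the cited reports; the frame rate and bytes per pixel are placeholder assumptions chosen purely for illustration, since the sources do not state them, so the resulting data rates indicate scale only.

```python
import math

# Figures quoted in the cited reports.
ANGEL_FIRE_PIXELS = 66e6       # "66 mega pixels"
ANGEL_FIRE_GSD_M = 0.5         # ".5m" resolution, read here as metres per pixel
ARGUS_IS_PIXELS = 1.8e9        # 1.8 gigapixels
ARGUS_IS_MISSION_HOURS = 20    # 20-hour duration platform

# ASSUMPTIONS (not from the sources): uncompressed 8-bit pixels at 2 frames/s.
ASSUMED_BYTES_PER_PIXEL = 1.0
ASSUMED_FRAMES_PER_SECOND = 2.0


def ground_coverage_km2(pixels: float, gsd_m: float) -> float:
    """Area of one frame in km^2, treating each pixel as a gsd_m x gsd_m square."""
    return pixels * gsd_m ** 2 / 1e6


def raw_data_rate_mb_per_s(pixels: float,
                           bytes_per_pixel: float = ASSUMED_BYTES_PER_PIXEL,
                           frames_per_second: float = ASSUMED_FRAMES_PER_SECOND) -> float:
    """Uncompressed data rate in megabytes per second (1 MB = 1e6 bytes)."""
    return pixels * bytes_per_pixel * frames_per_second / 1e6


def mission_volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Total uncompressed volume in terabytes for a mission of the given length."""
    return rate_mb_per_s * hours * 3600 / 1e6


if __name__ == "__main__":
    area = ground_coverage_km2(ANGEL_FIRE_PIXELS, ANGEL_FIRE_GSD_M)
    print(f"Angel Fire frame: {area:.1f} km^2, about {math.sqrt(area):.1f} km on a side")
    print(f"Angel Fire rate (assumed): {raw_data_rate_mb_per_s(ANGEL_FIRE_PIXELS):.0f} MB/s")
    argus_rate = raw_data_rate_mb_per_s(ARGUS_IS_PIXELS)
    print(f"ARGUS-IS rate (assumed): {argus_rate:.0f} MB/s")
    print(f"ARGUS-IS 20-hour volume (assumed): "
          f"{mission_volume_tb(argus_rate, ARGUS_IS_MISSION_HOURS):.1f} TB")
```

The coverage figure depends only on the quoted numbers: 66 million pixels at 0.5 m each is 16.5 square kilometres, which is consistent with the "city-sized" description.  The data rates scale linearly with whatever frame rate and pixel depth are actually used.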

Ground reconnaissance:  The breadth of what is called ground reconnaissance is constantly widening, and the amount of data collected daily is increasing with it.  This video source is growing rapidly in both the commercial arena and the military.  Mentioned above was the phenomenal growth of postings on internet social networks.  The bulk of these postings are recorded by high-quality video cameras or camera cell phones.  The ubiquity of recording devices with high image quality (read that as megapixels) means that video segments are replacing photographs.  This is particularly true in Eurasia, the Middle East, Africa and the Pacific Rim, where 3rd and 4th generation cell phone networks are available that handle far more bandwidth than similar networks in the US.  Such social network postings are becoming an ever-increasing source of OSINT that complements Broadcast News.  As such, there is a need for the casual user to catalog and manage video segments, and for intelligence communities to do the same but with more diligence.  In the latter case we find the same requirements for this source as we found for Broadcast News.  Another growing source of Ground Reconnaissance is military surveillance deployed in Iraq and Afghanistan that uses ground-based sensors mounted on moving vehicles to produce a continuous hemisphere of video coverage.  These multi-camera systems pose additional processing requirements, namely rectifying the distorted images and seamlessly stitching them together into a view that the human eye can comprehend and that computer algorithms can treat as any other “normal perspective” video collection.  Incidentally, similar fixed hemispherical camera systems are turning up in the surveillance community.  Thus, Ground Reconnaissance shares many requirements with Broadcast News and Overhead Aerial Surveillance as technology drivers.