Artificial intelligence for video surveillance

Artificial intelligence for video surveillance uses computer software programs that analyze the audio and images from video surveillance cameras in order to recognize humans, vehicles, objects, attributes, and events. Security contractors program the software to define restricted areas within the camera's view (such as a fenced-off area, or a parking lot but not the sidewalk or public street outside the lot) and times of day (such as after the close of business) for the property being protected by the camera surveillance. The artificial intelligence ("A.I.") sends an alert if it detects a trespasser breaking the "rule" that no person is allowed in that area during that time of day.^[1]
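As a rough illustration of such a rule, the sketch below combines a restricted area (a polygon in image or ground coordinates) with a time window. It is a minimal sketch under stated assumptions, not the method of any particular product: the names ZoneRule, point_in_polygon and in_time_window are hypothetical, and the object label and position are assumed to come from a separate detection step.

```python
from dataclasses import dataclass
from datetime import time

Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_time_window(now: time, start: time, end: time) -> bool:
    """True if now is in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


@dataclass
class ZoneRule:
    """Hypothetical rule: no person inside `polygon` between `start` and `end`."""
    polygon: list[Point]
    start: time
    end: time

    def violated(self, label: str, position: Point, now: time) -> bool:
        return (
            label == "person"
            and in_time_window(now, self.start, self.end)
            and point_in_polygon(position, self.polygon)
        )
```

For example, a rule covering a square parking lot from 18:00 to 06:00 would flag a person standing inside the lot at 23:00, but not at noon and not on the sidewalk outside the polygon.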

The A.I. program functions by using machine vision. Machine vision is a series of algorithms, or mathematical procedures, which work like a flowchart or series of questions to compare the object seen with hundreds of thousands of stored reference images of humans in different postures, angles, positions and movements. The A.I. asks itself if the observed object moves like the reference images, whether it has approximately the same size and height relative to width, if it has the characteristic two arms and two legs, if it moves with similar speed, and if it is vertical instead of horizontal. Many other questions are possible, such as the degree to which the object is reflective, the degree to which it is steady or vibrating, and the smoothness with which it moves. Combining all of the values from the various questions, an overall ranking is derived which gives the A.I. the probability that the object is or is not a human. If the value exceeds a set limit, then the alert is sent. It is characteristic of such programs that they are self-learning to a degree, learning, for example, that humans or vehicles appear bigger in the portions of the monitored image nearest the camera than in the portions farthest from it.
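The step of combining the answers to many questions into one overall value can be pictured as a weighted average compared against a limit. The sketch below is only an assumption about how such a combination might look: the feature names, weights and the 0.7 threshold are invented for illustration, and the result is an uncalibrated score between 0 and 1 rather than a true probability.

```python
def human_likelihood(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-question scores, each expected to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


def should_alert(scores: dict[str, float], weights: dict[str, float],
                 threshold: float = 0.7) -> bool:
    """Send an alert only when the combined score exceeds the set limit."""
    return human_likelihood(scores, weights) > threshold


# Hypothetical weights: how much each question counts toward the overall value.
WEIGHTS = {"aspect_ratio": 2.0, "gait": 3.0, "upright": 1.0}
```

With these weights, an object scoring 0.9 on aspect ratio, 0.8 on gait and 1.0 on uprightness exceeds the limit, while an upright but otherwise un-humanlike object (0.2, 0.1, 1.0) does not.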

In addition to the simple rule restricting humans or vehicles from certain areas at certain times of day, more complex rules can be set. The user of the system may wish to know if vehicles drive in one direction but not the other. Users may wish to know that there are more than a certain preset number of people within a particular area. The A.I. is capable of maintaining surveillance of hundreds of cameras simultaneously. Its ability to spot a trespasser in the distance or in rain or glare is superior to humans' ability to do so.
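The two more complex rules mentioned above, direction of travel and a maximum number of people in an area, could be sketched as follows. This is a hedged illustration only: it assumes that an upstream tracker already supplies each object's positions over time, and the function names moving_against and over_capacity are hypothetical.

```python
from typing import Callable

Point = tuple[float, float]


def moving_against(track: list[Point], allowed_direction: Point,
                   min_distance: float = 1.0) -> bool:
    """True if the net movement along a track opposes the allowed direction.

    The dot product of the net displacement with the allowed direction is
    negative when the object travels the wrong way. Very short movements
    are ignored so that jitter does not raise alerts.
    """
    (x0, y0), (x1, y1) = track[0], track[-1]
    dx, dy = x1 - x0, y1 - y0
    if (dx * dx + dy * dy) ** 0.5 < min_distance:
        return False
    ax, ay = allowed_direction
    return dx * ax + dy * ay < 0


def over_capacity(positions: list[Point], in_area: Callable[[Point], bool],
                  limit: int) -> bool:
    """True if more than `limit` detected people are inside the area."""
    return sum(1 for p in positions if in_area(p)) > limit
```

Here `in_area` can be any predicate on a position, for instance a point-in-polygon test for the monitored zone.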

This type of A.I. for security is known as "rule-based" because a human programmer must set rules for all of the things for which the user wishes to be alerted. This is the most prevalent form of A.I. for security. Many video surveillance camera systems today include this type of A.I. capability. The hard drive that houses the program can be located either in the cameras themselves or in a separate device that receives the input from the cameras.

A newer, non-rule-based form of A.I. for security called "behavioral analytics" has been developed. This software is fully self-learning with no initial programming input by the user or security contractor. In this type of analytics, the A.I. learns what is normal behavior for people, vehicles, machines, and the environment based on its own observation of patterns of various characteristics such as size, speed, reflectivity, color, grouping, vertical or horizontal orientation and so forth. The A.I. normalizes the visual data, meaning that it classifies and tags the objects and patterns it observes, building up continuously refined definitions of what is normal or average behavior for the various observed objects. After several weeks of learning in this fashion it can recognize when things break the pattern. When it observes such anomalies it sends an alert. For example, it is normal for cars to drive in the street. A car seen driving up onto a sidewalk would be an anomaly. If a fenced yard is normally empty at night, then a person entering that area would be an anomaly.
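The idea of building a continuously refined definition of "normal" and then flagging departures from it can be illustrated with a running mean and standard deviation of a single measured characteristic, such as speed. This is a deliberately simplified sketch, not a description of any commercial system: the class name RunningNorm, the use of one numeric feature, and the threshold of three standard deviations are all assumptions made for illustration.

```python
import math


class RunningNorm:
    """Running mean and variance of one feature (Welford's online algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared differences from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the learned norm."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def deviation(self, x: float) -> float:
        """How many standard deviations x lies from the learned mean."""
        if self.n < 2:
            return 0.0  # not enough history to judge anything
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / std


def is_anomaly(norm: RunningNorm, x: float, threshold: float = 3.0) -> bool:
    """Flag observations that deviate from the norm by more than the threshold."""
    return norm.deviation(x) > threshold
```

Raising the threshold makes the system report only the most unusual observations; lowering it makes it more sensitive at the cost of more alerts.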

History

Statement of the problem

Limitations in the ability of humans to vigilantly monitor live video surveillance footage led to the demand for artificial intelligence that could better serve the task. Humans watching a single video monitor for more than twenty minutes lose 95% of their ability to maintain attention sufficient to discern significant events.^[2] With two monitors this is cut in half again.^[3] Given that many facilities have dozens or even hundreds of cameras, the task is clearly beyond human ability. In general, the camera views of empty hallways, storage facilities, parking lots or structures are exceedingly boring and thus attention quickly diminishes. When multiple cameras are monitored, typically employing a wall monitor or bank of monitors with split-screen views that rotate every several seconds between one set of cameras and the next, the visual tedium is quickly overwhelming. While video surveillance cameras were widely adopted by users ranging from car dealerships and shopping plazas to schools and businesses to highly secured facilities such as nuclear plants, it was recognized in hindsight that video surveillance by human officers (also called "operators") was impractical and ineffective. Extensive video surveillance systems were relegated to merely recording for possible forensic use to identify someone after a theft, arson, attack or other incident. Where wide-angle camera views were employed, particularly for large outdoor areas, severe limitations were discovered even for this purpose due to insufficient resolution.^[4] In these cases it is impossible to identify the trespasser or perpetrator because their image is too tiny on the monitor.^{[citation needed]}

Earlier attempts at solution

Motion detection cameras

In response to the inability of human guards to watch surveillance monitors for long periods, the first solution was to add motion detectors to cameras. It was reasoned that an intruder's or perpetrator's motion would send an alert to the remote monitoring officer, obviating the need for constant human vigilance. The problem was that in an outdoor environment there is constant motion, that is, constant change in the pixels that make up the total viewed image on screen. The motion of leaves on trees blowing in the wind, litter along the ground, insects, birds, dogs, shadows, headlights, sunbeams and so forth all counts as motion. This caused hundreds or even thousands of false alerts per day, rendering this solution unworkable except in indoor environments during non-operating hours.
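The weakness described above follows directly from how simple motion detection works. A minimal sketch, assuming grayscale frames stored as nested lists of 0 to 255 values and two hypothetical thresholds, compares consecutive frames pixel by pixel and alerts when enough pixels have changed:

```python
Frame = list[list[int]]


def changed_fraction(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose brightness changed by more than pixel_threshold."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def motion_alert(prev: Frame, curr: Frame, pixel_threshold: int = 25,
                 area_threshold: float = 0.01) -> bool:
    """Alert when more than area_threshold of the image has changed.

    Nothing here knows what caused the change: a swaying branch, a passing
    headlight beam and an intruder all look the same to this test.
    """
    return changed_fraction(prev, curr, pixel_threshold) > area_threshold
```

Because the test looks only at pixel changes, raising the thresholds to suppress leaves and shadows also suppresses small or distant intruders, which is the trade-off that made this approach impractical outdoors.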

Advanced video motion detection

The next evolution reduced false alerts to a degree but at the cost of complicated and time-consuming manual calibration. Here, changes of a target such as a person or vehicle relative to a fixed background are detected. Where the background changes seasonally or due to other changes, the reliability deteriorates over time. The economics of responding to too many false alerts again proved to be an obstacle and this solution was not sufficient.

Advent of true video analytics

Machine learning of visual recognition relates to patterns and their classification.^[5]^[6] True video analytics can distinguish the human form, vehicles and boats or selected objects from the general movement of all other objects and visual static or changes in pixels on the monitor. It does this by recognizing patterns. When the object of interest, for example a human, violates a preset rule, for example that the number of people shall not exceed zero in a pre-defined area during a defined time interval, then an alert is sent. A red rectangle or so-called "bounding box" will typically automatically follow the detected intruder, and a short video clip of this is sent as the alert.

Practical application

Pedestrian detection

Real-time preventative action

The detection of intruders using video surveillance has limitations based on economics and the nature of video cameras. Typically, cameras outdoors are set to a wide-angle view and yet look out over a long distance. Frame rate, and the dynamic range needed to handle both brightly lit and dimly lit areas, further challenge the camera's ability to see a moving human intruder. At night, even in illuminated outdoor areas, a moving subject does not gather enough light per frame and so, unless quite close to the camera, will appear as a thin wisp or barely discernible ghost, or will be completely invisible. Conditions of glare, partial obscuration, rain, snow, fog, and darkness all compound the problem. Even when a human is directed to look at the actual location on a monitor of a subject in these conditions, the subject will usually not be detected. The A.I. is able to impartially look at the entire image and all cameras' images simultaneously. Using statistical models of degrees of deviation from its learned pattern of what constitutes the human form, it will detect an intruder with high reliability and a low false alert rate even in adverse conditions.^[7] Its learning is based on approximately a quarter million images of humans in various positions, angles, postures, and so forth.

A one-megapixel camera with onboard video analytics was able to detect a human at a distance of about 350 feet with an angle of view of about 30 degrees in non-ideal conditions. Rules could be set for a "virtual fence" or intrusion into a pre-defined area. Rules could also be set for directional travel, object left behind, crowd formation and some other conditions. Artificial intelligence for video surveillance is widely used in China. See Mass surveillance in China.
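The figures above can be put in perspective by estimating how many pixels a person occupies at that distance. The calculation below is a back-of-the-envelope sketch: the 350 feet and 30 degrees come from the text, while the 1280-pixel horizontal resolution (one common layout for a roughly one-megapixel sensor) and the 1.5-foot shoulder width are assumptions, and lens distortion is ignored.

```python
import math


def scene_width_ft(distance_ft: float, angle_of_view_deg: float) -> float:
    """Width of the scene covered by the camera at a given distance."""
    return 2.0 * distance_ft * math.tan(math.radians(angle_of_view_deg) / 2.0)


def pixels_on_target(target_ft: float, distance_ft: float,
                     angle_of_view_deg: float, sensor_px: int) -> float:
    """Approximate number of pixels spanned by a target of the given size."""
    return target_ft * sensor_px / scene_width_ft(distance_ft, angle_of_view_deg)


# Assumed values (not from the source): 1280 px across, 1.5 ft shoulder width.
width_at_350_ft = scene_width_ft(350, 30)
person_px = pixels_on_target(1.5, 350, 30, 1280)
```

Under these assumptions the scene is roughly 188 feet wide at 350 feet, so a person spans only about 10 pixels across, which suggests why identifying a face at that range is out of reach even when detecting the human form is still possible.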

Talk-down

One of the most powerful features of the system is that a human officer or operator, receiving an alert from the A.I., could immediately talk down over outdoor public address loudspeakers to the intruder. This had high deterrence value as most crimes are opportunistic and the risk of capture to the intruder becomes so pronounced when a live person is talking to them that they are very likely to desist from intrusion and to retreat. The security officer would describe the actions of the intruder so that the intruder had no doubt that a real person was watching them. The officer would announce that the intruder was breaking the law and that law enforcement was being contacted and that they were being video-recorded.^[8]

Verified breach report

The police receive a tremendous number of false alarms from burglar alarms. In fact the security industry reports that over 98% of such alarms are false ones. Accordingly, the police give very low priority response to burglar alarms and can take from twenty minutes to two hours to respond to the site. By contrast, the video analytic-detected crime is reported to the central monitoring officer, who verifies with his or her own eyes that it is a real crime in progress. He or she then dispatches to the police who give such calls their highest priority.

Behavioral analytics

Active environments

While rule-based video analytics worked economically and reliably for many security applications, there are many situations in which it cannot work.^[9] For an indoor or outdoor area where no one belongs during certain times of day, for example overnight, or for areas where no one belongs at any time such as a cell tower, traditional rule-based analytics are perfectly appropriate. In the example of a cell tower, the rare time that a service technician may need to access the area would simply require calling in with a pass-code to put the monitoring response "on test" or inactivated for the brief time the authorized person was there.

But there are many security needs in active environments in which hundreds or thousands of people belong all over the place all the time. Examples include a college campus, an active factory, a hospital or any other active operating facility. It is not possible to set rules that would discriminate between legitimate people and criminals or wrong-doers.

Overcoming the problem of active environments

Using behavioral analytics, a self-learning, non-rule-based A.I. takes the data from video cameras and continuously classifies objects and events that it sees. For example, a person crossing a street is one classification. A group of people is another classification. A vehicle is one classification, but with continued learning a public bus would be discriminated from a small truck and that from a motorcycle. With increasing sophistication, the system recognizes patterns in human behavior. For example, it might observe that individuals pass through a controlled access door one at a time. The door opens, the person presents their proximity card or tag, the person passes through and the door closes. This pattern of activity, observed repeatedly, forms a basis for what is normal in the view of the camera observing that scene. Now if an authorized person opens the door but a second "tail-gating" unauthorized person grabs the door before it closes and passes through, that is the sort of anomaly that would create an alert. This type of analysis is much more complex than the rule-based analytics. While the rule-based analytics work mainly to detect intruders into areas where no one is normally present at defined times of day, the behavioral analytics works where people are active to detect things that are out of the ordinary.
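The door example can be pictured as learning how often each kind of event occurs and flagging those that are rare relative to the accumulated history. The sketch below is a simplified assumption, not the algorithm of any actual product: it reduces each door cycle to a single observed value (the number of people passing through), and the class name PatternModel and both thresholds are hypothetical.

```python
from collections import Counter
from typing import Hashable


class PatternModel:
    """Learns how frequently each observed event occurs in a camera view."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def observe(self, event: Hashable) -> None:
        """Record one occurrence of an event, refining the learned pattern."""
        self.counts[event] += 1
        self.total += 1

    def frequency(self, event: Hashable) -> float:
        """Share of all observations that matched this event."""
        return self.counts[event] / self.total if self.total else 0.0

    def is_anomalous(self, event: Hashable, min_frequency: float = 0.01,
                     min_history: int = 100) -> bool:
        """True if the event is rare compared with what has been learned.

        No alerts are raised until enough history exists, mirroring the
        initial learning period of a self-learning system.
        """
        if self.total < min_history:
            return False
        return self.frequency(event) < min_frequency
```

After several hundred door cycles in which exactly one person passes, a cycle with two people has a learned frequency near zero and would be flagged, while in a view where groups routinely pass together the same event would be learned as normal.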

A fire breaking out outdoors would be an unusual event and would cause an alert, as would a rising cloud of smoke. Vehicles driving the wrong way into a one-way driveway are also typical of the type of event that has a strong visual signature and deviates from the repeatedly observed pattern of vehicles driving the correct way in the lane. Someone thrown to the ground by an attacker would be an unusual event that would likely cause an alert. This is situation-specific. If the camera viewed a gymnasium where wrestling was practiced, the A.I. would learn that it is usual for one human to throw another to the ground, in which case it would not alert on this observation.

What the artificial intelligence 'understands'

The A.I. does not know or understand what a human is, or a fire, or a vehicle. It is simply finding characteristics of these things based on their size, shape, color, reflectivity, angle, orientation, motion, and so on. It then finds that the objects it has classified have typical patterns of behavior. For example, humans walk on sidewalks and sometimes on streets but they don't climb up the sides of buildings very often. Vehicles drive on streets but don't drive on sidewalks. Thus the anomalous behavior of someone scaling a building or a vehicle veering onto a sidewalk would trigger an alert.

Difference from the traditional mindset of security systems

Typical alarm systems are designed to not miss true positives (real crime events) and to have as low a false alarm rate as possible. In that regard, burglar alarms miss very few true positives but have a very high false alarm rate even in the controlled indoor environment. Motion-detecting cameras miss some true positives but are plagued with overwhelming false alarms in an outdoor environment. Rule-based analytics reliably detect most true positives and have a low rate of false positives but cannot perform in active environments, only in empty ones. They are also limited to the simple discrimination of whether an intruder is present or not.

Something as complex or subtle as a fight breaking out or an employee breaking a safety procedure is not possible for rule-based analytics to detect or discriminate. With behavioral analytics, it is. Places where people are moving and working do not present a problem. However, the A.I. may spot many things that appear anomalous but are innocent in nature. For example, if students at a campus walk on a plaza, that will be learned as normal. If a couple of students decided to carry a large sheet outdoors flapping in the wind, that might indeed trigger an alert. The monitoring officer would be alerted to look at his or her monitor and would see that the event is not a threat and would then ignore it. The degree of deviation from the norm that triggers an alert can be set so that only the most abnormal things are reported. However, this still constitutes a new way of human and A.I. interaction not typified by the traditional alarm industry mindset. This is because there will be many false alarms that may nevertheless be valuable to send to a human officer who can quickly look and determine if the scene requires a response. In this sense, it is a "tap on the shoulder" from the A.I. to have the human look at something.

Limitations of behavioral analytics

Because so many complex things are being processed continuously, the software downsamples the video to the very low resolution of only 1 CIF (Common Intermediate Format, 352 × 288 pixels) to reduce computational demand. The 1 CIF resolution means that an object the size of a human will not be detected if the camera utilized is wide angle and the human is more than sixty to eighty feet distant, depending on conditions. Larger objects like vehicles or smoke would be detectable at greater distances.

Quantification of situational awareness

The utility of artificial intelligence for security does not exist in a vacuum, and its development was not driven by purely academic or scientific study. Rather, it is addressed to real-world needs, and hence, economic forces. Non-security applications, such as operational efficiency, shopper heat-mapping of display areas (meaning how many people are in a certain area of a retail space), and attendance at classes, are developing uses.^[10] Humans are not as well qualified as A.I. to compile and recognize patterns consisting of very large data sets requiring simultaneous calculations in multiple remotely viewed locations. There is nothing natively human about such awareness. Such multitasking has been shown to defocus human attention and performance. A.I.s have the ability to handle such data. When working with video cameras for security purposes, they functionally have better visual acuity, or the machine approximation to it, than humans. For judging subtleties of behaviors or intentions of subjects or degrees of threat, humans remain far superior at the present state of the technology. So the A.I. in security functions to broadly scan beyond human capability, to vet the data to a first level of sorting of relevance, and to alert the human officer, who then takes over the function of assessment and response.

Security in the practical world is economically determined, so that expenditure on preventative security will typically never exceed the perceived cost of the risk to be avoided. Studies have shown that companies typically only spend about one twenty-fifth the amount on security that their actual losses cost them.^[11]^{[predatory publisher]} What by pure economic theory should be an equivalence or homeostasis thus falls vastly short of one. One theory that explains this is cognitive dissonance, or the ease with which unpleasant things like risk can be shunted from the conscious mind. Nevertheless, security is a major expenditure, and comparison of the costs of different means of security is always foremost amongst security professionals.

Another reason that future security threats or losses are under-assessed is that often only the direct cost of a potential loss is considered instead of the spectrum of consequential losses that are concomitantly experienced. For example, the vandalism-destruction of a custom production machine in a factory or of a refrigerated tractor-trailer would result in a long replacement time during which customers could not be served, resulting in loss of their business. A violent crime will have extensive public relations damage for an employer, beyond the direct liability for failing to protect the employee.

Behavioral analytics uniquely functions beyond simple security and, due to its ability to observe breaches in standard patterns of protocols, it can effectively find unsafe acts of employees that may result in workers' compensation or public liability incidents. Here too, the assessment of future incidents' costs falls short of the reality. A study by Liberty Mutual Insurance Company showed that the cost to employers is about six times the direct insured cost, since uninsured costs of consequential damages include temporary replacement workers, hiring costs for replacements, training costs, managers' time in reports or court, adverse effects on the morale of other workers, and effects on customer and public relations.^[12] The potential of A.I. in the form of behavioral analytics to proactively intercept and prevent such incidents is significant.

References

[1] "Video Analytics - an overview | ScienceDirect Topics". www.sciencedirect.com. Retrieved 2020-11-01.
[2] Green, Mary W. (1999) The Appropriate and Effective Use of Security Technologies in U.S. Schools, A Guide for Schools and Law Enforcement Agencies, Sandia National Laboratories
[3] Sulman, N.; Sanocki, T.; Goldgof, D.; Kasturi, R., How effective is human video surveillance performance?, Pattern Recognition, ICPR 2008. 19th International Conference on, pp. 1,3, 8-11 Dec. 2008
[4] Nuechterlein, K.H., Parasuraman, R., & Jiang, Q. (1983). Visual sustained attention: Image degradation produces rapid sensitivity decrement over time. Science, 220, 327-329
[5] Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, September 22, 2015 Basic Books
[6] Davies, E. R. (2012) Computer and Machine Vision, Fourth Edition: Theory, Algorithms, Practicalities Academic Press, Waltham Mass.
[7] Dufour, Jean-Yves, Intelligent Video Surveillance Systems, John Wiley Publisher (2012)
[8] Hantman, Ken (2014) What is Video Analytics, Simply Explained
[9] Rice, Derek, Finding & Selling The Value of Analytics, SDM Magazine (Sept 2015) BNP Media II, Troy Michigan
[10] Gruber, Illy, The Evolution of Video Analytics, Security Sales & Integration magazine (August 11, 2012) Security Sales & Integration, Framingham MA
[11] Bressler, Martin S., The Impact of Crime on Business: A Model of Prevention, Detection & Remedy, Journal of Management and Marketing Research (2009)
[12] Safety Index Report, Liberty Mutual Insurance Company (2002)