Audio Analytics Aggression Tested

By Ethan Ace, Published Nov 20, 2015, 12:00am EST (Research)

What if you could use your IP cameras to detect fights before they start? 

That is the goal of Louroe / Sound Intelligence with their recently released Aggression Detector audio analytics. Claiming that "90% of physical aggression are preceded by verbal aggression", these analytics are designed to alert guards before verbal altercations turn physical.

We tested the Aggression Detector using both live subjects and recorded clips of multiple scenes, to see just how these analytics perform.

Here is the app main view that we used to monitor and optimize the audio analytics:

Louroe/Sound Intelligence Aggression Detector analytics are a potentially useful addition to surveillance systems, notifying staff of events which may otherwise go unnoticed, possibly before fights/altercations escalate. 

However, they are not suitable for all locations and are best used in normally quiet areas. Crowds or gathered students will trigger numerous false alerts, while high background noise levels reduce the analytics' ability to distinguish speech.

Key Findings

In our tests, Louroe's Aggression Analytics experienced no false activations over the course of three days in a typical open office area (whether from normal voice conversations, louder yelling down the aisle to other workers, or group discussions).

However, some non-aggressive speech patterns caused false alerts to varying degrees:

  • Group/crowd noise: Large groups and crowds triggered near-constant false alerts, regardless of sensitivity. Louroe recommends not using aggression analytics in these applications.
  • Loud, non-aggressive speech: In some cases, louder non-aggressive speech, joking, etc., triggered alarms. This was uncommon but occurred a few times (<5) during live testing.

Detection requires continuous stressed speech to analyze and trigger. Short shouts or loud noises never activated analytics during our tests.
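To illustrate why a short shout would not trigger an alert while sustained stressed speech would, the sketch below gates an alert on a run of consecutive stressed audio frames. This is not a description of Sound Intelligence's actual algorithm. The per-frame stressed/not-stressed flags, the function name, and the `min_run` threshold are all assumptions made for the example.

```python
from typing import Iterable, Optional


def sustained_trigger(stressed_frames: Iterable[bool], min_run: int) -> Optional[int]:
    """Return the index of the frame that completes the first run of
    `min_run` consecutive stressed frames, or None if no such run exists.

    `stressed_frames` is a hypothetical per-frame classifier output;
    `min_run` is an assumed tuning value, not a product setting.
    """
    run = 0
    for index, stressed in enumerate(stressed_frames):
        # Any non-stressed frame resets the count, so isolated shouts
        # never accumulate into an alert.
        run = run + 1 if stressed else 0
        if run >= min_run:
            return index
    return None
```

With `min_run=3`, a single stressed frame followed by silence returns `None`, while three stressed frames in a row return the index of the third.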

Louroe verifies its own microphones, as well as Axis microphones, for best performance. Internal camera mics are not recommended and performed poorly in our tests, with high background noise levels drastically reducing detection performance.

No direct VMS event integration is available. TCP serial strings or digital I/O may be used instead.

Pricing

MSRP for Louroe's aggression detection is $550 per channel.

Potential Applications

The best fit for these aggression analytics is in normally quiet areas where aggressive speech most easily stands out against background noise, including offices, hospitals, police stations, courtrooms, etc.

These analytics could be useful in schools, but the increased volume and crowd noise in common areas before/after or in between classes will likely trigger numerous false alerts. Event rules may be set outside of these time periods, but this would largely negate the benefits of the analytic.
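The time-based rule described above amounts to a simple schedule check. The sketch below is a generic illustration rather than the camera's event rule engine; the period times and the `alerts_enabled` name are hypothetical.

```python
from datetime import time
from typing import Sequence, Tuple

# Hypothetical noisy periods (arrival, lunch, dismissal) for one school.
NOISY_PERIODS: Sequence[Tuple[time, time]] = (
    (time(7, 45), time(8, 15)),
    (time(11, 30), time(12, 30)),
    (time(14, 50), time(15, 20)),
)


def alerts_enabled(now: time, noisy_periods: Sequence[Tuple[time, time]] = NOISY_PERIODS) -> bool:
    """Return False while `now` falls inside any noisy period, True otherwise."""
    return not any(start <= now < end for start, end in noisy_periods)
```

Note that every window added to the list is a window in which a real altercation would go unreported, which is the trade-off described above.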

In open areas where larger groups may gather, such as stadiums, false alerts are likely to be near constant due to crowd noise.

Finally, in wide open areas such as streets or parks, mics are simply unlikely to properly detect speech and aggression due to the increased range and background noises, making the analytics unsuitable.

Installation/Configuration

Louroe's Aggression Detector has a single configuration screen, with only one analytic setting: sensitivity, highlighted in red below. This screen also shows real time audio activity in three graphs:

  1. Audio level activity: The top graph shows audio activity, with low volume audio displayed as green, turning to yellow when above the detection threshold, to orange when analyzed, and finally red when aggression is detected. While aggression is active, the background also changes from black to faint red.
  2. Audio spectrogram: The middle graph displays frequency analysis, with higher powered frequencies displayed as red for high energy, yellow for mid, and blue for low.
  3. Classification: Finally, at the bottom, analytic classification is shown. Colors range from black (background noise only) to blue (unclassified), then to green for non-stressed human speech, and yellow/orange/red for increasing levels of stress/aggression.

We strongly recommend monitoring performance and adjusting sensitivity over the course of multiple days in order to properly gauge false alerts/missed activations. Sensitivity settings also will vary depending on the microphone used, gain settings applied, and background noise present at the time of calibration.
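One way to keep this multi-day evaluation organized is to log each reviewed event against the sensitivity in use and tally the results. The sketch below assumes a manually kept log; the `Observation` record, the numeric sensitivity values, and the outcome labels are our own illustrative names, not part of the Aggression Detector.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

OUTCOMES = ("true_alert", "false_alert", "missed")


@dataclass
class Observation:
    sensitivity: int  # hypothetical numeric value of the sensitivity setting
    outcome: str      # one of OUTCOMES, assigned by a human reviewer


def summarize(observations: Iterable[Observation]) -> Dict[int, Dict[str, int]]:
    """Count true alerts, false alerts, and missed events per sensitivity."""
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(OUTCOMES, 0))
    for obs in observations:
        if obs.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {obs.outcome!r}")
        counts[obs.sensitivity][obs.outcome] += 1
    # Return plain dicts ordered by sensitivity for easy side-by-side review.
    return {level: dict(counts[level]) for level in sorted(counts)}
```

Comparing the tallies per setting makes it easier to see where false alerts start to outweigh missed activations for a given microphone and gain combination.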

Aside from sensitivity, users need only adjust gain and codec settings. We review these options in more detail, including VMS-specific information, in our Audio Surveillance Guide.

Video Explanation

The video below reviews installation and configuration in more detail, including sensitivity settings, mic setup, and more:


VMS Integration

Louroe's analytics did not integrate directly with any of the VMSes tested: Avigilon, Genetec, Exacq, and XProtect. As with other ACAP applications, they may be used to trigger TCP notifications or relay outputs in the camera, which may then be manually integrated with the VMS.

If these methods are not available, users may also receive email/SMS messages for alerts.
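For those taking the TCP notification route, the receiving end can be as simple as a small listener. The sketch below uses Python's standard `socketserver` module and is only a starting point. The message strings, the port, and the function names are assumptions, since the actual text sent is whatever is configured in the camera's notification action.

```python
import socketserver
from typing import Optional

# Hypothetical marker strings; substitute the text configured on the camera.
ALERT_START = "AGGRESSION_START"
ALERT_STOP = "AGGRESSION_STOP"


def parse_notification(raw: bytes) -> Optional[str]:
    """Map a raw TCP payload to 'start', 'stop', or None if unrecognized."""
    text = raw.decode("ascii", errors="replace").strip()
    if text == ALERT_START:
        return "start"
    if text == ALERT_STOP:
        return "stop"
    return None


class NotificationHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # Read line by line; a payload without a trailing newline is
        # delivered once the camera closes the connection.
        for line in self.rfile:
            event = parse_notification(line)
            if event is not None:
                print(f"{self.client_address[0]}: aggression {event}")


def serve(host: str = "0.0.0.0", port: int = 9000) -> None:
    """Listen for camera notifications until interrupted (port is assumed)."""
    with socketserver.TCPServer((host, port), NotificationHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. In practice the `print` call would be replaced with whatever the VMS accepts as an external event input.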

Audio Analysis Examples

The examples below show analytic analysis in several applications, both live and from recorded clips.

Note that Louroe recommends clips not be used for configuration purposes, as the frequency response of the recording microphone and speakers playing the clip back are unknown and may impact detection performance. However, we used clips for these examples simply as a way to repeatedly show the effects of settings changes.

Fist Fight and Crowd

First, this example shows analysis of a fight with a crowd of spectators. Volume levels are high, shown by the large amounts of orange in the activity window and red frequencies shown in the spectrogram. Aggression alerts were frequent, but not constant in this example:

Loud, Non-Aggressive Speech

Next, this example shows loud but non-aggressive speech. Note that analysis occurs nearly constantly, as the activity window shows all orange. However, despite the increased volume levels, the speech is not classified as stressed.

Stadium Crowd Noise

Finally, this example shows stadium crowd noise on low sensitivity. Even with lowered volume and reduced sensitivity, alarms are nearly constant:

Test Parameters

The test was performed using an Axis Q1615 with a Louroe Verifact A mic connected via an IF-1 audio adapter.

The following firmware/software versions were used in this test:

  • Louroe/Sound Intelligence Aggression Detector: 2.2.4-Louroe
  • Axis Q1615: 5.80.1.2
  • ExacqVision: 7.2.1.85489
  • Genetec Security Center: 5.3.1417.47
  • Milestone XProtect: 2014 (8.6d)