Google Clips Camera Tested - Weak AI
By Rob Kilpatrick, Published on Mar 12, 2018
The Google Clips is drawing a lot of interest, especially given its use of artificial intelligence (AI) to deliver a 'smart' camera. Indeed, Google claims:
We purchased one and tested these claims. Inside, we report our findings on:
- Camera pricing
- App usability
- Inconsistent suggested clips
- Face/pet/expression recognition issues
- Camera hardware
Most notably, we examine how the camera was not very 'smart' at all.
However, in our tests, we saw several issues contradicting this claim:
- Clips were generated by practically any sort of motion (people, pets, etc.) or even in still scenes (empty rooms, walls, etc.).
- Further, while it claims to suggest clips that users will like most, these suggestions had seemingly no rhyme or reason, with nearly identical clips suggested and not suggested directly after each other.
- Finally, while Google also claims that expression recognition is used to determine which clips to capture, we saw no difference in the number of captures regardless of expression (smiling, laughing, neutral, etc.).
Despite these issues, many users may still find the camera useful, as it may be placed in a stationary location during events or when playing with children or pets, automatically taking photos without them needing to use their phones.
Note: Claims To Improve Over Time
Google claims that Clips improves over time, with the camera learning what is important to users. We saw no difference in performance over roughly 5 days of testing. However, it may improve with additional time and use.
The Google Clips camera can be purchased from the Google Store for $249 USD, or as a bundle with a tripod mount case for an additional $14.99.
The video below provides a physical overview of the Google Clips camera and included case:
Clips Mobile App Overview
The Clips mobile app is essentially the only UI to the camera. Its main interface is a list of captured clips, which users scroll through to review and swipe to save or delete. So-called "Suggested Clips" are highlighted with a star icon, though we found these suggestions poor (discussed below). The camera may also be viewed live for positioning, switched from video to GIF mode, etc.
We review these functions in this video:
Failure: Clips Generated On Anything/Nothing
According to Google, clips should be created when the camera sees human faces and pets. However, during our tests, the camera created several clips that contained none of these, including an empty room, an object waved in front of the camera, people without faces visible, etc.
Because this feature worked so inconsistently, the Clips camera functions more like a motion-triggered camera, recording whenever it perceives changes in the scene.
For example, while we were writing this report, the camera captured six images of the side of a file cabinet:
Note that there is no way to verify what the camera "saw" to trigger a clip, nor is there any clear sensitivity adjustment.
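For comparison, the motion-triggered behavior described above can be sketched as simple frame differencing. This is an illustrative sketch only, not Google's implementation: the function names, the grayscale-list frame format, and both thresholds (`pixel_delta`, `min_fraction`) are assumptions chosen for the example.

```python
from typing import List, Sequence

# Assumed frame format: a grayscale image as rows of 0-255 pixel values.
Frame = Sequence[Sequence[int]]


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Return the fraction of pixels whose brightness changed by more than pixel_delta."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def motion_triggers(
    frames: List[Frame],
    pixel_delta: int = 25,       # hypothetical per-pixel sensitivity
    min_fraction: float = 0.05,  # hypothetical share of pixels that must change
) -> List[int]:
    """Return indices of frames that differ enough from the previous frame to trigger a capture."""
    triggers = []
    for i in range(1, len(frames)):
        if changed_fraction(frames[i - 1], frames[i], pixel_delta) >= min_fraction:
            triggers.append(i)
    return triggers
```

A detector like this fires on any sufficiently large pixel change, regardless of what caused it. It has no notion of faces, pets, or expressions, and its two thresholds are the kind of sensitivity adjustment a motion-based camera typically exposes.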
Inconsistent Suggested Clips
Google states that the camera suggests specific clips based on recognizing familiar faces or pets, which it then highlights with a star icon in the top right corner of the clip. However, in our tests, this was highly inconsistent: the app suggested some clips, but others captured immediately before and after them were not tagged, despite containing the same subject(s) and scene.
For example, the image below shows two clips taken back to back, one recommended and one not, although their content is nearly identical.
The same was true of clips with pets. The clip below left was recommended, but the very similar clip on the right was not.
Failed Expression Recognition
Google claims that facial expressions are one criterion it uses to capture clips. However, in our tests, clips were generated regardless of expression. Users smiling, laughing, frowning, etc., were just as likely to trigger a clip as those with neutral expressions.
In the video below we review components of the Clips camera. The camera's AI is powered by a Movidius MA2150 "Vision Processing Unit" (VPU), visible on the front of the board (see Intel Movidius Targets Video Surveillance Market). The only other chip with visible markings is its 16GB solid-state storage, on the rear of the board.
Note that the camera must be broken open to tear it down: there is no way to remove and replace the lens, the front cover, or the delicate ribbon cables connecting components. The teardown was conducted after performance testing.
Note that our tests were performed in areas with few subjects present, with activity ranging from sporadic to constant. If the camera were used in an area with a crowd of people, such as a party, performance may differ (known people suggested instead of unknown, fewer or more clips shown, etc.), though given how much it struggled with simple scenes, we are skeptical of how it would perform in harder ones.
Firmware Versions Used
The following firmware versions were used during testing:
- App version: 1.3.185005366
- Camera version: 220.127.116.115005431