Simon Bisson
Contributor

Working with the Azure Kinect Developer Kit

Analysis
Jun 16, 2020 | 7 mins
Machine Learning, Microsoft Azure, Software Development

Building applications on top of the Kinect depth sensor

[Image credit: IDG]

Microsoft announced its Azure Kinect camera modules alongside HoloLens 2 early in 2019. Both devices use the same mixed-reality camera module, built around a time-of-flight depth sensor that maps the objects around the camera. But where HoloLens is a wearable mixed-reality device, the Azure Kinect modules are intended to provide Azure-hosted machine learning applications with connected sensors that can be mounted anywhere in a workspace.

Azure Kinect is a direct descendant of the second-generation Kinect modules that shipped with the Xbox One, but instead of providing real-world inputs for gaming, it’s targeted at enterprise users and applications. Intended to work with Azure’s Cognitive Services, the first Azure Kinect developer kit started shipping at the end of 2019 in the United States, with availability extending to several other countries in early 2020.

Opening the box

The $399 Azure Kinect Developer Kit is a small white unit with two camera lenses, one for a wide-angle RGB camera and one for the Kinect depth sensor, and an array of microphones. It has an orientation sensor, allowing you to use the camera to build complex 3-D images of environments, ready for use in mixed reality. You can chain multiple devices together for quick 3-D scans or to provide coverage of an entire room, using the orientation sensor to help understand device position.

Along with the camera unit, you get a power supply, an Allen key to remove the chaining ports cover, and a USB cable to connect to a development PC. I’d recommend getting a desktop tripod or another type of mount, as the bundled plastic stand is rather small and doesn’t work with most desks or monitors. There’s no software in the box, only a link to online documentation where you can download the device SDK.

Before you get started, you should update the device firmware. The firmware ships with the SDK, along with a command-line installation tool. When you run the updater, it first checks the current firmware state, then installs the camera and device firmware and reboots the unit. Once the camera has rebooted, use the same tool to check that the update has installed successfully. If an install goes wrong, you can use the camera’s hardware reset (hidden under the tripod mount) to restore the original factory image.

Sensing the world

With the SDK installed, you get access to the device sensors from your code. There are three SDKs: one for low-level access to all the camera’s sensors, another for the familiar Kinect body-tracking features, and one to link the camera’s microphone array to Azure’s speech services. A prebuilt Kinect Viewer app shows the available camera views and streams data from the device’s sensors. You get access to the wide-angle RGB camera, a depth camera view, and the image from the depth sensor’s infrared camera. SDKs are available for both Windows and Linux, specifically Canonical’s Ubuntu 18.04 LTS release, and can be downloaded directly from Microsoft or from GitHub.

It’s a good idea to spend some time playing with the Kinect Viewer. It lets you see how the different depth camera modes operate, helping you choose either a narrow or wide field of view. You can see data from the position sensors, both the accelerometer and gyroscope, and from the microphone array. With the Azure Kinect Developer Kit connected to a development PC and working, you can start to write code for it. A command-line recorder app can be used to capture data for playback in the viewer, storing depth information in an MKV (Matroska Video) format file.

Building your first depth-sensing application

Microsoft provides sample code for building a simple C application to work with the Azure Kinect Developer Kit. Only one library is needed, and it provides the objects and methods needed to work with the camera. Any application first needs to check how many cameras are connected to the host PC before configuring your device data streams. Devices are identified by their serial number, so you can use this to address a specific camera when working with several connected to the same PC or chained together.
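
As a rough sketch, device enumeration with the Sensor SDK’s C API looks something like the following (error handling is trimmed, and the k4a library and headers are assumed to be installed and linked):

    #include <stdio.h>
    #include <k4a/k4a.h>

    int main(void)
    {
        /* Count the Azure Kinect devices attached to this PC */
        uint32_t device_count = k4a_device_get_installed_count();
        printf("Found %u device(s)\n", device_count);
        if (device_count == 0)
            return 1;

        /* Open the first device and read its serial number */
        k4a_device_t device = NULL;
        if (k4a_device_open(K4A_DEVICE_DEFAULT, &device) != K4A_RESULT_SUCCEEDED)
        {
            printf("Failed to open device\n");
            return 1;
        }

        char serial[256];
        size_t serial_size = sizeof(serial);
        if (k4a_device_get_serialnum(device, serial, &serial_size) == K4A_BUFFER_RESULT_SUCCEEDED)
            printf("Serial number: %s\n", serial);

        k4a_device_close(device);
        return 0;
    }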

The Azure Kinect Developer Kit only delivers streaming data, so applications need to configure the data rate in frames per second, along with image color formats and resolutions. Once you’ve created a configuration object, you can use it to open a connection and start streaming data. When you’re finished reading a data stream, stop and close the device.
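
A minimal sketch of that configure, start, and stop cycle, assuming a device handle opened as in the previous snippet; the frame rate, color format, and depth mode shown here are just illustrative choices:

    #include <k4a/k4a.h>

    /* Configure and start the camera streams on an already-opened device:
       30 fps, 720p BGRA color, narrow field-of-view unbinned depth. */
    static int start_streams(k4a_device_t device)
    {
        k4a_device_configuration_t config = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
        config.camera_fps       = K4A_FRAMES_PER_SECOND_30;
        config.color_format     = K4A_IMAGE_FORMAT_COLOR_BGRA32;
        config.color_resolution = K4A_COLOR_RESOLUTION_720P;
        config.depth_mode       = K4A_DEPTH_MODE_NFOV_UNBINNED;

        /* Open the connection and start streaming with this configuration */
        if (k4a_device_start_cameras(device, &config) != K4A_RESULT_SUCCEEDED)
            return -1;

        /* ...read captures here... */

        /* Stop streaming and close the device when you're done */
        k4a_device_stop_cameras(device);
        k4a_device_close(device);
        return 0;
    }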

Each frame taken from the device’s stream is delivered as a capture object that holds that frame’s depth image, IR image, and color image. Once you have a capture, you can extract the individual images ready for use in your application. Image objects can be delivered to the Azure machine vision APIs, ready for object recognition or anomaly detection. One example Microsoft has used in its demonstrations is an application that uses captured video to detect when a worker on a factory floor gets too close to operating machinery; another detects someone smoking near a gas pump.
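
Pulling a single capture and unpacking its images might look like this sketch, again assuming the cameras have been started with the configuration above:

    #include <stdio.h>
    #include <k4a/k4a.h>

    /* Pull one capture from the stream and unpack its images. */
    static void read_one_capture(k4a_device_t device)
    {
        k4a_capture_t capture = NULL;
        if (k4a_device_get_capture(device, &capture, 1000 /* ms timeout */) != K4A_WAIT_RESULT_SUCCEEDED)
            return;

        /* Each capture bundles the correlated depth, IR, and color images */
        k4a_image_t depth = k4a_capture_get_depth_image(capture);
        k4a_image_t ir    = k4a_capture_get_ir_image(capture);
        k4a_image_t color = k4a_capture_get_color_image(capture);

        if (depth)
            printf("Depth image: %d x %d\n",
                   k4a_image_get_width_pixels(depth),
                   k4a_image_get_height_pixels(depth));

        /* Release image and capture handles once you've finished with them */
        if (depth) k4a_image_release(depth);
        if (ir)    k4a_image_release(ir);
        if (color) k4a_image_release(color);
        k4a_capture_release(capture);
    }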

[Image credit: IDG]

Images are captured from the device in a correlated way. Each capture can hold a depth image, an IR image, a color image, or a combination of the three.

A similar process gives you data from the position and motion sensors. As motion data is captured at a higher rate than image data, you must implement some form of synchronization in your code to avoid losing any data. Audio data is captured using standard Windows APIs, including those used by Azure’s speech services.
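
A sketch of reading the IMU through the same C API, assuming the opened device handle from earlier; real code would drain the sample queue in its own loop or thread to keep up with the higher data rate:

    #include <stdio.h>
    #include <k4a/k4a.h>

    /* Read one accelerometer and gyroscope sample from the IMU stream. */
    static void read_imu(k4a_device_t device)
    {
        if (k4a_device_start_imu(device) != K4A_RESULT_SUCCEEDED)
            return;

        k4a_imu_sample_t sample;
        if (k4a_device_get_imu_sample(device, &sample, 1000 /* ms timeout */) == K4A_WAIT_RESULT_SUCCEEDED)
        {
            printf("Accel (m/s^2): %.3f %.3f %.3f\n",
                   sample.acc_sample.xyz.x, sample.acc_sample.xyz.y, sample.acc_sample.xyz.z);
            printf("Gyro (rad/s):  %.3f %.3f %.3f\n",
                   sample.gyro_sample.xyz.x, sample.gyro_sample.xyz.y, sample.gyro_sample.xyz.z);
        }

        k4a_device_stop_imu(device);
    }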

Although the Azure Kinect hardware captures a lot of data, the SDK functions help transform it into a usable form; for example, adding depth data to an RGB image to produce RGB-D images that are transformed to the viewpoint of the RGB camera (and vice versa). As the two sensors are offset, this requires warping an image mesh to merge the two cameras’ viewpoints, using your PC’s GPU. Another transform generates a point cloud, allowing you to get depth data for each pixel in your capture. One useful option in the SDK is the ability to capture video and data streams in a Matroska-format file. That approach allows bandwidth-limited devices to batch data and deliver it to, say, Azure Stack Edge devices running Cognitive Services containers for batch processing.
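
As a sketch, aligning a depth image to the color camera’s viewpoint looks something like this; the depth mode and color resolution passed to the calibration call must match the configuration the cameras were started with:

    #include <stdint.h>
    #include <k4a/k4a.h>

    /* Warp a depth image into the color camera's viewpoint to make an RGB-D pair. */
    static void align_depth_to_color(k4a_device_t device, k4a_image_t depth, k4a_image_t color)
    {
        k4a_calibration_t calibration;
        if (k4a_device_get_calibration(device, K4A_DEPTH_MODE_NFOV_UNBINNED,
                                       K4A_COLOR_RESOLUTION_720P, &calibration) != K4A_RESULT_SUCCEEDED)
            return;

        k4a_transformation_t transformation = k4a_transformation_create(&calibration);

        /* The output depth image must match the color image's dimensions */
        k4a_image_t transformed_depth = NULL;
        k4a_image_create(K4A_IMAGE_FORMAT_DEPTH16,
                         k4a_image_get_width_pixels(color),
                         k4a_image_get_height_pixels(color),
                         k4a_image_get_width_pixels(color) * (int)sizeof(uint16_t),
                         &transformed_depth);

        /* Reproject the depth pixels into the color camera's geometry */
        k4a_transformation_depth_image_to_color_camera(transformation, depth, transformed_depth);

        /* ...use the aligned RGB-D data, then release the handles */
        k4a_image_release(transformed_depth);
        k4a_transformation_destroy(transformation);
    }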

Body tracking a digital skeleton

The original Kinect hardware introduced body tracking, with a skeletal model that could be used to quickly evaluate posture and gestures. That same approach continues in the Azure Kinect Body Tracking SDK, which uses Nvidia’s CUDA GPU parallel processing technology to work with 3-D image data from your device’s depth sensor. A bundled sample app shows some of the features of the SDK, including the ability to track more than one person at a time. 

[Image credit: IDG]

The Azure Kinect Body Tracking Viewer shows a 3-D point cloud and tracked bodies.

The Body Tracking SDK builds on the Azure Kinect SDK, using it to configure and connect to a device. Captured image data is processed by the tracker, which stores its results in a body frame data structure. This contains a collection of skeletal structures for identified bodies, a 2-D index map to help visualize your data, and the underlying 2-D and 3-D images that were used to construct the tracking data. Each frame can be used to construct animations or to feed information to machine learning tools that can help process tracked positions in relation to a room map or to ideal positions.
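
As a sketch of that flow, the C body-tracking loop enqueues a sensor capture and pops a body frame; this assumes a tracker has already been created from the device calibration with k4abt_tracker_create:

    #include <stdio.h>
    #include <k4a/k4a.h>
    #include <k4abt.h>

    /* Feed a capture to the body tracker and read back the skeletons it finds. */
    static void track_bodies(k4abt_tracker_t tracker, k4a_capture_t capture)
    {
        /* Queue the capture for processing... */
        if (k4abt_tracker_enqueue_capture(tracker, capture, K4A_WAIT_INFINITE) != K4A_WAIT_RESULT_SUCCEEDED)
            return;

        /* ...then pop the resulting body frame */
        k4abt_frame_t body_frame = NULL;
        if (k4abt_tracker_pop_result(tracker, &body_frame, K4A_WAIT_INFINITE) != K4A_WAIT_RESULT_SUCCEEDED)
            return;

        uint32_t num_bodies = k4abt_frame_get_num_bodies(body_frame);
        for (uint32_t i = 0; i < num_bodies; i++)
        {
            k4abt_skeleton_t skeleton;
            k4abt_frame_get_body_skeleton(body_frame, i, &skeleton);

            /* Each skeleton holds a fixed set of joints with 3-D positions */
            k4a_float3_t head = skeleton.joints[K4ABT_JOINT_HEAD].position;
            printf("Body %u head at (%.1f, %.1f, %.1f) mm\n",
                   i, head.xyz.x, head.xyz.y, head.xyz.z);
        }

        k4abt_frame_release(body_frame);
    }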

Azure’s Cognitive Services are a powerful tool for processing data, and the addition of Azure Kinect makes it possible to use them in a wide range of industrial and enterprise scenarios. With a focus on workplace 3-D image recognition, Microsoft is attempting to show how image recognition can be used to reduce risk and improve safety. There’s even the option of using an array of devices as a quick volumetric capture system, which can help build mixed-reality environments and provide source data for CAD and other design tools. The result is a flexible device that, with a little code, becomes a very powerful sensing tool.


Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of "career" as a verb rather than a noun, having worked in academic and telecoms research, as well as having been the CTO of a startup, running the technical side of UK Online (the first national ISP with content as well as connections), before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next-generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web and writes about everything from enterprise architecture down to gadgets.
