ISIA RGB-D Video Database

Xinhang Song     Shuqiang Jiang

Institute of Computing Technology, CAS

The ISIA RGB-D video database contains indoor videos captured in three different cities in China (up to 1000 km apart), ensuring diversity in locations and scenes. The database covers 58 indoor scene categories and contains 278 videos, totaling more than five hours of footage. The duration of the footage per category is shown in Fig. 1. Video duration varies with the complexity and extent of the scene itself (a 'classroom' or 'furniture store' requires more footage than an 'office' or 'bedroom') and with how common and accessible certain categories are (e.g., 'office' and 'classroom' have more videos than 'auditorium' or 'bowling alley'). Videos were captured with a Microsoft Kinect version 2 sensor at a frame rate of 15 frames/s, yielding more than 275,000 frames.
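The frame count follows directly from the frame rate: five hours at 15 frames/s is 5 × 3600 × 15 = 270,000 frames, so the reported total of more than 275,000 frames corresponds to roughly 5.1 hours of footage. The following is a minimal Python sketch of this conversion. It assumes a constant 15 frames/s with no dropped frames, and the helper names are hypothetical (they are not part of any released ISIA RGB-D toolkit).

```python
# Frame rate stated for the ISIA RGB-D recordings (Kinect v2, 15 frames/s).
# Assumption: the rate is constant and no frames are dropped.
FPS = 15


def frames_for_duration(seconds: float, fps: int = FPS) -> int:
    """Hypothetical helper: whole frames captured in `seconds` of footage."""
    return int(seconds * fps)


def duration_for_frames(frames: int, fps: int = FPS) -> float:
    """Hypothetical helper: footage length in seconds for `frames` frames."""
    return frames / fps


def frame_timestamp(index: int, fps: int = FPS) -> float:
    """Hypothetical helper: time (s) from video start of 0-based frame `index`."""
    return index / fps


if __name__ == "__main__":
    five_hours = 5 * 60 * 60
    print(frames_for_duration(five_hours))                 # 270000
    print(round(duration_for_frames(275_000) / 3600, 2))   # 5.09 (hours)
```

The same relation can be used in the other direction, e.g. to estimate the length of a single video from its number of frames.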

Fig. 1. Distribution of scene categories in ISIA RGB-D.

The database aims to address the narrow field of view of conventional RGB-D sensors and the limited range of their depth sensors by increasing coverage: it records videos instead of images. In particular, it targets wide scenes, which we capture by starting on one side of the scene and moving to the other while panning the camera to maximize coverage. Fig. 2 shows an example from the category 'classroom'. Note that regions like the podium, the whiteboard and the windows are missing from the initial depth image, but are captured in other parts of the video sequence.

Fig. 2. Capturing process of a classroom scene. Note that this wide and extensive scene requires more footage than other cases.


The raw videos of ISIA RGB-D can be downloaded here. The file size is 17.6 GB.


If you have any questions, corrections or other issues, please contact Shuqiang Jiang (sqjiang[at]