Scene Space Inference Based on Stereo Vision
           

- Advisor: Dr. Han-Pang Huang (黃漢邦)

- Student: Kuen-Han Lin (林昆翰)

Lab. of Robotics, Department of Mechanical Engineering, National Taiwan University, Taiwan

Abstract:

The sense of space that vision provides gives humans a basic ability to interpret the world. For machines, however, modeling space with computer vision is a challenging task. Unlike scene reconstruction, which attempts to recover the correct 3D position of every point in an image, the sense of space aims to identify the region of the scene that was originally empty. With this idea, we can also distinguish objects inside the space from objects outside it.


This thesis provides an intuitive way to infer the space of a scene using stereo cameras. We first segment the ground out of the image by adaptively learning a ground model in the image. We then use the convex hull of the ground to approximate the scene space. Objects within the scene can also be detected with the stereo cameras. Finally, we organize the scene space and the objects within the scene into a graphical model, and use particle filters to approximate the solution.
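
The convex-hull step can be illustrated with a minimal sketch. It assumes the segmented ground has already been projected to 2D ground-plane coordinates; the function names (`convex_hull`, `inside_hull`), the monotone-chain hull algorithm, and the sample points are illustrative assumptions, not the thesis implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain because it repeats the other chain's start.
    return lower[:-1] + upper[:-1]

def inside_hull(hull: List[Point], q: Point) -> bool:
    """True if q lies inside or on the boundary of a counter-clockwise convex hull."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

# Hypothetical ground points (metres on the ground plane), for illustration only.
ground_points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
scene_space = convex_hull(ground_points)

print(inside_hull(scene_space, (1.0, 1.0)))  # object position inside the scene space
print(inside_hull(scene_space, (5.0, 1.0)))  # object position outside the scene space
```

In this sketch, a detected object whose ground-plane position falls inside the hull is treated as being within the scene space; how the thesis combines this test with the graphical model and particle filters is not shown here.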


Experiments were conducted to test the accuracy of the ground segmentation and the precision and recall of object detection within the scene. The results showed promising ground segmentation accuracy in an indoor environment, and the segmentation result was visualized in an occupancy grid map. The precision and recall of object detection were each about 50 percent in our system. With additional tracking of the objects, recall improved by approximately 5 percent. Finally, we showed that the system can improve human detection results: many wrong detections can be filtered out by our system.
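
For reference, precision and recall as used above can be computed from detection counts as in the following sketch. The helper name `precision_recall` and the counts in the usage line are hypothetical and chosen only to illustrate the definitions; they are not the experimental data of this thesis.

```python
from typing import Tuple

def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive and false-negative counts.

    precision = tp / (tp + fp): fraction of reported detections that are correct.
    recall    = tp / (tp + fn): fraction of actual objects that were detected.
    A ratio with an empty denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Hypothetical counts, for illustration only.
print(precision_recall(tp=5, fp=5, fn=5))  # (0.5, 0.5)
```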


We show a novel way to interpret the space of a scene using simple convex contours, and the possibility of detecting objects within the scene without a classifier. The result can serve as prior knowledge for further vision tasks, e.g. obstacle avoidance or object recognition.





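Chinese Abstract (translated):

Perceiving the extent of space through vision is a basic ability that people use to interpret the world, but for machines, inferring the extent of space with computer vision is a challenging task. Unlike general scene reconstruction, which requires every pixel of an image to be restored to its correct spatial position, space inference aims to define the region of the environment that should originally be empty, so that we can tell objects inside the space from objects outside it.


This thesis provides an intuitive way to infer the extent of an environment using stereo cameras. We first separate the ground from the image by adaptively learning a ground model. We then approximate the extent of the environment by extracting the convex polygon of the ground, and the stereo cameras can also detect objects within this extent. Finally, we describe the environment extent and the object detections with a graphical model, and use particle filters to obtain an approximate solution.


We conducted two sets of experiments, one on the accuracy of ground segmentation and one on the precision and recall of object detection within the environment extent. The results show good ground segmentation accuracy in indoor environments, and we also provide a visualization of the segmentation result using an occupancy grid map. The system achieves precision and recall of about 50 percent for object detection. With additional object tracking, recall improves by about 5 percent. Finally, we used a pedestrian detection experiment to verify whether the system can improve the accuracy of object detection. The results show that many wrong detections can be filtered out by our system, which in turn improves object recognition.


We propose a new way to infer the extent of an environment using a simple convex outline, and to detect objects without an object-specific classifier. The system's results can serve as prior knowledge for further image processing, such as obstacle avoidance or object recognition.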