We take frames from Kinect as input and employ simple geometry based algorithm to generate layered representation of a scene by building "properties" (intra-object) and "relationships" (inter-object) for subsequent commonsense reasoning and decision making.