Abstract— Violence detection has been investigated extensively in the literature. Recently, IoT-based violence video surveillance has become an intelligent component integrated into the security systems of smart buildings. A violence video detector is a specific kind of detection model that must be highly sensitive while keeping the false alarm rate low. This paper proposes a novel ConvLSTM architecture that can run on a low-cost Internet of Things (IoT) device such as a Raspberry Pi board. Convolutional neural networks (CNNs) learn spatial features from the video frames, and these features are fed to a Long Short-Term Memory (LSTM) network that classifies each video as violent or non-violent. A complex dataset combining two public datasets, RWF-2000 and RLVS-2000, was used for model training and evaluation. The video content is challenging: it includes crowds and chaos, small objects at a far distance, low resolution, and transient actions. Additionally, the videos were captured in various environments such as streets, prisons, and schools, and show several human activities such as playing football, basketball, and tennis, swimming, and eating. Averaged over the experiments, the proposed violence detection model achieves an accuracy of 73.35 %, recall of 76.90 %, precision of 72.53 %, F1 score of 74.01 %, false negative rate of 23.10 %, false positive rate of 30.20 %, and AUC of 82.0 %.
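For readers who want to check how the reported metrics relate to one another, the following is a minimal Python sketch of the standard binary-classification definitions, with "violence" as the positive class. The function name `binary_metrics` and the confusion counts are hypothetical: the counts assume 1000 violent and 1000 non-violent test clips and are chosen only so that recall and false positive rate match the averages quoted above. They are not the authors' confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion counts."""
    recall = tp / (tp + fn)        # sensitivity, true positive rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "fnr": fn / (fn + tp),     # false negative rate = 1 - recall
        "fpr": fp / (fp + tn),     # false alarm rate
    }


# Hypothetical counts (assumption): 1000 clips per class, not taken from the paper.
metrics = binary_metrics(tp=769, fp=302, tn=698, fn=231)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f} %")
```

Two relations hold by definition: recall and false negative rate sum to 100 %, and with equally sized classes the accuracy is the mean of recall and (100 % minus the false positive rate). Under the balanced-class assumption, the second relation gives (76.90 % + 69.80 %) / 2 = 73.35 %. Precision and F1 computed from a single confusion matrix need not equal figures that were averaged over several runs.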