Given a soccer match, the model should produce an arousal function that describes the most exciting moments of the match. This function, a combination of bell curves with different kurtosis and skewness, represents the importance of each video shot within the match. Finally, to obtain the summary video, the function is filtered with suitable thresholds, as shown in the following figure.
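To make the idea concrete, here is a minimal Python sketch of an arousal function built from bell curves and then thresholded into summary segments. It is an illustration under stated assumptions, not the model's actual implementation: the names `bell`, `arousal` and `summary_segments`, the asymmetric generalized-Gaussian shape, the use of a plain sum to combine bells, and all numeric values are hypothetical. Skewness is approximated with different left and right widths, and kurtosis with the `shape` exponent.

```python
import math


def bell(t, center, left_width, right_width, shape=2.0):
    """Asymmetric bell with peak value 1 at t == center (assumed form).

    Different left/right widths skew the bell; a larger `shape`
    flattens the top and sharpens the edges (lower kurtosis).
    """
    width = left_width if t < center else right_width
    return math.exp(-((abs(t - center) / width) ** shape))


def arousal(t, events):
    """Arousal at time t, assumed here to be the weighted sum of all bells."""
    return sum(
        e["weight"] * bell(t, e["center"], e["left"], e["right"], e["shape"])
        for e in events
    )


def summary_segments(events, duration, threshold, step=1.0):
    """Return (start, end) intervals in seconds where arousal >= threshold."""
    segments = []
    start = None
    n_samples = int(duration / step) + 1
    for i in range(n_samples):
        t = i * step
        above = arousal(t, events) >= threshold
        if above and start is None:
            start = t  # a segment opens
        elif not above and start is not None:
            segments.append((start, t - step))  # last sample above threshold
            start = None
    if start is not None:  # segment still open at the end of the match
        segments.append((start, (n_samples - 1) * step))
    return segments


# Hypothetical events: times are seconds from kick-off, values are made up.
events = [
    # Long build-up, short aftermath: the bell is skewed to the left.
    {"center": 600.0, "left": 20.0, "right": 8.0, "shape": 2.0, "weight": 1.0},
    # Symmetric, flat-topped bell with a lower weight.
    {"center": 3000.0, "left": 10.0, "right": 10.0, "shape": 4.0, "weight": 0.8},
]

segments = summary_segments(events, duration=5400.0, threshold=0.5)
for start, end in segments:
    print(f"keep {start:.0f}s to {end:.0f}s")
```

Raising the threshold shortens every kept segment and eventually drops low-weight events entirely, which is how the length of the summary can be controlled.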
The dataset consists of 273 goals extracted from complete matches
of the 'La Liga' competition. Gathering this data requires a video labeler
to label and cut the relevant highlights by hand, so that they can be fed to
a neural network that learns their relevant features.
The video labeler is available here (https://github.com/gioele8/video-labeler); you can use it to create your own dataset.
The model that gave me the best results is the following: