The dataset contains academic papers from five domains, collected from the Web of Science: business, artificial intelligence, sociology, transport, and law. Each line is one document containing the title and abstract of a single paper.
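Because each line holds exactly one document, the corpus can be read with a plain line-by-line loop. The sketch below rests on several assumptions. The file path is a hypothetical placeholder, the file is assumed to be UTF-8 encoded, and blank lines are assumed to be padding rather than documents. Lines are returned whole because this description does not say how the title and abstract are separated within a line.

```python
def load_documents(path):
    """Return one string per non-empty line (one paper: title + abstract).

    Assumes UTF-8 encoding; skips blank lines. The title/abstract split
    within a line is not specified, so each line is returned whole.
    """
    with open(path, encoding="utf-8") as f:
        # strip() drops the trailing newline and surrounding whitespace
        return [line.strip() for line in f if line.strip()]
```

For example, `docs = load_documents("papers.txt")` (hypothetical filename) yields a list with one entry per paper. That list can then be passed to a tokenizer or vectorizer.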