This code performs data analysis and classification tasks on a dataset of books. It includes the following main steps:
- Reads data from a CSV file (
books_1.Best_Books_Ever.csv
). - Extracts and converts the
bookId
column to integers. - Drops rows with missing values in specific columns.
- Extracts and preprocesses data for analysis, including handling genres and authors.
- Generates various insights and statistics about the dataset.
- Creates histograms and displays information about book ratings and page counts.
- Utilizes the TF-IDF vectorization technique for bigrams and unigrams.
- Calculates cosine similarities between book descriptions.
- Provides book recommendations based on cosine similarity.
- Preprocesses textual data using Word2Vec.
- Performs classification using Random Forest, Naive Bayes, and Support Vector Classifier.
- Evaluates model performance with precision, recall, F1-score, and accuracy metrics.