This project demonstrates how to use Power Query Editor in Excel to clean and standardize an Audible dataset. The goal is to prepare the dataset for further analysis by ensuring data consistency and formatting. The following tasks are accomplished:
- Convert names in the "name" column to title case to ensure consistent capitalization.
- Split the full names in the "author" column into separate first-name and last-name columns.
- Ensure all entries in the "releasedate" column follow a consistent date format (DD-MM-YYYY).
- Change the "time" column from text format to a duration format that Excel recognizes.
- Ensure the "price" column is in a numeric format. Identify and address any non-numeric values.
- Convert text ratings in the "stars" column to numeric values for easier analysis.
- Separate the "narratedby" column into multiple columns if it contains multiple narrators.
- Combine the "releasedate" and "language" columns into a new column named "releaseinfo" with the format "DD-MM-YYYY, Language".
- Ensure all currency values in the "price" column are formatted consistently with two decimal places.
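
To make the target formats in the list above concrete, here is a minimal Python sketch of what a cleaned "releaseinfo" value and a two-decimal price look like. It is not part of the Power Query workflow itself; the function names are hypothetical and the inputs in the usage note are made-up sample values.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def format_release_date(d: date) -> str:
    """Render a date as DD-MM-YYYY."""
    return d.strftime("%d-%m-%Y")


def format_release_info(d: date, language: str) -> str:
    """Build the 'DD-MM-YYYY, Language' string described for "releaseinfo"."""
    return f"{format_release_date(d)}, {language}"


def format_price(value: str) -> str:
    """Render a numeric string with exactly two decimal places."""
    return str(Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```

For instance, `format_release_info(date(2008, 8, 4), "English")` gives `04-08-2008, English`, and `format_price("468")` gives `468.00`.
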
The steps to reproduce the cleaning are:
- Load your dataset into Power Query Editor in Excel.
- Select the "name" column and apply "Transform" > "Format" > "Capitalize Each Word".
- Select the "author" column, use the "Split Column" feature to split by delimiter (space or comma, depending on how the names are separated), and rename the resulting columns "First Name" and "Last Name".
- Select the "releasedate" column and use "Change Type" to set the data type to "Date". If day-first values are misread, use "Change Type" > "Using Locale" with a locale that writes dates day-first. The Date type itself stores no display format, so apply the DD-MM-YYYY format to the loaded column in Excel.
- Select the "time" column and change its type to "Duration". This works directly only when the text is already in a form Power Query can parse (such as hh:mm:ss); free-text values need to be parsed into hours and minutes first.
- Select the "price" column and change its type to a numeric type such as "Decimal Number" ("Detect Data Type" can suggest one). Values that cannot be converted show up as errors; fix them with "Replace Values" before the conversion or "Replace Errors" after it.
- For the "stars" column, use "Add Column" > "Custom Column" with a formula that maps each text rating to a number.
- Select the "narratedby" column, and use the "Split Column" feature based on delimiter (e.g., comma) to create multiple columns.
- Use the "Add Column" feature to create a new column "releaseinfo" by combining the "releasedate" and "language" columns with a custom formula.
- Select the "price" column and use "Transform" > "Rounding" > "Round..." with 2 decimal places. Whether trailing zeros are displayed is controlled by the number format of the loaded column in Excel.
- Click "Close & Load" to apply changes and load the cleaned dataset back into Excel.
- Review the dataset to ensure all transformations have been applied correctly and the data is ready for analysis.
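
The custom-column and type-conversion steps above depend on how the raw text is shaped, which this README does not spell out. The Python sketch below shows the parsing logic under assumed shapes (durations like "2 hrs and 20 mins", ratings like "4.5 out of 5 stars", and non-numeric prices like "Free"). Treat it as a way to reason about or cross-check the Power Query output, not as the Power Query formulas themselves; all function names are hypothetical.

```python
import re
from datetime import timedelta
from typing import Optional


def parse_duration(text: str) -> Optional[timedelta]:
    """Parse text such as '2 hrs and 20 mins' (assumed shape) into a duration."""
    hours = re.search(r"(\d+)\s*hrs?", text)
    minutes = re.search(r"(\d+)\s*mins?", text)
    if not hours and not minutes:
        return None  # nothing recognizable; flag the row for manual review
    return timedelta(
        hours=int(hours.group(1)) if hours else 0,
        minutes=int(minutes.group(1)) if minutes else 0,
    )


def parse_stars(text: str) -> Optional[float]:
    """Extract the leading number from text such as '4.5 out of 5 stars' (assumed shape)."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None


def parse_price(text: str) -> Optional[float]:
    """Convert a price string to a number rounded to two decimals, or None if non-numeric."""
    cleaned = text.replace(",", "").strip()  # drop thousands separators
    try:
        return round(float(cleaned), 2)
    except ValueError:
        return None  # e.g. 'Free'; decide separately how to treat such rows
```

Rows for which any of these helpers returns `None` correspond to the values that would surface as errors or need a replacement rule in Power Query.
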
Feel free to fork this repository and make improvements. If you encounter any issues or have suggestions, please open an issue or pull request.