This is a Python project that scrapes and downloads episodes from the Darknet Diaries podcast. It includes options to download all episodes, the latest missing episodes, a range of episodes, or only the missing episodes.
The project requires:
- Python 3.x
- Selenium
- Beautiful Soup 4
- ChromeDriver
To install:
- Clone this repository to your local machine: 'git clone https://github.com/sdbumann/DarknetDiariesPodcastScraper'.
- Navigate into the directory: 'cd DarknetDiariesPodcastScraper'.
- Install the required packages: 'pip install -r requirements.txt'.
To run the script:
- In your terminal, navigate to the directory where you cloned this repository.
- Run the script: 'python scraper.py [ARGUMENT]'.
- Listen to and enjoy the podcast.
The script takes an optional argument:
- 'latest': Downloads only the latest missing episodes.
- 'all': Downloads all episodes.
- 'range min_episode max_episode': Downloads episodes within the specified range.
- 'missing': Downloads only the missing episodes.
- 'help': Displays usage instructions.
If no argument is specified, the script displays the usage instructions.
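As an illustration of the argument handling described above, here is a minimal sketch of how the command-line arguments could be mapped to a mode. The name 'parse_mode' and the usage text are hypothetical and not taken from 'scraper.py'; the actual script may parse its arguments differently.

```python
# Hypothetical sketch; not the actual code in scraper.py.
USAGE = "usage: python scraper.py [latest | all | range MIN MAX | missing | help]"


def parse_mode(argv):
    """Map the arguments after the script name to (mode, extra).

    Falls back to ("help", None) when the argument is missing or invalid,
    mirroring the documented behaviour of showing the usage instructions.
    """
    if not argv:
        return ("help", None)
    mode = argv[0]
    if mode in ("latest", "all", "missing", "help") and len(argv) == 1:
        return (mode, None)
    if mode == "range" and len(argv) == 3:
        try:
            low, high = int(argv[1]), int(argv[2])
        except ValueError:
            return ("help", None)
        if low <= high:
            return ("range", (low, high))
    return ("help", None)
```

A caller would pass 'sys.argv[1:]' to 'parse_mode' and print 'USAGE' whenever the returned mode is 'help'.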
The script downloads the MP3 files to a folder called 'downloads' in the current working directory. If the folder does not exist, the script creates it. The folder name and path can be changed.
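The folder handling can be sketched as follows. The helper name 'ensure_downloads_folder' is an assumption made for illustration; the script itself may create the folder in another way.

```python
import os


def ensure_downloads_folder(path=None):
    """Create the downloads folder if it is missing and return its path.

    With no argument, defaults to 'downloads' in the current working
    directory, as described above. (Hypothetical helper, not from scraper.py.)
    """
    if path is None:
        path = os.path.join(os.getcwd(), "downloads")
    # exist_ok=True makes the call safe when the folder already exists.
    os.makedirs(path, exist_ok=True)
    return path
```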
This script uses the Selenium and Beautiful Soup packages to scrape the Darknet Diaries website. These packages will be installed automatically when you run 'pip install -r requirements.txt'.
The following functions are defined in the script:
'scraper(base_url, downloads_folder, episode_numbers)': Scrapes the episodes with the given episode numbers from the base URL and downloads them to the downloads folder.
- 'base_url (str)': The base URL of the Darknet Diaries podcast.
- 'downloads_folder (str)': The path of the downloads folder.
- 'episode_numbers (list)': A list of episode numbers to scrape.
'get_latest_episode(base_url)': Gets the latest episode number from the specified base URL.
- 'base_url (str)': The base URL of the Darknet Diaries podcast.
'get_episode_numbers(downloads_folder)': Gets a list of the episode numbers that have already been downloaded to the specified downloads folder.
- 'downloads_folder (str)': The path of the downloads folder.
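One way such a lookup could work is to read the episode number from each file name. The sketch below assumes that every downloaded file is an MP3 whose name starts with the episode number (for example '12 - Title.mp3'); that naming scheme and the function name 'downloaded_episode_numbers' are assumptions, not details confirmed by the script.

```python
import os
import re


def downloaded_episode_numbers(downloads_folder):
    """Return the sorted episode numbers found in the folder.

    Assumes file names such as '12 - Title.mp3' (number first); files that
    do not match this hypothetical pattern are ignored.
    """
    numbers = set()
    for name in os.listdir(downloads_folder):
        match = re.match(r"(\d+).*\.mp3$", name, re.IGNORECASE)
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)
```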
'get_next_episode_number(downloads_folder)': Gets the next episode number to download, based on the contents of the specified downloads folder.
- 'downloads_folder (str)': The path of the downloads folder.
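A plausible reading of "next episode" is the number after the highest one already downloaded. The sketch below makes that assumption and, to stay self-contained, takes a list of downloaded numbers rather than a folder path; 'next_episode_number' is a hypothetical name.

```python
def next_episode_number(downloaded):
    """Return the number after the highest downloaded episode.

    Assumes episodes are numbered from 1, so an empty list yields 1.
    (Illustrative only; the script's own rule may differ.)
    """
    return max(downloaded, default=0) + 1
```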
'get_list_of_all_latest_missing_episodes(downloads_folder, max_eps_num)': Gets a list of the latest missing episodes, given the specified downloads folder and the maximum episode number.
- 'downloads_folder (str)': The path of the downloads folder.
- 'max_eps_num (int)': The maximum episode number.
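To make the idea of "latest missing" concrete, the sketch below assumes it means every episode newer than the highest one already downloaded, up to and including the maximum episode number. Both this interpretation and the name 'latest_missing_episodes' are assumptions; the function takes a list of downloaded numbers instead of a folder path so that it runs on its own.

```python
def latest_missing_episodes(downloaded, max_eps_num):
    """Return episode numbers after the highest downloaded one, up to max_eps_num.

    Gaps below the highest downloaded episode are deliberately not reported
    under this (assumed) interpretation.
    """
    start = max(downloaded, default=0) + 1
    return list(range(start, max_eps_num + 1))
```

For example, with episodes 1, 2, and 5 downloaded and 8 as the maximum, this returns 6, 7, and 8; episodes 3 and 4 are not included.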
'get_list_of_all_missing_episodes(downloads_folder, all_episodes)': Gets a list of all missing episodes, given the specified downloads folder and a list of all available episodes.
- 'downloads_folder (str)': The path of the downloads folder.
- 'all_episodes (list)': A list of all available episodes.
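Finding all missing episodes amounts to a set difference between the available and the downloaded episode numbers. The sketch below is an illustration under that assumption; 'all_missing_episodes' is a hypothetical name, and it takes a list of downloaded numbers rather than a folder path so that it is self-contained.

```python
def all_missing_episodes(downloaded, all_episodes):
    """Return, in ascending order, every available episode not yet downloaded."""
    return sorted(set(all_episodes) - set(downloaded))
```

Unlike a "latest missing" lookup, this also reports gaps between episodes that are already downloaded.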
If you find any bugs or issues with this script, please feel free to open an issue or submit a pull request.
Software licensed under the GNU GPLv3.
Project Link: https://github.com/sdbumann/DarknetDiariesPodcastScraper