Web scraper: You can use Python and Beautiful Soup to scrape data from websites that publish the information you need. You can also use Celery to schedule periodic scraping tasks.
Database: You can store the scraped data in a PostgreSQL database.
Data cleaning and adjustment: You can use Python and pandas to clean and adjust the scraped data.
Map integration: You can integrate an open-source map such as Leaflet into your web application.
Dashboard: You can use Plotly to create interactive visualizations of the scraped data.
Deployment: You can deploy the Flask application on AWS, with Redis serving as the message broker for the Celery tasks.
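The scrape-and-extract step above can be sketched in a few lines. This version uses only the standard library's html.parser so it runs anywhere; in the real project Beautiful Soup would do this parsing with a friendlier API, and the page snippet (a span with class "title" per item) is an invented example:

```python
from html.parser import HTMLParser

# Minimal stand-in for the Beautiful Soup step: collect the text of
# every <span class="title"> element. The markup below is made up.
class TitleScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())
            self._in_title = False

page = '<div><span class="title">Job A</span><span class="title">Job B</span></div>'
scraper = TitleScraper()
scraper.feed(page)
print(scraper.titles)  # ['Job A', 'Job B']
```

The extracted list is what would then be cleaned with pandas and written to PostgreSQL.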
Topics within the "web scraping" epic
static vs dynamic sites
changing page structure
authentication, hidden sites/pages
urllib.request
regex
re.findall()
re.search()
re.sub()
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
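The urllib.request and regex topics above can be tried together offline: a data: URL (which urllib understands) stands in for a real page so no network is needed, and the "page" content is invented for illustration:

```python
import re
import urllib.request
from urllib.parse import quote

# A data: URL lets urlopen() return canned content without hitting the
# network; the HTML body here is a made-up example page.
body = "<li>Job: Engineer (Berlin)</li><li>Job: Analyst (Paris)</li>"
page_url = "data:text/html;charset=utf-8," + quote(body)
with urllib.request.urlopen(page_url) as resp:
    html = resp.read().decode("utf-8")

# re.findall: every non-overlapping match, as a list
print(re.findall(r"Job: (\w+)", html))   # ['Engineer', 'Analyst']

# re.search: the first match only (or None if nothing matches)
m = re.search(r"\((\w+)\)", html)
print(m.group(1))                        # Berlin

# re.sub: replace every match
print(re.sub(r"\(\w+\)", "(n/a)", html))

# . matches any character except line terminators, and * is greedy:
# .* stretches to the last ')' before giving back as needed
print(re.findall(r"\(.*\)", html))       # ['(Berlin)</li><li>Job: Analyst (Paris)']
```

The last line shows why greedy quantifiers surprise people on real pages: one .* can swallow several elements at once.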
time python fake_jobs_example.py
# real - the actual time elapsed during the execution of the script
# user - the amount of CPU time the process spent in user-mode code (outside the kernel)
# sys  - the amount of CPU time the process spent in kernel-mode code (inside the kernel)
real 0m8.756s
user 0m0.431s
sys 0m0.036s
python3 -m timeit '"-".join(str(n) for n in range(100))'
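The same measurement can be made from inside Python with the timeit module, which is handy when the result needs to be stored or compared programmatically; the loop count here is an arbitrary choice to keep the run quick:

```python
import timeit

# Same statement as the shell command above, timed from Python.
stmt = '"-".join(str(n) for n in range(100))'
seconds = timeit.timeit(stmt, number=10_000)

# Report the per-loop cost in microseconds, like timeit's CLI output.
per_loop_us = seconds / 10_000 * 1e6
print(f"{per_loop_us:.2f} usec per loop")
```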