veluga29 / sea_forecast_scheduler

Crawl beach forecast data in Korea and save the data in DB every hour

Sea Forecast Service - Scheduled Crawler

📄 Personal project / 📅 2021.10 - 2021.11

This is the Scheduled Crawler part of Sea Forecast Service, a service for looking up weather information for a beach of your choice.


🔖 Tech stack

  • Python 3.9
  • Scrapy 2.5
  • Celery 5.2
  • RabbitMQ 5.0
  • Heroku


🔖 Core features

📎 Scrapy

  • ForecastSpider 📌 View code

    • Implemented a spider class that uses Scrapy to crawl the Korea Meteorological Administration's weather information for surfing beaches.
    • Crawls https://marine.kma.go.kr/custom/leisure.pop?work=beach&id= while changing only the beach id.
      • Crawls beaches 1 through 26, excluding 24. (Page 24 was skipped because it contains empty data.)
      • Parses the key weather information for each beach and passes it to the pipeline.
  • PostgreSQLPipeline 📌 View code

    • Implemented a pipeline class that saves the crawled data to a PostgreSQL DB using the Psycopg2 library.

    • Wrote the queries so that the BeachForecastList table holds only the latest weather information.

    • Wrote the queries so that weather information keeps accumulating in the BeachForecastListHistory table.

      💡 Reference: Store Scrapy crawled data in PostgresSQL
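
As a rough sketch of the two ideas above (iterating over beach ids, and keeping a latest-only table next to a history table), the following runs with only the standard library. It is not the repository's code: `sqlite3` stands in for PostgreSQL and Psycopg2, the `beach_urls` and `store` helpers and the `beach_id`, `observed_at`, and `temperature` columns are hypothetical, the sample rows are made up, and the upsert is just one possible way to keep a single latest row per beach.

```python
import sqlite3

BASE_URL = "https://marine.kma.go.kr/custom/leisure.pop?work=beach&id="
SKIPPED_IDS = {24}  # page 24 is skipped because its data is empty


def beach_urls():
    """Build one URL per beach id from 1 to 26, leaving out the skipped ids."""
    return [f"{BASE_URL}{i}" for i in range(1, 27) if i not in SKIPPED_IDS]


def store(conn, beach_id, observed_at, temperature):
    """Overwrite the latest row for a beach and append the same row to history."""
    with conn:  # commit both statements together, or roll both back on error
        conn.execute(
            "INSERT INTO BeachForecastList (beach_id, observed_at, temperature) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (beach_id) DO UPDATE SET "
            "observed_at = excluded.observed_at, "
            "temperature = excluded.temperature",
            (beach_id, observed_at, temperature),
        )
        conn.execute(
            "INSERT INTO BeachForecastListHistory "
            "(beach_id, observed_at, temperature) VALUES (?, ?, ?)",
            (beach_id, observed_at, temperature),
        )


# In-memory stand-in database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BeachForecastList "
    "(beach_id INTEGER PRIMARY KEY, observed_at TEXT, temperature REAL)"
)
conn.execute(
    "CREATE TABLE BeachForecastListHistory "
    "(beach_id INTEGER, observed_at TEXT, temperature REAL)"
)

# Two made-up hourly readings for the same beach.
store(conn, 1, "2021-11-01 09:00", 14.2)
store(conn, 1, "2021-11-01 10:00", 15.0)
```

After the two calls, BeachForecastList holds one row for beach 1 (the later reading) while BeachForecastListHistory holds both. The same two statements should carry over to PostgreSQL with Psycopg2's `%s` placeholders in place of `?`.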


📎 Celery

  • tasks.py 📌 View code

    • Used RabbitMQ as the broker.

    • Used Scrapy's low-level Crawler API so that Scrapy can be run from a script.

      💡 Reference: Running Scrapy In Celery Tasks

    • Implemented so that, while the Celery server is running, Celery beat puts the run_scraper_task crawling task on the message queue once an hour for execution.
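
The hourly schedule described above could be expressed as a Celery beat schedule. The following is a minimal sketch, not the repository's actual configuration: the entry name `crawl-every-hour` is hypothetical, and the task path `tasks.run_scraper_task` is assumed from the `tasks` module and the task name mentioned above. It uses only the standard library so it runs without Celery installed.

```python
from datetime import timedelta

# Hypothetical entry name; the task path is an assumption based on the
# `tasks` module and the `run_scraper_task` name mentioned in this README.
BEAT_SCHEDULE = {
    "crawl-every-hour": {
        "task": "tasks.run_scraper_task",
        # Run once per hour; Celery beat also accepts seconds or crontab().
        "schedule": timedelta(hours=1),
    },
}

# In tasks.py this dict would be attached to the Celery app, for example:
# app.conf.beat_schedule = BEAT_SCHEDULE
```

With a schedule like this, a beat process (for example `celery -A tasks beat`) enqueues the task and a worker picks it up from the broker.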


🔖 Troubleshooting

  • Certain parts of the page were not crawled correctly 📌 View code
    • Wind direction is clearly a string on the page, but it was crawled as a number.
    • Temperature, wind speed, humidity, and sea-level pressure are decimal numbers, but they were crawled without the decimal point.
    • Suspecting some client-side processing, I checked the Network tab and found a JS file containing the relevant function.
    • Resolved by porting that JS file's algorithm to Python and using it as a util.
  • The Celery server did not work properly on Windows
    • When the server was started with celery -A tasks worker -l INFO, tasks put on the message queue never completed.
    • Celery has not officially supported Windows since 4.x, which appears to make errors frequent. (Celery docs)
    • When the server was started with celery -A tasks worker -l INFO -P threads, tasks put on the message queue completed as expected.
    • Resolved by adding the -P threads option in the local (Windows) environment and running without it on Heroku.
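
The per-environment choice above can be sketched as a small helper that builds the worker command. This is only an illustration: the `worker_command` function is hypothetical and not part of the repository, and it assumes the same `tasks` module name used in the commands above.

```python
import sys


def worker_command(platform: str = sys.platform) -> list[str]:
    """Return the Celery worker command, adding the threads pool on Windows."""
    cmd = ["celery", "-A", "tasks", "worker", "-l", "INFO"]
    if platform.startswith("win"):
        # The default pool did not complete tasks on Windows, so use threads.
        cmd += ["-P", "threads"]
    return cmd


if __name__ == "__main__":
    # Print the command for the current machine.
    print(" ".join(worker_command()))
```

On Heroku (Linux) the helper returns the plain command, so the same entry point could be used in both environments.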

