marcellobenigno / SICAR

This tool is designed for students, researchers, data scientists or anyone who would like to have access to SICAR files

Home Page:https://urbanogilson.github.io/posts/sicar/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SICAR

This tool is designed for students, researchers, data scientists, or anyone who would like to have access to SICAR files.

Badges

Open In Collab made-with-python Code style: black Docker Pulls Coverage Status interrogate

Features

  • Get cities-codes by state code
  • Download Shapefile or CSV
  • Download city by code
  • Download lists of cities by code
  • Download all cities in a state by code
  • Download the entire country
  • Tesseract, and PaddleOCR (Optional) drivers to automatically detect captcha

Installation

Install SICAR with pip

pip install git+https://github.com/urbanogilson/SICAR

Prerequisite:

Google Tesseract OCR (additional info on how to install the engine on Linux, Mac OSX, and Windows).

Optional: PaddleOCR (additional info on how to install the engine on Linux, Mac OSX, and Windows).

If you don't want to install dependencies on your computer or don't know how to install them, we strongly recommend Google Colab.

Documentation

Usage/Examples

from SICAR import Sicar
import pprint

# Create Sicar instance
car = Sicar(email = "name@domain.com")

# Get cities codes in Roraima state
cities_codes = car.get_cities_codes(state='RR')

pprint.pprint(cities_codes)
# {'Alto Alegre': '1400050',
#  'Amajari': '1400027',
#  'Boa Vista': '1400100',
#  'Bonfim': '1400159',
#  'Cantá': '1400175',
#  'Caracaraí': '1400209',
#  'Caroebe': '1400233',
#  'Iracema': '1400282',
#  'Mucajaí': '1400308',
#  'Normandia': '1400407',
#  'Pacaraima': '1400456',
#  'Rorainópolis': '1400472',
#  'São João da Baliza': '1400506',
#  'São Luiz': '1400605',
#  'Uiramutã': '1400704'}

# Download 'Alto Alegre': '1400050'
car.download_city_code('1400050', folder='Roraima')

# Download in CSV format
from SICAR import OutputFormat
car.download_city_code('1400050', output_format = OutputFormat.CSV, folder='Roraima')

# Download specific cities
cities_codes = {
    'São Gabriel da Cachoeira': '1303809',
    'São Paulo de Olivença': '1303908'
}

car.download_cities(cities_codes=cities_codes, folder='cities')

# Download all cities in Roraima state
car.download_state(state='RR', folder='RR')

OCR drivers

Optical character recognition (OCR) drivers are used to recognize characters in a captcha.

We currently have two options for automating the download process.

Tesseract OCR (Default)

from SICAR import Sicar
from SICAR.drivers import Tesseract

# Create Sicar instance using Tesseract OCR
car = Sicar(email="name@domain.com", driver=Tesseract)

# Download a city
car.download_cities(cities_codes={'Belo Horizonte': '3106200'}, folder='SICAR/cities')

PaddleOCR

Install SICAR with pip and include Paddle dependencies

pip install 'SICAR[paddle] @  git+https://github.com/urbanogilson/SICAR'
from SICAR import Sicar
from SICAR.drivers import Paddle

# Create Sicar instance using PaddleOCR
car = Sicar(email="name@domain.com", driver=Paddle)

# Download a city
car.download_cities(cities_codes={'Balneário Camboriú': '4202008'}, folder='SICAR/cities')

Run with Google Colab

Using Google Colab, you don't need to install the dependencies on your computer and you can save files directly to your Google Drive.

Open In Collab

Run with Docker

Pull Image from Docker Hub urbanogilson/sicar

docker pull urbanogilson/sicar:latest

Run the downloaded Docker Image using an entry point (file) from your machine (host)

docker run -i -v $(pwd):/sicar urbanogilson/sicar:latest -<./examples/docker.py

Note: Update the entry point file ./examples/docker.py or create a new one to download data based on your needs.

or pass a script through STDIN

docker run -i -v $(pwd):/sicar urbanogilson/sicar:latest -<<EOF
from SICAR import Sicar
from SICAR.drivers import Paddle

car = Sicar(email="name@domain.com", driver=Paddle)

car.download_state(state='MG', folder='MG')
EOF

Note: Using $(pwd) the container will save the download data into the current folder.

Optional: Make an external directory to store the downloaded data and use a volume parameter in the run command to point to it.

Acknowledgements

Roadmap

  • Download city by name
  • Make Paddle driver optional
  • Add support to download CSV files

Contributing

The development environment with all necessary packages is available using Visual Studio Code Dev Containers.

Open in Remote - Containers

Contributions are always welcome!

Feedback

If you have any feedback, please reach me at hello@gilsonurbano.com

License

MIT

About

This tool is designed for students, researchers, data scientists or anyone who would like to have access to SICAR files

https://urbanogilson.github.io/posts/sicar/

License:MIT License


Languages

Language:Python 99.1%Language:Dockerfile 0.9%