BorderlessData

BorderlessData's repositories

add_corporate_information_daily_of_china

Business registration records newly added over the past few days across 31 provinces in mainland China, plus some other company data: roughly 1,000,000 records in total, including company name, registered address, unified social credit code, province, city, registration date, business scope, responsible person, email address, registered capital, and company type.

Stargazers: 0 | Issues: 0

aistudio-doc2vec-for-investigative-journalism

How Quartz used AI to help reporters search the Mauritius Leaks

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0

aistudio-dochate-public

Learning text classification for journalists through DocHate tips

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

aistudio-searching-data-dumps-with-use

Searching large heterogeneous data dumps with Universal Sentence Encoder

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

aistudio-workshops

Workshops created by the Quartz AI Studio

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

awesome-iptv

A curated list of resources related to IPTV

Stargazers: 0 | Issues: 0

bad-data-guide

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

Stargazers: 0 | Issues: 0

Crawling-Infrastructure

Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.

Language: TypeScript | License: AGPL-3.0 | Stargazers: 0 | Issues: 0

datadonkey

DataDonkey handles XML, CSV and Excel files

Stargazers: 0 | Issues: 0

german-gov-domains

An incomplete listing of German government domains

License: CC0-1.0 | Stargazers: 0 | Issues: 0

GlobaLeaks

GlobaLeaks - The Open-Source Whistleblowing Software

License: NOASSERTION | Stargazers: 0 | Issues: 0

government.github.com

Gather, curate, and feature stories of public servants and civic hackers using GitHub as part of their open government innovations

Stargazers: 0 | Issues: 0

govt-urls

Most government websites end in .gov or .mil, but many do not. This repo contains USA.gov's list of public government domains and URLs that don't end in .gov or .mil.

Stargazers: 0 | Issues: 0
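
The distinction drawn in the govt-urls description (hosts that do or do not end in .gov or .mil) can be illustrated with a small check. This is only a sketch: the function name `is_non_gov_mil` and the sample hosts are hypothetical and are not taken from the repository or from USA.gov's list.

```python
from urllib.parse import urlsplit


def is_non_gov_mil(url_or_host: str) -> bool:
    """Return True when the hostname does not end in .gov or .mil.

    Accepts either a full URL or a bare hostname (hypothetical helper).
    """
    # urlsplit only fills in the hostname when a "//" network-location marker is present.
    target = url_or_host if "//" in url_or_host else f"//{url_or_host}"
    host = (urlsplit(target).hostname or "").rstrip(".").lower()
    return not (host.endswith(".gov") or host.endswith(".mil"))


# Illustrative inputs only; these are not entries from the actual list.
samples = ["https://www.usa.gov/about", "example.org", "army.mil"]
non_gov = [s for s in samples if is_non_gov_mil(s)]
```

A suffix test like this only says which hosts fall outside .gov and .mil; it cannot tell whether such a host is actually a government site, which is why a curated list is needed.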

hstspreload.com

An API to determine if a domain is included in HSTS preload lists.

License: MIT | Stargazers: 0 | Issues: 0

infosechiring.com

Open jobs and job seekers in the information security field.

License: MIT | Stargazers: 0 | Issues: 0

iptv

Collection of 8000+ publicly available IPTV channels from all over the world

License: Unlicense | Stargazers: 0 | Issues: 0

nomenklatura

Data deduplication tool

License: MIT | Stargazers: 0 | Issues: 0

pol-ad-dashboard

Political Ad Dashboard

License: MIT | Stargazers: 0 | Issues: 0

proxy_pool

Proxy IP pool for Python crawlers (proxy pool)

License: MIT | Stargazers: 0 | Issues: 0

qccspider

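Company information crawler for Qichacha (企查查): scrapes the companies newly added to the Qichacha app each day, supporting daily incremental crawls of company data, business registration data, and more.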

Stargazers: 0 | Issues: 0

Save-to-the-Wayback-Machine

Browser extension for quickly saving web pages to the Internet Archive's Wayback Machine.

License: GPL-3.0 | Stargazers: 0 | Issues: 0

scrapy-wayback-machine

A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

License: ISC | Stargazers: 0 | Issues: 0

terraform-aws-dynamic-subnets

Terraform module for public and private subnets provisioning in existing VPC

License: Apache-2.0 | Stargazers: 0 | Issues: 0

wayback-machine-chrome

A web browser extension for Chrome, Firefox, Edge, and Safari 14.

License: AGPL-3.0 | Stargazers: 0 | Issues: 0

wayback-machine-downloader

Download an entire website from the Wayback Machine.

License: NOASSERTION | Stargazers: 0 | Issues: 0

wayback-machine-scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

License: ISC | Stargazers: 0 | Issues: 0

waybackurls

Fetch all the URLs that the Wayback Machine knows about for a domain

Stargazers: 0 | Issues: 0
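
As a rough illustration of what "fetch all the URLs that the Wayback Machine knows about for a domain" can involve, the sketch below builds a query against the Wayback Machine's public CDX endpoint and extracts URLs from a JSON response. It is an assumption-laden sketch, not the waybackurls implementation: the helper names `build_cdx_query` and `extract_urls` are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, include_subdomains: bool = True) -> str:
    """Build a CDX query URL listing archived URLs for a domain (hypothetical helper)."""
    pattern = f"*.{domain}/*" if include_subdomains else f"{domain}/*"
    params = {
        "url": pattern,        # wildcard pattern for the domain
        "output": "json",      # ask for a JSON array of rows
        "fl": "original",      # return only the original URL field
        "collapse": "urlkey",  # collapse repeated captures of the same URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def extract_urls(rows: list[list[str]]) -> list[str]:
    """Pull URLs out of a CDX JSON response, whose first row is the field-name header."""
    return [row[0] for row in rows[1:]]


query_url = build_cdx_query("example.com")
```

The resulting `query_url` can be fetched with any HTTP client, and the decoded JSON passed to `extract_urls`. Large domains can return very many rows, so a real tool would likely need paging or limits.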