BorderlessData

BorderlessData's repositories

add_corporate_information_daily_of_china

Newly registered business information from the last few days across 31 provinces of mainland China, plus some other company data: roughly 1,000,000+ records in total, including company name, registered address, unified social credit code, province, city, registration date, business scope, responsible person, email address, registered capital, and business type.

aistudio-doc2vec-for-investigative-journalism

How Quartz used AI to help reporters search the Mauritius Leaks

Language: Jupyter Notebook

aistudio-dochate-public

Learning text classification for journalists through DocHate tips

Language: Jupyter Notebook · License: MIT

aistudio-fbdb

Language: Ruby · License: MIT

aistudio-searching-data-dumps-with-use

Searching large heterogeneous data dumps with the Universal Sentence Encoder

Language: Jupyter Notebook · License: MIT

aistudio-workshops

Workshops created by the Quartz AI Studio

Language: Jupyter Notebook · License: MIT

awesome-iptv

A curated list of resources related to IPTV

bad-data-guide

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

Crawling-Infrastructure

Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3), and sophisticated queues.

Language: TypeScript · License: AGPL-3.0

datadonkey

DataDonkey handles XML, CSV and Excel files

german-gov-domains

An incomplete listing of German government domains

License: CC0-1.0

GlobaLeaks

GlobaLeaks - The Open-Source Whistleblowing Software

License: NOASSERTION

government.github.com

Gather, curate, and feature stories of public servants and civic hackers using GitHub as part of their open government innovations

govt-urls

Most government websites end in .gov or .mil, but many do not. This repo contains USA.gov's list of public government domains and URLs that don't end in .gov or .mil.

hstspreload.com

An API to determine if a domain is included in HSTS preload lists.

License: MIT

infosechiring.com

Open jobs and job seekers in the information security field.

License: MIT

iptv

Collection of 8000+ publicly available IPTV channels from all over the world

License: Unlicense

nomenklatura

Data de-duplication tool

License: MIT

pol-ad-dashboard

Political Ad Dashboard

License: MIT

proxy_pool

Python crawler proxy IP pool (proxy pool)

License: MIT

qccspider

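A crawler for company information on Qichacha (企查查): it scrapes the companies newly added each day in the Qichacha app and supports daily incremental crawling of company data, business registration data, and more.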

quackbot

License: MIT

salesforce-ssrf

Save-to-the-Wayback-Machine

Browser extension for quickly saving web pages to the Internet Archive's Wayback Machine.

License: GPL-3.0

scrapy-wayback-machine

A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

License: ISC

terraform-aws-dynamic-subnets

Terraform module for provisioning public and private subnets in an existing VPC

License: Apache-2.0

wayback-machine-chrome

A web browser extension for Chrome, Firefox, Edge, and Safari 14.

License: AGPL-3.0

wayback-machine-downloader

Download an entire website from the Wayback Machine.

License: NOASSERTION

wayback-machine-scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

License: ISC

waybackurls

Fetch all the URLs that the Wayback Machine knows about for a domain
