Open Data Sources
- Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Reuse and redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable.
- Universal participation: everyone must be able to use, reuse and redistribute — there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
-- Definition by the Open Knowledge Foundation
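As a rough illustration, the three criteria above can be turned into a checklist for evaluating a dataset's publication terms. This is only a sketch: `DatasetTerms`, its field names, and `openness_issues` are hypothetical names made up for this example, not part of any official tool or of the Open Knowledge Foundation's definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetTerms:
    """Hypothetical summary of how a dataset is published and licensed."""
    bulk_download: bool          # available as a whole, e.g. as one download
    modifiable_format: bool      # convenient, editable form
    machine_readable: bool       # parseable by software (CSV, JSON, ...)
    allows_redistribution: bool  # reuse, redistribution, mixing with other data
    allows_commercial_use: bool  # no 'non-commercial' clause
    purpose_restricted: bool     # e.g. 'only in education'


def openness_issues(terms: DatasetTerms) -> List[str]:
    """Return the names of the criteria that the terms fail to meet."""
    issues = []
    if not (terms.bulk_download and terms.modifiable_format):
        issues.append("availability and access")
    if not (terms.allows_redistribution and terms.machine_readable):
        issues.append("reuse and redistribution")
    if not terms.allows_commercial_use or terms.purpose_restricted:
        issues.append("universal participation")
    return issues


# A dataset that is easy to download but carries a non-commercial clause.
non_commercial = DatasetTerms(
    True, True, True, True,
    allows_commercial_use=False,
    purpose_restricted=False,
)
print(openness_issues(non_commercial))  # ['universal participation']
```

An empty list means none of the three criteria were violated under these simplified assumptions; a real assessment would still require reading the actual license text.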
Open Data
- List of Public Datasets - user-curated
- DBpedia - structured data extracted from Wikipedia, organized with a large multi-domain ontology
- Public Data Sets on AWS - Common Crawl web corpus, NASA satellite imagery, Human Genome, Google Books Ngrams, Wikipedia traffic, Million Song Dataset, Federal Reserve Economic Data, PubChem, and more.
Governmental Data
Compendium of Governmental Open Data Sources
- Data.gov (USA)
- Africa Open Data
- US Census - Population Estimates and Projections, Nonemployer Statistics and County Business Patterns, Economic Indicators Time Series, and more.
Non-Governmental Org Data
- The World Bank - business regulation measures, company-level data in emerging markets, household consumption patterns, World Development Indicators, World Bank finances
- ^Pew Research Center's Internet Project
Academic Data
Inter-university Consortium for Political and Social Research Data Portal
- Surveys of Economic Attitudes and Behavior
- Continuing Series of Consumer Surveys
- Historical and Contemporary Economic Processes and Indicators
Truly Random Data
Open Data Resources
- reddit r/datasets
- Open Data - Stack Exchange (discussion)
^ License is not truly open; some restrictions apply.