Slack: #assemble
Project Description: Assemble is a Data for Democracy community working to build tools and infrastructure for studying online communities and their characteristics. We have several active repositories; our goal is to build a toolkit that takes care of common tasks so researchers do not have to reinvent the wheel with each new project.
Maintainers: Maintainers have write access to the repository. They are responsible for reviewing pull requests, providing feedback and ensuring consistency.
- @sjackson (Subject Matter Knowledge)
- @wwymak (Community Detection, NLP)
- @henripal (Assemble, Community Detection, NLP)
- @metame (collect-social)
- @asragab (collect-social, assemble, data engineering)
- @alarcj (collect-social, onboarding, tutorials, twitter-analysis)
- For a list of first steps, please visit our community guide.
- Read about how we use issue labels
- "First-timers" are welcome! Whether you're trying to learn data science, hone your coding skills, or get started collaborating over the web, we're happy to help. (Sidenote: with respect to Git and GitHub specifically, our github-playground repo and the #github-help Slack channel are good places to start.)
- We believe good code is reviewed code. All commits to this repository are approved by project maintainers and/or leads (listed above). The goal here is not to criticize or judge your abilities! Rather, it is to share insights and achievements. Code reviews help us continually refine the project's scope and direction, and they encourage the discussion the project needs to thrive.
- This README belongs to everyone. If we've missed some crucial information or left anything unclear, edit this document and submit a pull request. We welcome the feedback! Up-to-date documentation is critical to what we do, and changes like this are a great way to make your first contribution to the project.
Take a look at this list to get an idea of the tools and knowledge we're leveraging. If you're good with any of these, or if you'd like to get better at them, this might be a good project to get involved with!
If you would like to get started with any of these skills, check out the tutorials and chat about it in #learning.
- Python 3 (scripting, web scraping, analysis, Jupyter notebooks, visualization)
- Data extraction/ETL
- Data cleaning
If you like the idea of building tools that will help enable analysis across many domains, these projects are a great place to start. If you have an idea for a dataset you would like to collect, please file a proposal via a GitHub issue with the label proposal.
Leveraging the Infrastructure group's fantastic work, the Curation team makes available repositories of information about online communities. The data is "analysis ready" and has been curated to support downstream analytical objectives, and the team works closely with the data.world staff.
- Discursive: Framework for searching or streaming tweets and storing them in Elasticsearch and S3.
- Collect-Social: This project aims to make collecting social media data as simple as possible by making some common-sense assumptions about what most researchers need and how they like to work with their data. Example tasks include grabbing all the posts and comments from a handful of Facebook pages and dumping the results into a SQLite database.
- Reddit-Api-Miner: to be folded into Collect-Social.
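To make the "posts and comments into a SQLite database" idea concrete, here is a minimal sketch using only Python's standard `sqlite3` module. The table layout, the function names (`init_db`, `save_posts`, `count_rows`), and the sample records are all hypothetical placeholders for illustration; this is not Collect-Social's actual schema or API, and no real Facebook data is fetched.

```python
import sqlite3

# Hypothetical schema for illustration only; Collect-Social's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id      TEXT PRIMARY KEY,
    page    TEXT NOT NULL,
    message TEXT
);
CREATE TABLE IF NOT EXISTS comments (
    id      TEXT PRIMARY KEY,
    post_id TEXT NOT NULL REFERENCES posts(id),
    message TEXT
);
"""


def init_db(path=":memory:"):
    """Open (or create) a SQLite database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_posts(conn, posts):
    """Store posts and their nested comments.

    INSERT OR IGNORE skips rows whose primary key already exists,
    so saving the same records twice does not create duplicates.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        for post in posts:
            conn.execute(
                "INSERT OR IGNORE INTO posts (id, page, message) VALUES (?, ?, ?)",
                (post["id"], post["page"], post.get("message")),
            )
            conn.executemany(
                "INSERT OR IGNORE INTO comments (id, post_id, message) VALUES (?, ?, ?)",
                [
                    (comment["id"], post["id"], comment.get("message"))
                    for comment in post.get("comments", [])
                ],
            )


def count_rows(conn, table):
    """Return the number of rows in one of the two known tables."""
    if table not in ("posts", "comments"):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Made-up placeholder records standing in for whatever a collector returns.
sample_posts = [
    {
        "id": "p1",
        "page": "example-page",
        "message": "First placeholder post",
        "comments": [
            {"id": "c1", "message": "Placeholder comment A"},
            {"id": "c2", "message": "Placeholder comment B"},
        ],
    },
    {"id": "p2", "page": "example-page", "message": "Second placeholder post"},
]

conn = init_db()
save_posts(conn, sample_posts)
print(count_rows(conn, "posts"), "posts,", count_rows(conn, "comments"), "comments")
```

Passing a file path such as `init_db("social.db")` instead of the default `":memory:"` would keep the results on disk, which is usually what you want for later analysis in a notebook.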
We are looking for people to take our raw data and curate it so that it is analysis ready. You will work closely with the person(s) who gathered the data to understand how it was collected and to help document the end-to-end data cleaning process for future analysts. Eventador has graciously donated infrastructure to assist with this effort.
Additional Resources:
- Getting started with Eventador
- See DATA GOVERNANCE.
Raw data:
- Info Source
- Congressional Record
- Discursive
- Have data to add? Check our how-to guide
- Oathkeepers - Militia and white nationalist Twitter data
We need people who would like to write tutorials or script examples on how to do common tasks.
- Do not worry if you are not an expert. Tutorials from the perspective of a beginner are great for other beginners.
Special thanks to the drug-spending team for writing such a great README that we had to borrow liberally from it.