⭐ Star us on GitHub — it motivates us a lot!
- Prerequisites
- Getting Started
- Environment Variables
- Task Configuration
- HITs Format
- Task Performing
- Results Download
- Local Development
- Troubleshooting
- References
## Prerequisites

- AWS Command Line Interface
- Node.js
- Python 3
- Docker (Optional)
## Getting Started

1. Create an Amazon AWS Account.

2. Create a new IAM user `your_iam_user`.

3. Attach the `AdministratorAccess` policy:

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "*",
               "Resource": "*"
           }
       ]
   }
   ```
4. Generate a new access key pair.

5. Store the access key in your credentials file. Path: `C:\Users\your_os_user\.aws\credentials`

   ```
   [your_iam_user]
   aws_access_key_id=your_key
   aws_secret_access_key=your_secret
   ```
6. Clone the repo `Miccighel/Crowd_Frame`.

7. Enable the Yarn global binary:

   ```bash
   corepack enable
   ```

8. Move to the repo folder:

   ```bash
   cd ~/path/to/project
   ```

9. Move to the data folder:

   ```bash
   cd data
   ```
10. Create the environment file `.env`. Path: `your_repo_folder/data/.env`

11. Provide the mandatory subset of environment variables:

    ```
    mail_contact=your_email_address
    budget_limit=your_usd_budget_limit
    task_name=your_task_name
    batch_name=your_batch_name
    admin_user=your_admin_username
    admin_password=your_admin_password
    server_config=none
    aws_region=your_aws_region
    aws_private_bucket=your_private_bucket_name
    aws_deploy_bucket=your_deploy_bucket_name
    ```
12. Install the Python packages with `pip install -r your_repo_folder/requirements.txt`:

    ```
    boto3==1.21.32
    ipapi==1.0.4
    ipinfo==4.2.1
    mako==1.1.4
    docker==5.0.3
    python-dotenv==0.20.0
    rich==10.16.2
    tqdm==4.64.0
    numpy==1.23.0
    pandas==1.4.2
    toloka-kit==0.1.25
    python-on-whales==0.43.0
    beautifulsoup4==4.11.1
    aiohttp==3.8.1
    ```
13. Run the Python script `init.py`. Path: `your_repo_folder/data/init.py`

    The script will:
    - read your environment variables;
    - set up the AWS infrastructure;
    - generate an empty task configuration;
    - deploy the task on the public bucket.
14. Open your task:

    ```
    https://your_deploy_bucket.s3.your_aws_region.amazonaws.com/your_task_name/your_batch_name/index.html
    ```
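If step 13 fails with authentication or permission errors, it may help to verify that the credentials stored during step 5 are picked up correctly. A quick sanity check with the AWS CLI from the prerequisites (a suggestion, not part of the official setup):

```bash
# Confirms the CLI can authenticate as the IAM user configured in step 5.
# The profile name must match the section header in your credentials file.
aws sts get-caller-identity --profile your_iam_user
```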
## Environment Variables

Crowd_Frame interacts with several Amazon Web Services (AWS) to deploy crowdsourcing tasks, store the data produced, and so on. Each service used falls within the AWS Free Tier program. The budget limit blocks the usage of such services if/when it is surpassed.

The following table describes each environment variable that can be set in `your_repo_folder/data/.env`.
| Variable | Description | Mandatory | Value |
|---|---|---|---|
| `profile_name` | Name of the IAM profile created during step 2. If unspecified, the value `default` is used. | ❌ | `your_iam_user` |
| `mail_contact` | Contact email to receive AWS budgeting-related communications. | ✔️ | Valid email address |
| `platform` | Platform on which to deploy the crowdsourcing task. Set it to `none` if you recruit the workers manually. | ✔️ | `none` or `mturk` or `prolific` or `toloka` |
| `budget_limit` | Maximum monthly amount allowed to operate, in USD; e.g., `5.0`. | ✔️ | Positive float number |
| `task_name` | Identifier of the crowdsourcing task. | ✔️ | Any string |
| `batch_name` | Identifier of a single task's batch. | ✔️ | Any string |
| `task_title` | Custom title for the crowdsourcing task. | ❌ | Any string |
| `batch_prefix` | Prefix of the identifiers of one or more of the task's batches. Use this variable to filter the final result set. | ❌ | Any string |
| `admin_user` | Username of the admin user. | ✔️ | Any string |
| `admin_password` | Password of the admin user. | ✔️ | Any string |
| `aws_region` | Region of your AWS account; e.g., `us-east-1`. | ✔️ | Valid AWS region identifier |
| `aws_private_bucket` | Name of the private S3 bucket in which to store the task configuration and data. | ✔️ | String unique across AWS |
| `aws_deploy_bucket` | Name of the public S3 bucket in which to deploy the task source code. | ✔️ | String unique across AWS |
| `server_config` | Specifies where the worker behavior logging interface is. Set it to `aws` to deploy the AWS-based infrastructure, to `custom` to provide a custom logging endpoint, or to `none` if you will not log worker behavior. | ✔️ | `aws` or `custom` or `none` |
| `enable_solver` | Deploys the HITs solver locally, which allows providing a set of documents that will be automatically allocated into a set of HITs. Requires Docker. | ❌ | `true` or `false` |
| `enable_crawling` | Enables crawling of the results retrieved by the search engine. | ❌ | `true` or `false` |
| `prolific_completion_code` | Prolific study completion code. Required if the platform chosen is `prolific`. | ❌ | Valid Prolific completion code |
| `toloka_oauth_token` | Token to access Toloka's API. Required if the platform chosen is `toloka`. | ❌ | Valid Toloka OAuth token |
| `ip_info_token` | API key for the ipinfo.com tracking functionalities. | ❌ | Valid IP Info key |
| `ip_geolocation_api_key` | API key for the ipgeolocation.io tracking functionalities. | ❌ | Valid IP Geolocation key |
| `ipapi_api_key` | API key for the ipapi.com tracking functionalities. | ❌ | Valid IP API key |
| `user_stack_token` | API key for the userstack.com user-agent detection functionalities. | ❌ | Valid Userstack key |
| `bing_api_key` | API key for the BingWebSearch search provider. | ❌ | Valid Bing Web Search API key |
| `fake_json_token` | API key for the FakerWebSearch search provider; returns dummy responses useful to test the search engine. | ❌ | Valid fakeJSON.com API key |
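As a fuller illustration, an `.env` combining the mandatory subset with a few optional keys might look as follows (all values are placeholders):

```
mail_contact=your_email_address
platform=none
budget_limit=5.0
task_name=your_task_name
batch_name=your_batch_name
admin_user=your_admin_username
admin_password=your_admin_password
server_config=none
aws_region=us-east-1
aws_private_bucket=your_private_bucket_name
aws_deploy_bucket=your_deploy_bucket_name

# Optional keys, shown for illustration only
profile_name=your_iam_user
enable_solver=false
enable_crawling=false
```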
## Task Configuration

To configure the deployed crowdsourcing task:

1. open the administrator panel by appending `?admin=true` to the task URL;
2. click the Generate button to open the login prompt;
3. log in with your admin credentials;
4. proceed through each generation step.

When the configuration is ready, click the Upload button.
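For reference, the administrator panel of the task deployed during the Getting Started steps lives at a URL of this form:

```
https://your_deploy_bucket.s3.your_aws_region.amazonaws.com/your_task_name/your_batch_name/index.html?admin=true
```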
The generation steps are:

1. **Questionnaires**: allows creating one or more questionnaires that workers will fill in before or after task execution.
2. **Evaluation Dimensions**: allows configuring what the worker will assess for each element of the HIT assigned.
3. **General Instructions**: instructions shown to each worker before the task.
4. **Evaluation Instructions**: instructions shown to each worker within the task's body.
5. **Search Engine**: allows choosing the desired search provider and adding a list of domains to filter out of the search results.
6. **Task Settings**: allows configuring several task settings, such as the maximum number of tries for each worker, the usage of an annotation interface, and much more. It also allows providing the file containing the set of HITs for the task deployed.
7. **Worker Checks**: allows configuring additional checks on workers.
To test the configured task, open it and try it out:

```
https://your_deploy_bucket.s3.your_aws_region.amazonaws.com/your_task_name/your_batch_name/index.html
```
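A quick command-line check that the deployment is reachable (just a suggestion; opening the URL in a browser works too):

```bash
# Expect an HTTP 200 response once the task has been deployed.
curl -I "https://your_deploy_bucket.s3.your_aws_region.amazonaws.com/your_task_name/your_batch_name/index.html"
```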
## HITs Format

The HITs for a crowdsourcing task must be stored in a special JSON file and must comply with a specific format:
- There must be an array of HITs (also called units);
- Each HIT must have a unique input token attribute;
- Each HIT must have a unique output token attribute;
- The number of documents for each HIT must be specified;
- The documents for each HIT are key/value dictionaries;
- Each document can have an arbitrary number of attributes.
The following fragment shows a valid configuration of a crowdsourcing task with 1 HIT.
```json
[
    {
        "unit_id": "unit_0",
        "token_input": "ABCDEFGHILM",
        "token_output": "MNOPQRSTUVZ",
        "documents_number": 1,
        "documents": [
            {
                "id": "identifier_1",
                "text": "Lorem ipsum dolor sit amet"
            }
        ]
    }
]
```
Useful tips:

- initially, the deploy script creates an empty configuration;
- you can upload the HITs during configuration step 6 (Task Settings).
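Before uploading, you may want to sanity-check a HITs file against the rules above. A minimal validation sketch (not part of Crowd_Frame; the file name `hits.json` is just an example):

```python
import json

# Load the HITs file; "hits.json" is a placeholder name.
with open("hits.json") as f:
    hits = json.load(f)

assert isinstance(hits, list), "top level must be an array of HITs (units)"

input_tokens = [hit["token_input"] for hit in hits]
output_tokens = [hit["token_output"] for hit in hits]
assert len(set(input_tokens)) == len(hits), "token_input values must be unique"
assert len(set(output_tokens)) == len(hits), "token_output values must be unique"

for hit in hits:
    # The declared number of documents must match the documents provided.
    assert hit["documents_number"] == len(hit["documents"])
    # Each document is a key/value dictionary with arbitrary attributes.
    assert all(isinstance(doc, dict) for doc in hit["documents"])

print(f"{len(hits)} HIT(s) validated")
```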
TODO
## Task Performing

How a crowdsourcing task is launched depends on how the workers are recruited: you can recruit each worker manually, or rely on one of the supported crowdsourcing platforms.
To recruit workers manually:

1. Assign each worker a `workerID`:
   - it is used to identify each worker;
   - it enables data collection when the worker performs the task.

2. Append the identifier as a GET parameter: `?workerID=worker_id_chosen`

3. Provide the full URL to the worker:

   ```
   https://your_deploy_bucket.s3.your_aws_region.amazonaws.com/your_task_name/your_batch_name/index.html?workerID=worker_id_chosen
   ```
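When recruiting manually, it can be handy to generate a unique `workerID` and URL per worker. A small sketch (the bucket, task, and batch values are placeholders, and the UUID-based ID scheme is just one option; any unique string works):

```python
import uuid

# Placeholder values: substitute your actual deploy bucket, region, task, and batch.
BASE_URL = ("https://your_deploy_bucket.s3.your_aws_region.amazonaws.com/"
            "your_task_name/your_batch_name/index.html")

def worker_url() -> str:
    # A short UUID fragment serves as a unique workerID.
    worker_id = uuid.uuid4().hex[:10]
    return f"{BASE_URL}?workerID={worker_id}"

# Print one URL per worker to recruit; three shown here.
for _ in range(3):
    print(worker_url())
```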
TODO
TODO
TODO
## Results Download

1. Move to the project `data` folder:

   ```bash
   cd ~/path/to/project/data/
   ```

2. Run the Python script `download.py`.

3. Move to the results folder:

   ```bash
   cd result
   ```

4. Move to the current task folder `your_task_name`:

   ```bash
   cd your_task_name
   ```

Results structure:

TODO
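The results structure is not documented yet. Until it is, a generic way to see what `download.py` produced, with no assumptions about file names or formats:

```python
from pathlib import Path

# Walks the task's result folder (relative to the data folder) and
# prints every file found together with its size.
for path in sorted(Path("result/your_task_name").rglob("*")):
    if path.is_file():
        print(f"{path}  ({path.stat().st_size} bytes)")
```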
## Local Development

You may want to edit and test the task configuration locally. To enable local development:

1. Move to the environments folder:

   ```bash
   cd your_repo_folder/data/build/environments
   ```

2. Open the `dev` environment file `environment.ts`.

3. Set the `configuration_local` flag to `true`.

Full sample:

```ts
export const environment = {
  production: false,
  configuration_local: true,
  platform: 'mturk',
  taskName: "your_task_name",
  batchName: "your_batch_name",
  region: "your_aws_region",
  bucket: "your_private_bucket",
  aws_id_key: "your_aws_key_id",
  aws_secret_key: "your_aws_key_secret",
  prolific_completion_code: false,
  bing_api_key: "your_bing_api_key",
  fake_json_key: "your_fake_json_key",
  log_on_console: false,
  log_server_config: "none",
  table_acl_name: "Crowd_Frame-your_task_name_your_batch_name_ACL",
  table_data_name: "Crowd_Frame-your_task_name_your_batch_name_Data",
  table_log_name: "Crowd_Frame-your_task_name_your_batch_name_Logger",
  hit_solver_endpoint: "None",
};
```
Now you can manually edit the configuration and test everything locally.

Note: the `init.py` script will overwrite this file.
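No local-serve command is documented here; since the front end follows the Angular conventions (note the `environment.ts` files), a plausible workflow is sketched below. Verify the actual scripts against the repository's `package.json` before relying on it.

```bash
# Assumption: standard Angular CLI workflow; check package.json for the real scripts.
yarn install   # Yarn was enabled via corepack during the Getting Started steps
yarn ng serve  # serve the app locally against the dev environment
```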
## Troubleshooting

Fixes for well-known errors:

- The `docker` package, as of today, triggers the exception shown below on certain Windows-based Python distributions, because the `pypiwin32` dependency fails to run its post-install script:

  ```
  NameError: name 'NpipeHTTPAdapter' is not defined. Install pypiwin32 package to enable npipe:// support
  ```

  To solve it, run the following command from an elevated command prompt:

  ```
  python your_python_folder/Scripts/pywin32_postinstall.py -install
  ```
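After applying the fix, a quick way to confirm that the `docker` package at least imports cleanly (creating a client afterwards exercises the npipe code path that originally failed):

```bash
python -c "import docker; print(docker.__version__)"
```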
## Contributing

Any contributions you make are greatly appreciated.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/dev-branch`)
3. Commit your Changes (`git commit -m 'Add some Feature'`)
4. Push to the Branch (`git push origin feature/dev-branch`)
5. Open a Pull Request
## References

This software has been presented during the 15th ACM International Conference on Web Search and Data Mining (WSDM '22).
```bibtex
@inproceedings{conference-paper-wsdm2022,
author = {Soprano, Michael and Roitero, Kevin and Bombassei De Bona, Francesco and Mizzaro, Stefano},
title = {Crowd_Frame: A Simple and Complete Framework to Deploy Complex Crowdsourcing Tasks Off-the-Shelf},
year = {2022},
isbn = {9781450391320},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3488560.3502182},
doi = {10.1145/3488560.3502182},
abstract = {Due to their relatively low cost and ability to scale, crowdsourcing based approaches are widely used to collect a large amount of human annotated data. To this aim, multiple crowdsourcing platforms exist, where requesters can upload tasks and workers can carry them out and obtain payment in return. Such platforms share a task design and deploy workflow that is often counter-intuitive and cumbersome. To address this issue, we propose Crowd_Frame, a simple and complete framework which allows to develop and deploy diverse types of complex crowdsourcing tasks in an easy and customizable way. We show the abilities of the proposed framework and we make it available to researchers and practitioners.},
booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
pages = {1605–1608},
numpages = {4},
keywords = {framework, crowdsourcing, user behavior},
location = {Virtual Event, AZ, USA},
series = {WSDM '22}
}
```