arnavgrg / Microsoft-X-PiE-Text-Analytics-Datathon-2020

UC Berkeley Datathon

September 12, 2020

This workshop was designed for UC Berkeley's PiE Datathon, sponsored by Microsoft.

This workshop introduces Microsoft Azure Cognitive Services through two datasets. This repository contains the materials for running text analytics on either dataset with the Cognitive Services Text Analytics API.

The slides used for the demo are included in this repository.

Setup Requirements

Setup

Install and import requests

Run the command pip install --upgrade requests in your terminal.

Add import requests at the top of your Python file so that you can call the Azure Text Analytics API.
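As a rough illustration of what such a call can look like, here is a minimal sketch that sends a few texts to the sentiment endpoint. It assumes the v3.0 REST route (/text/analytics/v3.0/sentiment) and the Ocp-Apim-Subscription-Key header; the function names are hypothetical and not taken from the workshop notebooks. Check the Azure documentation for the API version your resource supports.

```python
import requests


def build_sentiment_request(endpoint, api_key, texts, language="en"):
    """Build the URL, headers, and JSON body for a sentiment request.

    Assumes the Text Analytics v3.0 REST route; adjust the path if your
    resource uses a different API version.
    """
    url = endpoint.rstrip("/") + "/text/analytics/v3.0/sentiment"
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    body = {
        "documents": [
            {"id": str(i), "language": language, "text": text}
            for i, text in enumerate(texts, start=1)
        ]
    }
    return url, headers, body


def analyze_sentiment(endpoint, api_key, texts):
    """POST the texts to the service and return the parsed JSON response."""
    url, headers, body = build_sentiment_request(endpoint, api_key, texts)
    response = requests.post(url, headers=headers, json=body, timeout=30)
    response.raise_for_status()  # surface HTTP errors such as 401 or 429
    return response.json()
```

To use it, pass the endpoint and key shown for your own resource in the Azure Portal, for example analyze_sentiment(endpoint, api_key, ["The delivery was fast."]).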

Create a Text Analytics resource in the Azure Portal

We recommend creating one Text Analytics resource per project; team members can share its API key. You are welcome to incorporate other Azure Cognitive Services as appropriate for the scope of your project.

Learn more about the Text Analytics API and what it offers in the Azure documentation.

Creating an Azure resource

Follow this tutorial for creating a resource in Azure and accessing its API key.

When creating your resource, set the following configuration:

  • Subscription: your Azure student subscription
  • Location: this depends on your team's general location. We recommend selecting "(US) West US" from the dropdown menu if you are on the west coast or "(US) East US" if you are on the east coast.
  • Pricing Tier: Standard S if available; otherwise Free
  • Resource group: Create a new resource group that will contain all of your Azure resources
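Once the resource exists, its endpoint and key have to reach your code. One possible approach, sketched below, is to read them from environment variables so that a shared key never lands in a committed notebook. The variable names TEXT_ANALYTICS_ENDPOINT and TEXT_ANALYTICS_KEY are assumptions chosen for this example, not names used by Azure or by this repository.

```python
import os

# Hypothetical variable names; pick whatever your team agrees on.
ENDPOINT_VAR = "TEXT_ANALYTICS_ENDPOINT"
KEY_VAR = "TEXT_ANALYTICS_KEY"


def load_text_analytics_config(env=os.environ):
    """Return (endpoint, key) from the environment, or raise a clear error."""
    missing = [name for name in (ENDPOINT_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so the endpoint can be joined with a route later.
    return env[ENDPOINT_VAR].rstrip("/"), env[KEY_VAR]
```

Each team member sets the two variables once in their own shell, and the notebooks stay free of secrets.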

The datasets

Option 1: Seattle Police Dataset

Description: "Records of Officer Involved Shootings (OIS) from 2005 to the present, including a brief narrative synopsis."

More information about this dataset: Seattle Police Data

Option 2: Customer Service Data

Description: Records of support tickets received by a fictitious general hardware store, in the form of emails or audio converted to text.

Column descriptions

  • SupportTicket: unique ID of support issue received
  • CustomerID: unique ID of customer submitting issue
  • DateCreated: date the issue was submitted and received
  • DateCompleted: date the support ticket was closed
  • Escalated: binary flag indicating whether a ticket was escalated to another team due to urgency (0 = no, 1 = yes)
  • Theme: category of support ticket
  • Text: text of customer ticket
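The columns above map fairly directly onto the document format the Text Analytics API expects. The sketch below loads tickets with the standard csv module, converts Escalated to a boolean, and turns each ticket into a document keyed by its SupportTicket ID. The sample rows are made up for illustration (including the date format, which is left unparsed), and the batch size of 10 is an assumption; check the documented per-request limit for the operation you call.

```python
import csv
import io


def load_tickets(csv_file):
    """Read tickets from an open CSV file and convert Escalated to a bool."""
    tickets = []
    for row in csv.DictReader(csv_file):
        row["Escalated"] = row["Escalated"].strip() == "1"
        tickets.append(row)
    return tickets


def to_documents(tickets, language="en"):
    """Map each ticket to an {id, language, text} document for the API."""
    return [
        {"id": t["SupportTicket"], "language": language, "text": t["Text"]}
        for t in tickets
    ]


def batches(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up rows for illustration only; not taken from the real dataset.
sample = io.StringIO(
    "SupportTicket,CustomerID,DateCreated,DateCompleted,Escalated,Theme,Text\n"
    "T1,C1,2020-01-01,2020-01-02,0,speed,My order arrived late.\n"
    "T2,C2,2020-01-03,2020-01-05,1,security,I cannot log in to my account.\n"
)
tickets = load_tickets(sample)
docs = to_documents(tickets)
```

With the real file, replace the in-memory sample with open(path, newline="", encoding="utf-8") and send each batch of documents in its own request.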

Theme descriptions

  • price: cost of the goods to customers, comparison to competitors, sales, discounts
  • speed: delivery speed
  • features: product or website features
  • design: product or website design
  • reliability: product, website, and delivery reliability
  • support: tech, customer, and product support
  • security: payment info, login info
  • services: delivery, account, payment, and installation services
  • other
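A simple first analysis is to check how often tickets in each theme are escalated. The following is a minimal sketch using only the standard library; the rows are invented for illustration, and the function name is hypothetical. It assumes Escalated holds 0 or 1, as described in the column list.

```python
from collections import Counter


def escalation_rate_by_theme(tickets):
    """Return {theme: fraction of that theme's tickets that were escalated}."""
    totals = Counter()
    escalated = Counter()
    for ticket in tickets:
        theme = ticket["Theme"]
        totals[theme] += 1
        # Accept both the raw CSV string "1" and the integer 1.
        if str(ticket["Escalated"]).strip() == "1":
            escalated[theme] += 1
    return {theme: escalated[theme] / totals[theme] for theme in totals}


# Invented rows for illustration only; not taken from the real dataset.
example_tickets = [
    {"Theme": "price", "Escalated": "0"},
    {"Theme": "price", "Escalated": "1"},
    {"Theme": "other", "Escalated": "0"},
]
rates = escalation_rate_by_theme(example_tickets)
```

The same grouping can later be combined with sentiment scores from the API, for example to compare average sentiment across themes.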

Created by the Microsoft OCP Azure Data & AI Team
