BillOchieng / classDocs

Materials for course

Data Analysis (CMPSC 301) Course Syllabus

This repository contains information about Data Analysis, including the course objectives, policies and the schedule. Please check this repository regularly for updates to the policies and the schedule.

Content

Instructor

Dr. Oliver Bonham-Carter (Note: said and written as Bonham-Carter, not Carter)

Office Location: Alden Hall 105

Email: obonhamcarter@allegheny.edu

Office Hours

Meeting Times

Session | Day | Time | Location
Class | Tuesday | 9:30am - 10:45am | Alden 101
Class | Thursday | 9:30am - 10:45am | Alden 101
Lab | Monday | 2:30pm - 4:20pm | Alden 101

Discord

If you are already on the department's Discord server, then you will be given access to the course's Discord channel, called #data-analytics. If not, then you will need to join the department's Discord server before you can be added to the course's channel.

Calendar

Course Calendar Link

classDocs/

All materials given out in class will be accessible using the classDocs/ repository. Note: The HTTP link works in the absence of SSH keys. Main site on GitHub: ClassDocs/

Course Deliverable

  • Exam code "C"
  • Due: 4th May 2023, 7:00pm

Course Description

Credits: 4

A team-based investigation of select topics in computer science, preparing students for the proposal and completion of a senior project. Working in teams to complete hands-on activities, students learn how to read research papers, state and motivate research questions, design and conduct experiments, and collect and organize evidence for evaluating scientific hypotheses. During a weekly laboratory session students use state-of-the-art technology to gain practical skills in scientific and technical writing, the presentation of computational and mathematical concepts, and the visualization of experimental data. Students are invited to use their own departmentally approved laptop in this course; a limited number of laptops are available for use during class and lab sessions. Prerequisite: CMPSC*101 and at least one of the core courses. Distribution Requirements: None.

Course Objectives

Students successfully completing this class will have developed:

  • A “big-picture” view of data analytics.
  • An understanding of the objectives and limitations of data analytics.
  • An understanding of the main data analytics methods.
  • Practical skills using relevant software tools and programming techniques.
  • An understanding of the contemporary roles of power and difference as they relate to the knowledge derived from a data set.
  • An understanding of biases, discrimination and stereotypes that may be present during collection, analysis, and reflection on the latent trends in real-world data sets.

The course is divided into modules, with several of the modules consisting of investigations of real-world data in a specific field. In addition to learning specific technical and programming skills in each module, students will be required to read a relevant article and prepare for a discussion related to the issues raised in the article.

Students will also enhance their ability to write and present ideas about data analytics in a clear and compelling fashion. Finally, students will gain practical experience in the design, implementation, and analysis of data for research during laboratory sessions and a final project.

An Ethical Interest

Throughout the semester, students will be challenged with serious analytical questions connecting the investigated data and its analysis to arising societal issues of bias, ethical consideration and the culture of power. This step is to ensure that analytics is performed with a lens on the data, as well as on its impacts (positive and negative) on culture, community, and society. We note here that an analysis of data often gives no clear indication of a “correct” decision. The so-called “right” decision ought to be made by an analyst who has studied both the data and the consequences of the decision in terms of humanitarian, environmental, ecological and other factors. This class cannot give you the correct decision; however, it can help develop your critical thinking skills, which will give you some understanding of how to navigate to worthy decisions.

Suggested Textbooks

  • Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media, Inc., 2016.

  • Silge, Julia, and David Robinson. Text Mining with R: A Tidy Approach. O'Reilly Media, Inc., 2019.

  • Think Python, first edition, by Allen B. Downey.

Other Useful Textbooks:

  • BUGS in Writing: A Guide to Debugging Your Prose (Second Edition). Lyn Dupré. Addison-Wesley Professional. ISBN-10: 020137921X and ISBN-13: 978-0201379211, 704 pages, 1998. References to the textbook are abbreviated as "BIW".

  • Writing for Computer Science (Second Edition). Justin Zobel. Springer. ISBN-10: 1852338024 and ISBN-13: 978-1852338022, 270 pages, 2004. References to the textbook are abbreviated as "WFCS".

Policies

Grading

The grade that a student receives in this class will be based on the following categories. All percentages are approximate and, if the need to do so presents itself, it is possible for the assigned percentages to change during the academic semester.

Category | Percentage | Assessment metric
Class Participation | 10% | check mark grade
Exam | 20% | letter grade
Lab Assignments | 40% | letter grade
Final Project | 30% | letter grade
Total | 100% |
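
As a rough illustration of how the percentages above combine, the following Python sketch computes a weighted course total. It assumes that every category, including the check-mark participation grade, has already been converted to a 0-100 score; that conversion, the function name `weighted_total`, and the example scores are hypothetical and not part of the course policy.

```python
# Category weights (percent of the final grade), copied from the grading table.
WEIGHTS = {
    "Class Participation": 10,
    "Exam": 20,
    "Lab Assignments": 40,
    "Final Project": 30,
}


def weighted_total(scores):
    """Return the weighted course total (0-100) from per-category scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "Class Participation": 100,
    "Exam": 85,
    "Lab Assignments": 90,
    "Final Project": 80,
}
print(weighted_total(example))  # prints 87.0
```

Because the percentages are described as approximate, the weights are kept in a single dictionary so they can be adjusted in one place.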

Grading Scale

Letter | Range | Letter | Range | Letter | Range
A | 96 - 100 | A- | 90 - 95.9 | |
B+ | 87 - 89.9 | B | 83 - 86.9 | B- | 80 - 82.9
C+ | 77 - 79.9 | C | 73 - 76.9 | C- | 70 - 72.9
D+ | 67 - 69.9 | D | 63 - 66.9 | F | 59.9 and below
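
The scale above can be read as a lookup from a numeric score to a letter. The Python sketch below is one possible reading, not an official grading tool: it treats the lower end of each range as a cutoff, and it assumes that any score below 60 counts as F. The scale lists no letter for scores from 60 to 62.9, so the hypothetical function `letter_grade` returns `None` for that range instead of guessing.

```python
# (lower bound, letter) pairs taken from the grading scale, highest first.
SCALE = [
    (96, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"),
]


def letter_grade(score):
    """Map a 0-100 score to a letter, or None where the scale lists no letter."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, letter in SCALE:
        if score >= lower_bound:
            return letter
    if score < 60:
        return "F"  # assumption: "59.9 and below" covers everything under 60
    return None  # 60 to 62.9 does not appear in the scale


print(letter_grade(87.0))  # prints B+
print(letter_grade(61))    # prints None
```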

Definitions of Grading Categories

  • Class Participation: All students are required to actively participate during all of the class sessions. Your participation will take forms such as answering questions about the required reading assignments, completing in-class exercises, asking constructive questions of the other members of the class, giving presentations, and leading a discussion session in class.

  • Exam: A midterm exam will cover all of the material in its associated module(s) up to the time of the exam. The finalized date of the exam will be announced at least one week in advance of the scheduled date. Unless prior arrangements are made with the course instructor, all students will be expected to take the exam on the scheduled date and complete it during the allotted time.

  • Laboratory Assignments: These assignments invite students to explore the concepts, tools, and techniques associated with the analysis of data. All of the laboratory assignments require the use of the provided tools to study, design, implement, and evaluate systems that solve data analytics problems. In addition to demonstrating technical skills through the software they use or develop for data analysis, students may be expected in some laboratory assignments to read a related article and to lead a discussion or give a short presentation about it.

  • Final Project: This project will present you with the description of a problem and ask you to implement a full-featured solution using a wide variety of data analytics techniques. The final project in this class will require you to apply all of the knowledge and skills that you have accumulated during the course of the semester to solve a problem and, whenever possible, make your solution publicly available as a free and open-source tool. The project will invite you to draw upon both your problem solving skills and data analytics techniques.

Assignment Submissions

Your instructor will be using GitHub Classroom to collect all assignments. It is expected that you are able to effectively use git to submit your work. If you require help, please see your peers, the Technology Leaders, or your instructor.

The three basic commands for submitting work are the following.

git add -A
git commit -m "informative message"
git push

Late Submissions

All assignments will have a stated due date. The electronic version of each class assignment is to be turned in at the beginning of the lab session on the due date. Submissions after the beginning of class are counted as late.

Assignments will be accepted for up to one week past the assigned due date with a 15% penalty. After that time, the lab will not be accepted.
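
The late policy can be sketched in Python as follows. This is only an illustration under stated assumptions: it reads the 15% penalty as a 15% reduction of the earned score (the policy does not say whether it is that or 15 percentage points), and the function name `adjusted_score` and the example due date are hypothetical.

```python
from datetime import datetime, timedelta

LATE_WINDOW = timedelta(weeks=1)  # late work is accepted for up to one week
LATE_PENALTY_PERCENT = 15         # assumed to be taken off the earned score


def adjusted_score(raw_score, due, submitted):
    """Return the score after the late policy, or None if the work is not accepted."""
    if submitted <= due:
        return raw_score  # on time: no penalty
    if submitted <= due + LATE_WINDOW:
        return raw_score * (100 - LATE_PENALTY_PERCENT) / 100
    return None  # more than one week late


# Hypothetical due date at the start of a Monday lab session.
due = datetime(2023, 2, 6, 14, 30)
print(adjusted_score(90, due, due - timedelta(hours=1)))  # on time
print(adjusted_score(90, due, due + timedelta(days=2)))   # late, penalized
print(adjusted_score(90, due, due + timedelta(days=8)))   # too late, not accepted
```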

Please note that lab solutions will be discussed after the one-week late submission deadline, so assignments cannot be submitted after that deadline.

Extensions

Unless special arrangements are made with the course instructor, no assignments will be accepted after the late deadline. To request an extension for a lab assignment, email me with your request and a valid reason for the extension. The request must arrive before the due date of the lab, not on the due date.

The decision to provide you with an extension (or not) will be weighed in light of fairness to your peers who are still able to complete their labs, regardless of their own busy schedules.

Communication

Various digital channels will be used in this course for communication, including email, Discord, and the GitHub issue tracker. Students are strongly advised to install the Discord app on their computer and smartphone to be sure to receive all communications from the instructor, as well as from the other members of the class.

Additionally, the course website on GitHub will be used to store the syllabus, the course schedule, and information about the classDocs/ repository. Your grades will be communicated to you through a Gradebook GitHub repository.

Gradebook Repository

Bring your own computer to class

The classrooms in the Department of Computer Science no longer provide machines for student use. You are to bring your own wifi-ready device to class to be able to follow along with course material. If the class is meeting online using Zoom, then please be sure that your machine is configured correctly to connect you to the class. As it is your responsibility to maintain your machine, please perform online research to determine how to configure your machine accordingly, or to install any necessary software to enable online meetings.

During the semester, you will be told which software to install on your machine to be prepared for class. Some of the prominent software that we may be using can be found at the following resource.

Special Needs and Disability Services

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. Students with disabilities who believe they may need accommodations in this class are encouraged to contact Disability Services at 332-2898. Disability Services is part of the Learning Commons and is located in Pelletier Library. Please do this as soon as possible to ensure that approved accommodations are implemented in a timely fashion.

Some of the resources on campus are listed below.

Honor Code

The Academic Honor Program that governs the entire academic program at Allegheny College is described in the Allegheny Course Catalogue. The Honor Program applies to all work that is submitted for academic credit or to meet non-credit requirements for graduation at Allegheny College. This includes all work assigned for this class (e.g., examinations, laboratory assignments, and the final project). All students who have enrolled in the College will work under the Honor Program. Each student who has matriculated at the College has acknowledged the following pledge:

I hereby recognize and pledge to fulfill my responsibilities, as defined in the Honor Code, and to maintain the integrity of both myself and the College community as a whole.

It is recognized that an important part of the learning process in any course, and particularly one in computer science, derives from thoughtful discussions with teachers and fellow students. Such dialogue is encouraged. However, it is necessary to distinguish carefully between the student who discusses the principles underlying a problem with others and the student who produces assignments that are identical to, or merely variations on, someone else's work. While it is acceptable for students in this class to discuss their programs, technical diagrams, proposals, paper reviews, presentations, and other items with their classmates or other individuals, deliverables that are nearly identical to the work of others will be taken as evidence of violating the Honor Code.

More information about the Honor Code may be found at the Maytum Center for Student Success.

Welcome to Computer Science Research!

Types of computer hardware and software are everywhere! Conducting research in computer science is a challenging and rewarding activity that leads to the production of hardware, software, and scientific insights that have the potential to positively influence the lives of many people. As you learn more about research methods in computer science you will also enhance your ability to effectively write and speak about a wide range of topics in computer science. I ask that you bring your best effort and highest enthusiasm as you pursue research in computer science this semester.

Schedule

Below is a schedule of topics, updated as we cover them, along with their associated activities.

Week # | Dates | Topic | Reading
1 | 17 - 20 Jan 2023 | Introduction to the course, the data all around | Read the syllabus
1 | | No lab | N/A
2 | 23 - 27 Jan | Web traffic analytics. Google Analytics, Building own site to play with Google Analytics | Slides
2 | | Lab assignment 01. Note: you will have to refresh your GitHub page after clicking on this link. | Activity01 Hand out
3 | 30 Jan - 3 Feb | Introduction to R programming, exploratory steps | Slides
3 | | Website Analytics | Lab assignment 02
4 | | |
4 | | |
