This repository contains information about Data Analysis, including the course objectives, policies and the schedule. Please check this repository regularly for updates to the policies and the schedule.
- SCHEDULE
- Instructor
- Meeting Times
- Discord
- Course Calendar
- classDocs/
- Deliverable
- Course Description
- Distribution Requirements
- Suggested Textbooks
- Course Policies
- Grading
- Assignment Submissions
- Gradebook Repository
- Schedule
Dr. Oliver Bonham-Carter (Note: said and written as Bonham-Carter, not Carter)
Office Location: Alden Hall 105
Email: obonhamcarter@allegheny.edu
Session | Day | Time | Location |
---|---|---|---|
Class | Tuesday | 9:30am - 10:45am | Alden 101 |
Class | Thursday | 9:30am - 10:45am | Alden 101 |
Lab | Monday | 2:30pm - 4:20pm | Alden 101 |
If you are already on the department's Discord server, then you will be given access to the course's Discord channel, called #data-analytics. If not, then you will need to join the department's Discord server before you can be added to the course's channel.
All materials given out in class will be accessible using the classDocs/ repository. Note: The HTTP link works in the absence of SSH keys.
Main site on GitHub: ClassDocs/
- Exam code "C"
- Due: 4th May 2023, 7:00pm
Credits: 4
A team-based investigation of select topics in computer science, preparing students for the proposal and completion of a senior project. Working in teams to complete hands-on activities, students learn how to read research papers, state and motivate research questions, design and conduct experiments, and collect and organize evidence for evaluating scientific hypotheses. During a weekly laboratory session students use state-of-the-art technology to gain practical skills in scientific and technical writing, the presentation of computational and mathematical concepts, and the visualization of experimental data. Students are invited to use their own departmentally approved laptop in this course; a limited number of laptops are available for use during class and lab sessions. Prerequisite: CMPSC*101 and at least one of the core courses. Distribution Requirements: None.
Students successfully completing this class will have developed:
- A “big-picture” view of data analytics.
- An understanding of the objectives and limitations of data analytics.
- An understanding of the main data analytics methods.
- Practical skills using relevant software tools and programming techniques.
- An understanding of the contemporary roles of power and difference as they relate to the knowledge derived from a data set.
- An understanding of the biases, discrimination and stereotypes that may be present during the collection and analysis of, and reflection on, the latent trends in real-world data sets.
The course is divided into modules, several of which consist of investigations of real-world data in a specific field. In addition to learning specific technical and programming skills in each module, students will be required to read a relevant article and prepare for a discussion of the issues it raises.

Students will also enhance their ability to write and present ideas about data analytics in a clear and compelling fashion. Finally, students will gain practical experience in the design, implementation, and analysis of data for research during laboratory sessions and a final project.
Throughout the semester, students will be challenged with serious analytical questions that connect the investigated data and its analysis to societal issues of bias, ethical consideration and the culture of power. This step ensures that analytics is performed with a lens on the data as well as on its impacts (positive and negative) on culture, community, and society. We note here that an analysis of data often gives no clear indication of a “correct” decision. The so-called “right” decision ought to be made by an analyst who has studied both the data and the consequences of the decision in terms of humanitarian, environmental, ecological and other factors. This class cannot give you the correct decision. However, it can help to develop the critical thinking skills that you need to navigate toward worthy decisions.
- Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media, Inc., 2016.
- Julia Silge and David Robinson. Text Mining With R: A Tidy Approach. O'Reilly Media, Inc., 2019.
- Think Python, first edition, by Allen B. Downey.
- BUGS in Writing: A Guide to Debugging Your Prose (Second Edition). Lyn Dupré. Addison-Wesley Professional. ISBN-10: 020137921X and ISBN-13: 978-0201379211, 704 pages, 1998. References to the textbook are abbreviated as "BIW".
- Writing for Computer Science (Second Edition). Justin Zobel. Springer. ISBN-10: 1852338024 and ISBN-13: 978-1852338022, 270 pages, 2004. References to the textbook are abbreviated as "WFCS".
The grade that a student receives in this class will be based on the following categories. All percentages are approximate and may change during the academic semester if the need arises.
Category | Percentage | Assessment metric |
---|---|---|
Class Participation | 10% | check mark grade |
Exam | 20% | letter grade |
Lab Assignments | 40% | letter grade |
Final Project | 30% | letter grade |
Total | 100% | |
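The category weights above combine into a single course total as a weighted sum. The following is a minimal sketch of that arithmetic, not the instructor's actual gradebook code. It assumes that every category, including the check mark grade for participation, has first been converted to a score out of 100. The function name `weighted_total` and the example scores are hypothetical.

```python
# Category weights copied from the table above (the syllabus notes they may change).
WEIGHTS = {
    "Class Participation": 0.10,
    "Exam": 0.20,
    "Lab Assignments": 0.40,
    "Final Project": 0.30,
}


def weighted_total(scores):
    """Return the weighted course total on a 0-100 scale.

    `scores` maps each category name to a score out of 100 (an assumption;
    the syllabus does not say how check mark or letter grades are converted).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "Class Participation": 100,
    "Exam": 85,
    "Lab Assignments": 90,
    "Final Project": 88,
}
print(round(weighted_total(example_scores), 1))
```

Because lab assignments carry the largest weight, a change of ten points in the lab average moves the total by four points, while the same change in participation moves it by one.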
Letter | Range | Letter | Range | Letter | Range |
---|---|---|---|---|---|
A | 96 - 100 | A- | 90 - 95.9 | ||
B+ | 87 - 89.9 | B | 83 - 86.9 | B- | 80 - 82.9 |
C+ | 77 - 79.9 | C | 73 - 76.9 | C- | 70 - 72.9 |
D+ | 67 - 69.9 | D | 63 - 66.9 | F | 59.9 and below |
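A numeric total can be looked up in the letter table with a short function. The sketch below is one possible reading of the table, not an official conversion. It assumes that each range starts at its listed lower bound and extends up to the next range. The table lists no letter for scores above 59.9 and below 63, so the function returns `None` there rather than guessing. The name `letter_grade` is hypothetical.

```python
# (letter, lowest score in its range), copied from the table above, highest first.
LETTER_FLOORS = [
    ("A", 96), ("A-", 90),
    ("B+", 87), ("B", 83), ("B-", 80),
    ("C+", 77), ("C", 73), ("C-", 70),
    ("D+", 67), ("D", 63),
]
F_CEILING = 59.9  # the table lists F as "59.9 and below"


def letter_grade(score):
    """Return the letter for a 0-100 score.

    Returns None for scores above 59.9 and below 63, which the table
    does not assign to any letter.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for letter, floor in LETTER_FLOORS:
        if score >= floor:
            return letter
    if score <= F_CEILING:
        return "F"
    return None


for sample in (96, 89.9, 61, 59.9):  # hypothetical totals
    print(sample, letter_grade(sample))
```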
- Class Participation: All students are required to actively participate during all of the class sessions. Your participation will take forms such as answering questions about the required reading assignments, completing in-class exercises, asking constructive questions of the other members of the class, giving presentations, and leading a discussion session in class.
- Exam: A midterm exam will cover all of the material in the associated module(s) up to the time of the exam. The finalized date of the exam will be announced at least one week in advance. Unless prior arrangements are made with the course instructor, all students will be expected to take the exam on the scheduled date and complete it during the allotted time.
- Laboratory Assignments: These assignments invite students to explore the concepts, tools, and techniques associated with the analysis of data. All of the laboratory assignments require the use of the provided tools to study, design, implement, and evaluate systems that solve data analytics problems. In addition to demonstrating technical skills through the software that they use or develop for data analysis, students may be expected in some laboratory assignments to read a related article and to lead a discussion or give a short presentation about it.
- Final Project: This project will present you with the description of a problem and ask you to implement a full-featured solution using a wide variety of data analytics techniques. The final project in this class will require you to apply all of the knowledge and skills that you have accumulated during the course of the semester to solve a problem and, whenever possible, make your solution publicly available as a free and open-source tool. The project will invite you to draw upon both your problem-solving skills and data analytics techniques.
Your instructor will be using GitHub Classroom to collect all assignments. You are expected to be able to use git effectively to submit your work. If you require help, please see your peers, the Technology Leaders, or your instructor.
The three basic commands for submitting work are the following.
git add -A
git commit -m "informative message"
git push
All assignments will have a stated due date. The electronic version of each class assignment is to be turned in at the beginning of the lab session on the due date. Submissions after the beginning of class are counted as late.
Assignments will be accepted for up to one week past the assigned due date with a 15% penalty. After that time, the lab will not be accepted.
Please note that lab solutions will be discussed after the one-week late submission deadline. Therefore, assignments cannot be submitted after that deadline.
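The late policy can be sketched as a small function. This is an illustration under stated assumptions, not the instructor's grading procedure. It assumes that the 15% penalty removes 15% of the earned score (the policy does not say whether it is instead 15 points off), and that a submission made exactly at the start of the lab session counts as on time. The name `adjusted_score` and the dates in the example are hypothetical.

```python
from datetime import datetime, timedelta

LATE_WINDOW = timedelta(weeks=1)  # late work is accepted for up to one week
LATE_PENALTY = 0.15               # the 15% penalty stated in the policy


def adjusted_score(raw_score, due, submitted):
    """Return the score after the late policy, or None if not accepted.

    Assumption: the penalty removes 15% of the earned score.
    """
    if submitted <= due:
        return raw_score  # on time, no penalty
    if submitted <= due + LATE_WINDOW:
        return raw_score * (1 - LATE_PENALTY)  # late, within one week
    return None  # past the late deadline


# Hypothetical due date: a Monday lab session starting at 2:30pm.
due = datetime(2023, 2, 6, 14, 30)
print(adjusted_score(80, due, datetime(2023, 2, 6, 14, 0)))   # on time
print(adjusted_score(80, due, datetime(2023, 2, 8, 9, 0)))    # two days late
print(adjusted_score(80, due, datetime(2023, 2, 14, 9, 0)))   # more than a week late
```

An approved extension would change the `due` value passed in. The sketch does not model extension requests.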
Unless special arrangements are made with the course instructor, no assignments will be accepted after the late deadline. If you are requesting an extension for a lab assignment, then email me with your request and provide a valid reason for the extension. The request must arrive before the due date of the lab, not on the due date.
The decision to provide you with an extension (or not) will be weighed in light of fairness to your peers who are still able to complete their labs, regardless of their own busy schedules.
Various digital channels will be used in this course for communication, including email, Discord, and the GitHub issue tracker. Students are strongly advised to install the Discord app on their computer and smartphone so that they receive all communications from the instructor as well as from the other members of the class.
Additionally, the course website will be used to store the syllabus, course schedule and information about the classDocs/ repository on GitHub. Your grades will be communicated to you through a Gradebook GitHub repository.
The classrooms in the Department of Computer Science no longer provide machines for student use. You are to bring your own wifi-ready device to class to be able to follow along with course material. If the class is meeting online using Zoom, then please be sure that your machine is configured correctly to connect you to the class. As it is your responsibility to maintain your machine, please perform online research to determine how to configure it accordingly, or to install any software necessary for online meetings.
During the semester, you will be told which software to install on your machine to be prepared for class. Some of the prominent software that we may be using can be found at the following resource.
The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. Students with disabilities who believe they may need accommodations in this class are encouraged to contact Disability Services at 332-2898. Disability Services is part of the Learning Commons and is located in Pelletier Library. Please do this as soon as possible to ensure that approved accommodations are implemented in a timely fashion.
Some of the resources on campus are listed below.
- Maytum Center for Student Success
- Allegheny College Counseling Center
- The Winslow Health Center
- Student Life
The Academic Honor Program that governs the entire academic program at Allegheny College is described in the Allegheny Course Catalogue. The Honor Program applies to all work that is submitted for academic credit or to meet non-credit requirements for graduation at Allegheny College. This includes all work assigned for this class (e.g., examinations, laboratory assignments, and the final project). All students who have enrolled in the College will work under the Honor Program. Each student who has matriculated at the College has acknowledged the following pledge:
I hereby recognize and pledge to fulfill my responsibilities, as defined in the Honor Code, and to maintain the integrity of both myself and the College community as a whole.
It is recognized that an important part of the learning process in any course, and particularly one in computer science, derives from thoughtful discussions with teachers and fellow students. Such dialogue is encouraged. However, it is necessary to distinguish carefully between the student who discusses the principles underlying a problem with others and the student who produces assignments that are identical to, or merely variations on, someone else's work. While it is acceptable for students in this class to discuss their programs, technical diagrams, proposals, paper reviews, presentations, and other items with their classmates or other individuals, deliverables that are nearly identical to the work of others will be taken as evidence of violating the Honor Code.
More information about the code may be found at the Maytum Center for Student Success.
Types of computer hardware and software are everywhere! Conducting research in computer science is a challenging and rewarding activity that leads to the production of hardware, software, and scientific insights that have the potential to positively influence the lives of many people. As you learn more about research methods in computer science you will also enhance your ability to effectively write and speak about a wide range of topics in computer science. I ask that you bring your best effort and highest enthusiasm as you pursue research in computer science this semester.
Below is the schedule of topics, updated as we cover them, along with their associated activities.
Week # | Dates | Topic | Reading |
---|---|---|---|
1 | 17 - 20 Jan 2023 | Introduction to the course, the data all around | Read the syllabus |
1 | No lab | N/A | |
2 | 23 - 27 Jan | Web traffic analytics. Google Analytics, Building own site to play with Google Analytics | Slides |
2 | Lab assignment 01 Note: you will have to refresh your GitHub page after clicking on this link. Activity01 | Hand out | |
3 | 30 Jan - 3 Feb | Introduction to R programming, exploratory steps | slides |
3 | Website Analytics Lab assignment 02 | ||
4 | |||
4 | |||