EducationalTestingService / ies-writing-achievement-study-data

Data from an IES research study that explores the relationship between writing achievement and success at 4-year postsecondary institutions.

Overview

The data presented in this repository were collected as part of a research project entitled "Exploring Writing Achievement and Its Role in Success at 4-Year Postsecondary Institutions". This project was funded by the Institute of Education Sciences, U.S. Department of Education, Award Number R305A160115, and led by Dr. Jill Burstein (Principal Investigator) and Dr. Daniel McCaffrey (co-Principal Investigator).

Description

The repository contains the following main set of files:

  • writing-samples/*.txt: De-identified authentic university coursework writing data. There are 997 files in this directory, each representing one of the coursework assignments. 735 students participated in this study. A partially overlapping subset of students (N=418) submitted multiple coursework writing assignments.

    Participants were enrolled at 4-year universities, and writing assignments were collected from courses primarily targeting first-year students. We refer to these files as the University Coursework Writing Corpus. All assignment files are plaintext and UTF-8 encoded, where necessary.

  • student_data.csv : Data collected from student participants. This file contains two types of data:

    • Writing Attitude Survey Data: This survey measured four components of writing attitudes and beliefs: (1) Goals for Writing, (2) Confidence about Writing Tasks, (3) Beliefs about Writing, and (4) Feelings about Writing. Note that the order of survey questions in this CSV file is not the same as the original order of the questions in the survey. The survey was completed by 566 out of 735 total study participants. For the others, these columns are blank.

    • Outcomes/Success Predictor Measures: A subset of outcomes/success predictor measures for the study participants, including: (1) their course grade (for the course in which writing assignments and surveys were submitted), (2) their study semester GPA, (3) their semester GPA for up to five semesters following study enrollment, and (4) their SAT Total/ACT Composite score (recoded as SAT Total score).

    In total, there are 735 rows and 64 columns in this CSV file. A detailed description of each column can be found in student_data_columns.csv under docs.

  • writing_features.csv : Various features based on the writing samples in the writing corpus. The following types of data are included:

    • Assignment Preparation Survey Data: Survey responses (N=929) collected from students regarding each coursework assignment they submitted.

    • Genre Annotations: Human annotations for each assignment pertaining to assignment type, source requirements, source use, writing aim, and assignment version.

    • Automated Writing Evaluation (AWE) Features: Feature values generated for each of the 997 writing assignments by two automated writing evaluation (AWE) systems. The features are computed from grammar and mechanics errors, use of figurative and argumentative language, and vocabulary, among others. The two AWE systems used to generate these features were e-Rater (Attali & Burstein, 2006) and Writing Mentor (Burstein et al., 2018).

    In total, there are 997 rows and 119 columns in this CSV file. A detailed description of each column can be found in writing_features_columns.csv under docs.
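The file counts and table sizes listed above can be used to sanity-check a local copy of the data. The following is a minimal sketch, not part of the repository itself. It assumes that both CSV files sit at the top level of the checkout, are UTF-8 encoded, and have a single header row; the function names (`csv_shape`, `load_writing_samples`, `check_repository`) are hypothetical helpers introduced only for illustration.

```python
import csv
from pathlib import Path

# Counts stated in this README; used here only as sanity checks.
EXPECTED_SAMPLES = 997
EXPECTED_SHAPES = {
    "student_data.csv": (735, 64),
    "writing_features.csv": (997, 119),
}


def csv_shape(path):
    """Return (data rows, columns) for a CSV file, assuming one header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def load_writing_samples(samples_dir):
    """Map each .txt file name (without extension) to its text content."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(samples_dir).glob("*.txt"))
    }


def check_repository(root):
    """Compare a local checkout against the counts given in the README.

    Returns a list of mismatch messages; the list is empty if all counts match.
    """
    root = Path(root)
    problems = []
    n_samples = len(load_writing_samples(root / "writing-samples"))
    if n_samples != EXPECTED_SAMPLES:
        problems.append(
            f"writing-samples: expected {EXPECTED_SAMPLES} files, found {n_samples}"
        )
    for name, expected in EXPECTED_SHAPES.items():
        found = csv_shape(root / name)
        if found != expected:
            problems.append(f"{name}: expected {expected}, found {found}")
    return problems
```

Calling `check_repository(".")` from the top of a checkout would return an empty list if the local files match the numbers above, and a list of messages describing each mismatch otherwise.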

Additional Documents

The following additional files can be found under docs.

  • student_data_columns.csv : A CSV file describing in detail each of the columns found in student_data.csv.

  • writing_features_columns.csv : A CSV file describing in detail each of the columns found in writing_features.csv.

  • surveys/assignment_preparation_survey.pdf : The instrument used for the assignment preparation survey.

  • surveys/writing_attitudes_survey.pdf : The instrument used for the writing attitudes survey. It is adapted from an instrument developed by MacArthur, Philippakos, & Graham (2016) to measure motivation among college writers.

  • forms/*.pdf : Blank copies of the student consent forms issued to the student participants.

  • processes/deidentification_procedures.pdf : This file contains a description of the steps that we followed in order to de-identify (remove any personally identifying information from) the writing samples and student metadata.

  • processes/genre_annotation.pdf : This file contains a description of the genre annotation performed to classify writing assignments based on broad assignment types.

  • processes/persuasive_subgenre_annotation.pdf : This file contains a description of the subgenre annotation performed to classify persuasive writing assignments into one of six finer-grained categories based on argument value (on a continuum of low to high), source use and integration, and support.
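With 64 and 119 columns in the two data files, it can be convenient to search the column description files by keyword. The sketch below is one possible way to do this. It makes no assumption about the header names in those files, since their layout is not described here, and simply searches every field of every row. The function name `search_column_docs` is a hypothetical helper, and the file is assumed to be UTF-8 encoded with one header row.

```python
import csv


def search_column_docs(columns_csv, keyword):
    """Return (header, matching rows) from a column-description CSV.

    A row matches if any of its fields contains `keyword`, ignoring case.
    No particular header names are assumed.
    """
    needle = keyword.lower()
    with open(columns_csv, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        matches = [
            row for row in reader
            if any(needle in field.lower() for field in row)
        ]
    return header, matches
```

For example, `search_column_docs("docs/student_data_columns.csv", "GPA")` would return the header row together with every description row that mentions GPA, assuming the file lives at that path in the checkout.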

Related Publications & Presentations

License

This data is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact Us

If you have any questions about the data, please send us an email.

Acknowledgements

The data provided in this repository were collected via research supported by the Educational Testing Service and the Institute of Education Sciences, U.S. Department of Education, Award Number R305A160115.

Thanks to Michael Flor, Binod Gyawali, Ben Leong, and Maxwell Schwartz for engineering support. Many thanks to our research assistants, Patrick Houghton, Hillary Molloy and Zydrune Mladineo, for managing a complex data collection.
