- Reflect on your interests in language and linguistics
- Review the concepts on identifying a research problem and developing a research statement in TAD Chapter 4 "Framing research"
- Review the concepts on curating corpus data in TAD Chapter 2 "Understanding data" and implementing curation steps in TAD Chapter 6 "Curate data"
- Continue to refine your project statements and data acquisition implementation.
- Review the steps for working with RStudio, Git, and GitHub in Recipe #5 and Lab #5.
- Continue to refine your text analysis project
- Add your data curation implementation strategy to your project
- Apply the edit, add, commit, push workflow to update your GitHub page
- Continue to apply your growing knowledge of R
- Open your research project (project_<your_last_name>) on RStudio Cloud.
- Open the 2_curate_dataset.Rmd file in the analysis/ directory.
In this project implementation step you will refine the process by which you curate the data for your text analysis project. You will build on the process you are adopting to acquire the data and move on to create a tidy dataset that contains the main structural characteristics required for the subsequent steps in your analysis (transformation and analysis).
As with the 1_acquire_data.Rmd file, the 2_curate_dataset.Rmd file and the subsequent files in the analysis/ directory of your project share a similar template structure in the prose section of the .Rmd document. This includes 'About', 'Setup', 'Run', and 'Finalize' sections with some relevant subsections. This structure is not set in stone; it merely helps you start to think about the steps that your processing and documentation should most likely include. Each of the (sub)sections has a comment (<!-- ... -->) to guide you as to what you might expect to add in these sections.
Tasks
- Provide a description that overviews the aim of this script (.Rmd).
- Include any information necessary for someone to know to be able to reproduce this script.
- Load the appropriate packages that you will need.
- Include the relevant code to curate, store, and document the data on your disk.
- Make sure that your data is stored on disk in a plain-text format (most likely .csv) inside the data/derived/ directory.
- Source the _pipeline.R file, either by:
  - opening the _pipeline.R file and clicking the 'Source' button,
  - or running source("_pipeline.R") in the R Console.
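As a rough illustration of the curate, store, and document tasks above, here is a minimal base R sketch. The raw input format, the column names, and the file names curated_dataset.csv and curated_dataset_data_dictionary.csv are all hypothetical, and a temporary directory stands in for the project root so the sketch can be run anywhere. Your own data will need different curation steps.

```r
# Hypothetical raw input: one utterance per line, fields separated by tabs
# (document id, speaker, text). Real data will look different.
raw_lines <- c(
  "doc01\tA\tHello there.",
  "doc01\tB\tHi, how are you?",
  "doc02\tA\tFine, thanks."
)

# Split each line into its fields
fields <- strsplit(raw_lines, "\t", fixed = TRUE)

# Helper: pull the i-th field out of every line
get_field <- function(i) vapply(fields, function(x) x[i], character(1))

# Curate: one row per utterance, one column per variable
curated <- data.frame(
  doc_id  = get_field(1),
  speaker = get_field(2),
  text    = get_field(3),
  stringsAsFactors = FALSE
)

# Store: write plain text (.csv) under data/derived/.
# tempdir() stands in for the project root (an assumption for this sketch).
project_root <- tempdir()
derived_dir  <- file.path(project_root, "data", "derived")
dir.create(derived_dir, recursive = TRUE, showWarnings = FALSE)

curated_path <- file.path(derived_dir, "curated_dataset.csv")
write.csv(curated, curated_path, row.names = FALSE)

# Document: a simple data dictionary saved next to the dataset
data_dictionary <- data.frame(
  variable    = names(curated),
  description = c("Document identifier", "Speaker label", "Utterance text"),
  stringsAsFactors = FALSE
)
dictionary_path <- file.path(derived_dir, "curated_dataset_data_dictionary.csv")
write.csv(data_dictionary, dictionary_path, row.names = FALSE)
```

In your own project you would replace tempdir() with paths relative to the project and build the curated data frame from the data you acquired in the previous step.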
Each step of your project should be seen as modular; that is, one script should not depend directly on another in the sequence. By storing the data needed for the next step in the data/derived/ directory, you ensure that no object from the 2_curate_dataset.Rmd R session is required for the next step, 3_transform_dataset.Rmd, and so on.
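The modular hand-off between steps might look something like the following base R sketch. It assumes a hypothetical file name (curated_dataset.csv) and made-up columns, and it uses a temporary directory in place of the project's data/derived/ directory. The first part stands in for the end of the curation script, the second for the start of the transformation script.

```r
# Shared location on disk (tempdir() stands in for the project root;
# the file name is hypothetical)
derived_dir <- file.path(tempdir(), "data", "derived")
dir.create(derived_dir, recursive = TRUE, showWarnings = FALSE)
curated_path <- file.path(derived_dir, "curated_dataset.csv")

# --- Stand-in for the end of 2_curate_dataset.Rmd ---
# local() keeps the data frame out of the global environment, mimicking
# an R session whose objects are gone once the script has finished.
local({
  curated_step2 <- data.frame(
    doc_id = c("doc01", "doc02"),
    text   = c("Hello there.", "Fine, thanks."),
    stringsAsFactors = FALSE
  )
  write.csv(curated_step2, curated_path, row.names = FALSE)
})

# --- Stand-in for the start of 3_transform_dataset.Rmd ---
# The only input is the file on disk, not an object from the earlier step.
dataset <- read.csv(curated_path, stringsAsFactors = FALSE)

# A placeholder transformation: number of characters in each text
dataset$n_chars <- nchar(dataset$text)
```

Because the second part reads only from disk, it can be run in a fresh R session without rerunning the curation step first.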
Note: The exact data curation steps will likely change in the coming weeks, but it is important to make a concerted effort to develop the scaffolding (even if only pseudocode). We will workshop the ideas that surface from the work on this project orientation step and discuss collective and individual questions to continue to develop and hone our projects so that they show the most promise for being viable.
- Now run the Git commands in the Terminal to add, commit, and push your updates to your GitHub repository/website.
git status
git add -A
git commit -m "<briefly describe what you've done>"
git push
Remember: your PAT has probably expired (it only lasts 12 hours in RStudio Cloud), so you will need to run gitcreds::gitcreds_set() in the R Console and paste in your PAT before you run git push (otherwise you will get a cryptic error).
- Navigate to your GitHub website (where the web page shows) and copy the link to the main page where you have added your changes.
- Go to the Canvas submission page for "Project implementation #3" and paste this URL link into the 'website url' submission field. In the 'Text Entry' field add your self-assessment comments.
IMPORTANT NOTE:
We will continue using GitHub to post our work. In addition to saving our project work and hosting a website for our projects, GitHub also has a robust set of features for commenting and collaborating on code. We will use these features to receive and ask for feedback on our code and ideas.