R-Ladies Melbourne, 29 Nov 2022
An illustration of reasons why we should care about working reproducibly from The Turing Way, Guide for Reproducible Research
{targets}, {renv}, {lintr}, {styler}
…We’re awash in information! What we need is curation.
“Like families, tidy datasets are all alike but every messy dataset is messy in its own way” - {tidyr}: Tidy data
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” - The tidyverse style guide
Jump in here:
Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want the computer to do. – (Knuth 1984)
From the Modern Data Book by Martin Shepperd:
Start with:
and then maybe:
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. - R Packages (2e)
From simple to more involved:
A version control system, or VCS, tracks the history of changes as people and teams collaborate on projects together. As developers make changes to the project, any earlier version of the project can be recovered at any time.
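As a minimal sketch of what "any earlier version can be recovered" looks like in practice, the commands below create a throwaway git repository, commit two versions of a file, and then pull the first version back out of the history. The file name `analysis.R`, its contents, and the demo identity are placeholders invented for illustration.

```shell
# Work in a throwaway directory so no existing project is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

git init --quiet .
# Identity set for this demo repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

# First version of a (hypothetical) analysis script.
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of analysis"

# Second version overwrites the first in the working directory.
echo "x <- 2" > analysis.R
git commit --quiet -am "Change x"

# The history records both changes.
git log --oneline

# Recover the earlier version from the previous commit into a new file.
git show HEAD~1:analysis.R > recovered.R
cat recovered.R
```

Writing the old version to `recovered.R` leaves the current `analysis.R` untouched, which is a gentle way to inspect history before trying commands that rewrite the working copy.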
Some good starting points:
git, github
Experimenting with more advanced features:
“The command line is a tool for talking to your operating system (e.g., macOS, Windows, etc.) using text instead of by moving around a mouse and clicking on things”
- The Command Line from Practical Data Science by Nick Eubank
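To make "using text instead of a mouse" concrete, here is a small sketch of a few everyday commands: asking where you are, listing files, creating folders, and counting the lines in a file. The folder and file names are made up for illustration, and everything is created inside a temporary directory.

```shell
# Ask the operating system where we are and what is here.
pwd
ls

# Create a (hypothetical) project folder with a data subfolder.
project_dir="$(mktemp -d)/my-project"
mkdir -p "$project_dir/data"

# Write a tiny CSV file: one header row and two data rows.
printf 'id,value\n1,10\n2,20\n' > "$project_dir/data/example.csv"

# Count the lines in the file (tr strips padding that some systems add).
line_count="$(wc -l < "$project_dir/data/example.csv" | tr -d ' ')"
echo "example.csv has $line_count lines"
```

Each of these steps has a point-and-click equivalent in a file browser. The text version can be saved in a script and rerun exactly.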
Dip your toes in with:
Then dive deeper…
bash, zsh, Terminal, shell
From The Turing Way, Guide for Collaboration:
The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.
Some (often) good enough tools:
Some R specific resources:
Ways of capturing computational environments from The Turing Way, Guide for Reproducible Research
Possible starting points:
A pipeline is a computational workflow that does statistics, analytics, or data science… A pipeline contains tasks to prepare datasets, run models, and summarize results for a business deliverable or research paper.
On my to-explore list:
The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.
Motivation and guidance on testing:
Special mentions to:
Find me @cynthiahqy on:
Some shameless plugs:
{conformr}, an opinionated toolkit for data harmonisation
R-Ladies theme for Quarto Presentations. Code available on GitHub.