
Coding in the tidyverse

Best practices for human-readable code


Last updated 2 years ago

The tidyverse is a collection of packages designed by Hadley Wickham. There are many different ways to code in R, including base R, but this lab uses the tidyverse wherever possible for a few good reasons:

  • Tidyverse functions are pretty human-readable, which makes troubleshooting a lot easier

  • Other lab members will understand your code better if everyone commits to the same coding style

  • There are a ton of resources online to help you understand tidyverse functions

  • Tidyverse functions expect tidy data, which means that they force you to use good data management practices

  • The functions in the tidyverse are very powerful and are often designed to facilitate exactly the kinds of transformations we need

Installing and loading the tidyverse

If this is your first time using the tidyverse, you'll need to install all of the packages. Luckily they all come bundled together and can be installed with one line of code! Simply write

install.packages("tidyverse")

into the console of your RStudio session. Be sure to watch for any additional prompts along the way as you install!

After you've installed the tidyverse, you'll need to load it. Think of installing as screwing in a lightbulb and loading as flipping the light switch. You only need to screw it in once to use the lightbulb, but you need to flip the switch every time. To load the packages, you should add

library(tidyverse)

at the beginning of each script where you use tidyverse functions.

Main tidyverse functions

The pipe: %>% can be read as "and then" whenever you encounter it in code. For example:

    data %>%                  # take the dataframe "data", and then
      group_by(id) %>%        # group the data by the column id, and then
      distinct(media_name)    # keep only one row (per group) with each distinct media_name
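To make the "and then" reading concrete, here is a minimal sketch of the same pipeline run on a small made-up dataframe. The contents of data (the ids and media file names) are invented for illustration and are not from a real study.

```r
library(tidyverse)

# Hypothetical toy data: one row per trial, with repeated media files
data <- tibble(
  id = c("S01", "S01", "S01", "S02", "S02"),
  media_name = c("dog.mp4", "dog.mp4", "cat.mp4", "dog.mp4", "dog.mp4")
)

unique_media <- data %>%   # take the dataframe "data", and then
  group_by(id) %>%         # group the data by the column id, and then
  distinct(media_name) %>% # keep one row per distinct media_name within each id
  ungroup()                # drop the grouping so later steps are not grouped

unique_media
# S01 keeps dog.mp4 and cat.mp4; S02 keeps dog.mp4 (3 rows in total)
```

Ending with ungroup() is a habit rather than a requirement; it avoids surprises if more steps are added to the pipeline later.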

select() allows you to remove columns in your dataframe, or move columns around into a better order
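As a small sketch of both uses of select, dropping a column and reordering columns, the example below uses a made-up participants dataframe (the column names are assumptions for illustration only).

```r
library(tidyverse)

# Hypothetical toy data
participants <- tibble(
  age_months = c(14, 18),
  notes = c("fussy", ""),
  id = c("S01", "S02")
)

# Remove a column by putting a minus sign in front of its name
no_notes <- participants %>%
  select(-notes)

# Reorder columns: put id first, then keep everything else in its current order
id_first <- participants %>%
  select(id, everything())
```

everything() is a selection helper that stands for "all the remaining columns", so you do not have to type every column name just to move one to the front.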

Programming with dplyr

dplyr is a grammar of data manipulation and wrangling that provides a set of consistent verbs within the tidyverse.

Helper functions that let you select columns more precisely when using the select function:

starts_with() selects all the columns of a data frame whose names start with a specific substring. For example, starts_with("per") selects all the columns whose names start with "per", such as those beginning with the word "percentage".

ends_with() selects all the columns of a data frame whose names end with a specific substring. For example, ends_with("ratio") selects all the columns whose names end with the word "ratio".

contains() selects all the columns of a data frame whose names contain a specific substring anywhere in the name. For example, contains("199") selects all the columns whose names contain a year from the 1990s.

matches() selects all the columns of a data frame whose names match a regular expression (see below), which lets you combine several criteria in one pattern. For example, matches("y|perc") selects all the columns whose names contain either "y" or "perc".
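The four selection helpers above can be compared side by side on one toy dataframe. This is a sketch only; the scores dataframe and its column names are made up to mirror the examples in the text.

```r
library(tidyverse)

# Hypothetical toy data with column names chosen to match the examples above
scores <- tibble(
  id = c("S01", "S02"),
  percentage_english = c(60, 25),
  percentage_french = c(40, 75),
  mixing_ratio = c(0.2, 0.5),
  year_1998 = c(1, 0)
)

starts <- scores %>% select(starts_with("per"))     # percentage_english, percentage_french
ends <- scores %>% select(ends_with("ratio"))       # mixing_ratio
has_199 <- scores %>% select(contains("199"))       # year_1998
y_or_perc <- scores %>% select(matches("y|perc"))   # both percentage columns and year_1998
```

Note that id is not picked up by matches("y|perc") because its name contains neither "y" nor "perc".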

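Helper functions that let you modify columns more efficiently when using the mutate function:

across() allows you to perform the same calculation across multiple columns when using the mutate function.

sub() allows you to replace the first match of a pattern in a string with a replacement.

gsub() allows you to replace all the matches of a pattern in a string with a replacement. Both sub() and gsub() are base R functions; the stringr equivalents are str_replace() and str_replace_all().

where() allows you to specify columns for a calculation or a replacement based on a condition, such as their type. For example, mutate(across(.cols = where(is.numeric), .fns = round)) rounds every numeric column. The condition is passed as a function name without parentheses: where(is.numeric), not where(is.numeric()).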
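A short sketch of these mutate helpers in action follows. The scores dataframe, its column names, and the choice to divide by 100 are all assumptions made for illustration.

```r
library(tidyverse)

# Hypothetical toy data
scores <- tibble(
  id = c("S01_a_a", "S02_b_b"),
  perc_english = c(60, 25),
  perc_french = c(40, 75)
)

# across() + where(): apply the same calculation to every numeric column
proportions <- scores %>%
  mutate(across(.cols = where(is.numeric), .fns = ~ .x / 100))

# sub() replaces only the first underscore; gsub() replaces all of them
renamed <- scores %>%
  mutate(
    id_first = sub("_", "-", id),  # "S01-a_a", "S02-b_b"
    id_all = gsub("_", "-", id)    # "S01-a-a", "S02-b-b"
  )
```

The ~ .x / 100 notation is a shorthand for "a function that takes each column (.x) and divides it by 100". The id column is left untouched in the first step because it is not numeric.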

Helper functions that let you select columns more efficiently when using the filter function:

if_any() keeps the rows where at least one of the selected columns meets a condition. For example, filter(if_any(.cols = starts_with("perc"), .fns = ~ .x > 50)) keeps the rows where at least one column whose name starts with "perc" is above 50.

if_all() works similarly to if_any(), but it keeps only the rows where all of the selected columns meet the condition.
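The difference between if_any and if_all is easiest to see on the same data. In this sketch the exposure dataframe and the threshold of 50 are made up for illustration.

```r
library(tidyverse)

# Hypothetical toy data: percentage of exposure to each language
exposure <- tibble(
  id = c("S01", "S02", "S03"),
  perc_english = c(80, 40, 30),
  perc_french = c(20, 60, 30)
)

# Keep rows where AT LEAST ONE "perc" column is above 50
any_over_50 <- exposure %>%
  filter(if_any(.cols = starts_with("perc"), .fns = ~ .x > 50))
# keeps S01 (English 80) and S02 (French 60)

# Keep rows where ALL "perc" columns are below 50
all_under_50 <- exposure %>%
  filter(if_all(.cols = starts_with("perc"), .fns = ~ .x < 50))
# keeps only S03 (30 and 30)
```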

Helper functions that allow combining datasets:

left_join() keeps all the rows from the dataset on the left and adds the columns from the right dataframe for the rows that match. Left rows with no match get NA in the added columns. By default, the dataframes are joined on the columns that have the same names in both. If the key columns are named differently, specify them with the by argument, for example by = c("id" = "subject_id").

inner_join() keeps only the rows that have a match in both datasets.

anti_join() identifies the rows that are present in the first dataset and have no match in the second one.
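The three joins can be compared on two small dataframes that only partly overlap. This is a sketch under assumed data: participants, vocab, and the column name subject_id are hypothetical.

```r
library(tidyverse)

# Hypothetical toy data: S01 has no vocabulary score, S04 has no participant record
participants <- tibble(id = c("S01", "S02", "S03"), age_months = c(14, 18, 24))
vocab <- tibble(id = c("S02", "S03", "S04"), words = c(50, 120, 200))

left <- left_join(participants, vocab, by = "id")           # 3 rows; words is NA for S01
inner <- inner_join(participants, vocab, by = "id")         # 2 rows: S02 and S03
missing_vocab <- anti_join(participants, vocab, by = "id")  # 1 row: S01

# When the key columns have different names, map them explicitly in `by`
vocab_renamed <- vocab %>% rename(subject_id = id)
left_renamed <- left_join(participants, vocab_renamed, by = c("id" = "subject_id"))
```

anti_join is a handy check before a left_join: it shows exactly which participants would end up with missing values.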

Regular expressions in the tidyverse

Regular expressions are tools for describing patterns in strings. They work with the stringr package within the tidyverse.

Alternation: use the token | to specify an "or" condition. For example, "green|blue" matches strings that contain either the expression green or the expression blue.

Anchors: use the token ^ to search for a match at the start of a string (similar to starts_with). For example, "^co". Use the token $ to search for a match at the end of a string (similar to ends_with). For example, "co$".
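As a quick sketch, the patterns above can be tried out with stringr's str_detect, which returns TRUE for each string that matches. The words vector is made up for illustration.

```r
library(tidyverse)

# Hypothetical strings to test the patterns against
words <- c("green apple", "blue sky", "cold", "taco", "coco")

has_colour <- str_detect(words, "green|blue")  # TRUE  TRUE  FALSE FALSE FALSE
starts_co <- str_detect(words, "^co")          # FALSE FALSE TRUE  FALSE TRUE
ends_co <- str_detect(words, "co$")            # FALSE FALSE FALSE TRUE  TRUE

# The same patterns work inside filter() to keep matching rows
matching_rows <- tibble(word = words) %>%
  filter(str_detect(word, "^co"))              # "cold" and "coco"
```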

Set theory clauses

Set theory clauses are useful functions for combining two datasets that have the same columns when working in the tidyverse.

intersect() only keeps rows that exist in both datasets.

union() keeps all the rows from both datasets without duplicating the repeated rows.

union_all() keeps all the rows from both datasets, including the repeated rows.

setdiff() keeps all the rows in the x dataset that do not appear in the y dataset.
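The four set functions can be compared on two small dataframes with the same single column. This is a sketch with made-up data. The dplyr:: prefix is used here only to make it explicit that the dplyr versions are meant, since base R has functions with the same names.

```r
library(tidyverse)

# Hypothetical toy data: two participant lists that partly overlap
x <- tibble(id = c("S01", "S02", "S03"))
y <- tibble(id = c("S02", "S03", "S04"))

in_both <- dplyr::intersect(x, y)    # S02, S03
combined <- dplyr::union(x, y)       # S01, S02, S03, S04 (repeats removed)
stacked <- dplyr::union_all(x, y)    # all 6 rows, repeats kept
only_in_x <- dplyr::setdiff(x, y)    # S01
```

Order matters for setdiff: setdiff(y, x) would return S04 instead.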

More of the main tidyverse functions, in addition to those listed under "Main tidyverse functions" above:

filter() allows you to remove rows that don't match some criteria you set out. Great for cleaning data.

mutate() allows you to make a new column using previous columns

pivot_longer() allows you to transpose wide data (e.g. Qualtrics output) into a longer format

pivot_wider() is the reverse of pivot_longer(); it allows you to transpose long data into a wider format.
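The sketch below chains filter, mutate, pivot_longer, and pivot_wider on one toy dataframe. The data, the column names, and the 24-month cutoff are all assumptions made for illustration.

```r
library(tidyverse)

# Hypothetical wide data: one row per child, one column per language
wide <- tibble(
  id = c("S01", "S02", "S03"),
  age_months = c(14, 18, 30),
  words_english = c(20, 50, 300),
  words_french = c(10, 80, 250)
)

cleaned <- wide %>%
  filter(age_months <= 24) %>%                         # keep only children aged 24 months or less
  mutate(words_total = words_english + words_french)   # new column built from existing ones

# Wide to long: one row per child per language
long <- cleaned %>%
  pivot_longer(
    cols = c(words_english, words_french),
    names_to = "language",
    names_prefix = "words_",   # strip "words_" so language is "english" / "french"
    values_to = "words"
  )

# Long back to wide: one column per language again
wide_again <- long %>%
  pivot_wider(names_from = language, values_from = words, names_prefix = "words_")
```

Here cleaned has 2 rows (S03 is filtered out), long has 4 rows, and wide_again is back to 2 rows with the columns words_english and words_french restored.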