Lab Wiki

Hypothesis testing for categorical variables


Last updated 2 years ago

There are several methods to test hypotheses using categorical data. The most useful are confidence intervals for a proportion or for a difference in proportions, and hypothesis tests for a proportion, for a difference in proportions, for independence between variables, and for goodness of fit. Below we briefly explore these methods.

Confidence intervals

While there are many summary and descriptive statistics that can be calculated for categorical data, there is no straightforward way to calculate confidence intervals. Below we explore several ways to approximate confidence intervals.

Bootstrapping is a technique that samples with replacement from the data and repeats the same calculation over many iterations. For example, if a proportion is calculated 100 times from different resamplings of the data, the spread of those 100 values approximates the uncertainty in the proportion. A 95% confidence interval can then be approximated by the range that contains the middle 95% of the bootstrapped proportions (the 2.5th to the 97.5th percentile). In practice, more iterations (for example 1000) give a more stable interval.

The infer library provides helper functions that allow you to calculate confidence intervals through bootstrapping. Install and load it with:

install.packages("infer")
library(infer)

specify allows you to specify the variable of interest in the data and the value for which to calculate the summary statistic. For example, the value "female" from the variable "gender".

generate allows you to perform the bootstrapping operation and to set the number of iterations (reps).

calculate allows you to specify the summary statistic to be calculated for the variable of interest in each bootstrap iteration.

Example:

# Bootstrap distribution of the proportion of "female" in the gender column of df
CI <- df %>% specify(response = gender, success = "female") %>%
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "prop")
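The pipeline above returns one bootstrapped proportion per repetition, and the interval itself still has to be read off that distribution. The following is a minimal base-R sketch of the same idea, written without infer so that each step is visible. The gender vector is made-up example data, and the choice of 1000 repetitions and a 95% percentile interval are assumptions for illustration. In infer, get_confidence_interval() serves a similar purpose.

```r
# Hypothetical example data: 60 "female" and 40 "male" responses
set.seed(123)
gender <- c(rep("female", 60), rep("male", 40))

# Resample with replacement and recompute the proportion each time
boot_props <- replicate(1000, {
  resample <- sample(gender, size = length(gender), replace = TRUE)
  mean(resample == "female")
})

# Percentile 95% confidence interval from the bootstrap distribution
boot_ci <- quantile(boot_props, probs = c(0.025, 0.975))
print(boot_ci)
```

The interval is taken from the percentiles of the bootstrapped values rather than from their minimum and maximum, so a few extreme resamples do not widen it.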

Standard error of phat: it is also possible to calculate a confidence interval from the standard error (SE) of the sampling distribution of phat (the sample proportion). The SE of phat is calculated as sqrt(phat*(1-phat)/n), and an approximate 95% confidence interval is phat plus or minus 1.96*SE. Note that this method requires that the observations are independent of each other and that n is large enough (at least n>10; a common rule of thumb is at least 10 observations in each category).
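As a worked illustration of the SE formula, here is a short base-R sketch. The counts (60 "female" out of n = 100) are invented for the example, and the 95% level is an assumption.

```r
# Hypothetical counts: 60 successes ("female") out of 100 observations
n <- 100
successes <- 60

phat <- successes / n                 # sample proportion
se <- sqrt(phat * (1 - phat) / n)     # standard error of phat

# Normal-approximation 95% confidence interval
z <- qnorm(0.975)                     # about 1.96
ci <- c(lower = phat - z * se, upper = phat + z * se)
print(round(ci, 3))
```

With these numbers the SE is about 0.049, so the interval runs from roughly 0.504 to 0.696.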

Here is an example taken from

Hypothesis testing for a proportion

As reviewed in the section above, the R package infer allows calculating confidence intervals through bootstrapping operations using the helper functions specify, generate and calculate. To test a hypothesis we add the helper function hypothesize, which declares the null hypothesis:

Example:

# Simulate the distribution of the proportion under the null hypothesis p = 0.5
null <- df %>% specify(response = gender, success = "female") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 100, type = "simulate") %>%
  calculate(stat = "prop")

This method generates the distribution of proportions you would expect under the null hypothesis. To calculate a p-value from there, you need the proportion of simulated phats that are more extreme than the observed phat. For example:

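# Requires library(dplyr) for summarize() and pull()
# phat is the observed proportion, e.g. phat <- mean(df$gender == "female")
null %>% summarize(mean(stat > phat)) %>%
  pull() * 2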

Note that we multiply the pulled value by 2 to account for the left tail, which makes this a two-sided test. This assumes the observed phat is above the null value of 0.5.
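The same logic can be sketched in base R without infer, which makes it easier to see where the p-value comes from. This is only an illustration under stated assumptions: the sample size (n = 100), the observed phat (0.6), and the 1000 simulated samples are all invented for the example.

```r
set.seed(123)

n <- 100       # hypothetical sample size
phat <- 0.6    # hypothetical observed proportion of "female"

# Simulate 1000 sample proportions under the null hypothesis p = 0.5
null_props <- rbinom(1000, size = n, prob = 0.5) / n

# Share of simulated proportions at least as extreme as the observed one,
# doubled for a two-sided test and capped at 1
p_value <- min(mean(null_props >= phat) * 2, 1)
print(p_value)
```

This sketch uses >= rather than > so that simulated values equal to the observed phat count as "at least as extreme", which is the more conservative choice.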

Creating contingency tables

You might want to test the association between categorical variables under the null hypothesis that the variables of interest are independent of each other. One way to test this hypothesis is to create contingency tables and to use the chi-square statistic to decide whether to reject the null.

To create tables you can use table() from base R together with the "broom" and "tidyr" R packages: install.packages(c("broom", "tidyr")).

Some helper functions for building contingency tables are:

table (base R) allows you to turn a selection of columns into a table object.

tidy (broom) allows you to convert a table object into a tidy data frame, with one row per combination of categories and its count.

uncount(n) (tidyr) allows you to get a row per observation instead of a summary of the counts.
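To show what these three steps do to the data, here is a small base-R sketch that mimics them without broom or tidyr. The data frame, its column names (strategy, dominance) and its values are hypothetical, and as.data.frame() and the row-repetition step are stand-ins for tidy() and uncount(), not the functions themselves.

```r
# Hypothetical data: one row per family
survey <- data.frame(
  strategy  = c("opol", "opol", "mixed", "mixed", "mixed", "opol"),
  dominance = c("balanced", "unbalanced", "balanced",
                "balanced", "unbalanced", "balanced")
)

# Step 1: contingency table of the two categorical variables
tab <- table(survey$strategy, survey$dominance)
print(tab)

# Step 2: one row per cell with its count (base-R stand-in for tidy())
counts <- as.data.frame(tab)
names(counts) <- c("strategy", "dominance", "n")

# Step 3: back to one row per observation (base-R stand-in for uncount(n))
long <- counts[rep(seq_len(nrow(counts)), counts$n), c("strategy", "dominance")]
print(nrow(long))
```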

Once the variables of interest are selected and saved into a tidy table object, we can test the independence of the variables. To test for independence we go back to our helper functions specify, hypothesize, generate and calculate. This time we generate repetitions by permutation.

Example:

# Null distribution of the chi-square statistic under independence of var1 and var2
null <- data %>% specify(var1 ~ var2) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "Chisq")

This allows us to generate the distribution of the chi-square statistic expected under the null hypothesis of independence, and thus to compare it to the chi-square statistic of our observed counts.
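A base-R sketch of the same permutation idea is given below, for readers who want to see the shuffling step spelled out. The two variables, their values, and the 1000 permutations are invented for illustration, and chisq.test() is used only to obtain the statistic.

```r
set.seed(123)

# Hypothetical data: two categorical variables for 100 participants
var1 <- rep(c("a", "b"), each = 50)
var2 <- c(rep(c("x", "y"), times = c(35, 15)),
          rep(c("x", "y"), times = c(20, 30)))

# Observed chi-square statistic (no continuity correction)
obs_stat <- unname(chisq.test(table(var1, var2), correct = FALSE)$statistic)

# Shuffle var2 to break any association, then recompute the statistic
null_stats <- replicate(1000, {
  shuffled <- sample(var2)
  unname(chisq.test(table(var1, shuffled), correct = FALSE)$statistic)
})

# p-value: share of permuted statistics at least as large as the observed one
p_value <- mean(null_stats >= obs_stat)
print(c(statistic = obs_stat, p_value = p_value))
```

Because the chi-square statistic only grows as the counts move away from independence, the p-value here is one-sided by construction and is not doubled.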

Chisquare as a goodness of fit measure

A goodness of fit test tries to establish whether the observed values match the expected values under a specific model. For example, say we expect the distribution of family language strategy use to be equal in a group of randomly sampled parents. If we have three possible family language strategies, we expect this model to be true: model <- c(opol=1/3, "2_bilingual"=1/3, "1_bilingual"=1/3). Note that names starting with a digit must be quoted in R.

Once we have a model of how we expect the data to behave, we can use the chi-squared test as a goodness-of-fit measure, by comparing our model to our observed data as such:

chisq.test(observed_table, p=model)$statistic
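For a concrete run of the goodness-of-fit test, here is a short sketch with made-up counts for the three strategies (20, 25 and 15 families). The counts are purely illustrative, and the model is the equal-proportions model described above.

```r
# Hypothetical observed counts for 60 families
observed <- c(opol = 20, "2_bilingual" = 25, "1_bilingual" = 15)

# Expected proportions under the model of equal strategy use
model <- c(opol = 1/3, "2_bilingual" = 1/3, "1_bilingual" = 1/3)

fit <- chisq.test(observed, p = model)
print(fit$statistic)   # chi-square statistic
print(fit$p.value)     # p-value with 2 degrees of freedom
```

With these counts each expected value is 20, so the statistic is (0 + 25 + 25) / 20 = 2.5, which gives a p-value of about 0.29. We would not reject the equal-use model for this made-up sample.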

hypothesize allows you to declare a null hypothesis. For more documentation, see the infer package documentation.