<i>Clean as a whistleR</i>


whistleR is currently a collection of pre-processing functions, and will soon be a package, for streamlining data cleaning. The main goal of these functions is to save time. More ambitiously, the package aims to provide standard thresholds for data cleaning while leaving the user in full control of every parameter decision. A future plan is to implement automated notifications on how the pre-processing changes important estimates (e.g., mean, variance, reliability).

Specifications

The functions are optimized for trial-level data from cognitive-behavioral tasks (e.g., Stroop, AX-CPT, Flanker) in long format, but future versions will accept a wider variety of data and parameter formats. Currently, the most common adjustable parameters are:

  • minRT = lowest reaction time threshold
  • maxRT = highest reaction time threshold (hard value cutoff)
  • st.d = a threshold based on the number of standard deviations from the mean [1]
  • maxErr or minACC = threshold for the percentage of errors made [2]

[1] Setting st.d causes maxRT to be ignored.
[2] Requires the data to be aggregated.
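
To make the parameters concrete, here is a minimal base-R sketch of how such thresholds could behave. It is not the whistleR code: the helper names `keep_rt()` and `high_error_ids()`, the default values, and the choice to compute the mean and SD only over trials that already passed minRT are all assumptions made for illustration. The SD rule is also treated as an upper bound only, which may differ from what the package does.

```r
# Hypothetical helpers for illustration; these are not whistleR functions.

# Trial level: returns TRUE for trials that pass the RT thresholds.
keep_rt <- function(rt, minRT = 200, maxRT = 3000, st.d = NULL) {
  keep <- !is.na(rt) & rt >= minRT
  if (is.null(st.d)) {
    # Hard cutoff on slow responses
    keep <- keep & rt <= maxRT
  } else {
    # SD-based upper bound; maxRT is ignored when st.d is supplied.
    # Assumption: mean and SD are taken over trials that passed minRT.
    upper <- mean(rt[keep]) + st.d * sd(rt[keep])
    keep <- keep & rt <= upper
  }
  keep
}

# Participant level: IDs whose error rate exceeds maxErr.
# Accuracy is first aggregated to one error rate per participant.
high_error_ids <- function(id, correct, maxErr = 0.30) {
  err <- tapply(!correct, id, mean)
  names(err)[err > maxErr]
}

rt <- c(150, 420, 510, 480, 3900, 530)
keep_rt(rt)                            # hard cutoffs only
keep_rt(rt, maxRT = 500, st.d = 1.5)   # maxRT has no effect here

id      <- c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2")
correct <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
high_error_ids(id, correct)            # participants above 30% errors
```

One thing the sketch makes visible: with few trials, a single extreme RT inflates the SD so much that a strict st.d value may never exclude it, which is one reason the level at which the data are cleaned matters.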

If the functions only cleaned the data based on the above parameters, they would not add much ease of use. The power of the whistleR package lies in two additional features:

  1. By providing a list of variables (e.g., ID, session, trialtype, congruency, etc.), the user can specify at which level the data is cleaned.
  2. The user can easily decide whether data that do not meet the threshold values are cut or imputed. Side note: mean imputation is not provided because it is ill-suited for correlational/individual-differences use.
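
The two features can be sketched together in base R. Again, this is an illustration under stated assumptions, not the package's interface: `clean_by()`, its arguments, and the column names `ID` and `RT` are hypothetical. The post does not say which imputation methods whistleR offers, so the sketch replaces an offending value with the boundary value of its own cell, which is just one possible non-mean choice.

```r
# Hypothetical function for illustration; clean_by() is not part of whistleR.
clean_by <- function(data, by, rt = "RT", st.d = 2.5,
                     action = c("cut", "impute")) {
  action <- match.arg(action)

  # One cell per combination of the grouping variables (e.g., ID x congruency)
  cell <- interaction(data[by], drop = TRUE)

  # Upper bound (mean + st.d * SD), computed separately within each cell
  upper <- ave(data[[rt]], cell,
               FUN = function(x) mean(x) + st.d * sd(x))
  too_slow <- data[[rt]] > upper

  if (action == "cut") {
    # Drop the offending trials
    data <- data[!too_slow, , drop = FALSE]
  } else {
    # Assumed imputation: replace with the cell's own boundary value
    data[[rt]][too_slow] <- upper[too_slow]
  }
  data
}

d <- data.frame(
  ID = rep(c(1, 2), each = 6),
  RT = c(400, 420, 410, 430, 415, 900,    # fast participant, one slow trial
         850, 870, 860, 880, 840, 865)    # consistently slower participant
)

cut_d     <- clean_by(d, by = "ID", st.d = 1.5, action = "cut")
imputed_d <- clean_by(d, by = "ID", st.d = 1.5, action = "impute")

nrow(cut_d)       # one trial removed
nrow(imputed_d)   # all trials kept, one RT replaced
```

In this toy data the 900 ms trial is extreme for participant 1 but ordinary next to participant 2's responses, so cleaning per ID catches it while a single threshold over the pooled sample would not. Passing more variables in `by` (say, `c("ID", "congruency")`) would move the cleaning to a finer level in the same way.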

Progress

Due to the requirements of finishing a Ph.D., this project is moving along, but rather slowly. Recently, Sam Parsons, a postdoc at Oxford, added similar functionality to his fantastic splithalf package. I will keep working on the functions/package and keep #rstats Twitter updated!
