<i>Clean as a whistleR</i>
 Unrelated Photo by Rstudio
whistleR is currently a collection of pre-processing functions, and soon a package, to streamline the cleaning of data. The main goal of these functions is to save time. More ambitiously, the aim is a tool that provides standard thresholds for data cleaning while leaving the user complete autonomy over the parameter decisions. A future plan is to implement automated notifications on how the pre-processing changes important estimates (e.g., mean, variance, reliability).
Specifications
The functions are optimized for trial-level data from cognitive behavioral tasks (e.g., Stroop, AX-CPT, Flanker), i.e., data in long format, but future versions will accept a wider variety of data and parameter formats. Currently, the most common adjustable parameters are:
- minRT = lowest reaction time threshold
- maxRT = highest reaction time threshold (hard value cutoff)
- st.d = threshold based on the number of standard deviations from the mean¹
- maxError / minACC = threshold for the percentage of errors made²
¹ Setting st.d will ignore maxRT.
² Requires the data to be aggregated.
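To make the parameters concrete, here is a minimal base R sketch of how such thresholds could be applied. It is not the whistleR code: the helper `flag_rt()`, the column names `ID`, `RT`, and `ACC`, and all threshold values are made up for illustration. It also assumes that st.d only replaces the upper cutoff (mean + st.d × SD, computed over all supplied RTs), which is one possible reading of the footnote above.

```r
# Hypothetical helper, NOT part of whistleR: returns TRUE for trials to keep.
flag_rt <- function(rt, minRT = 200, maxRT = 3000, st.d = NULL) {
  # Assumption: when st.d is set, it replaces the hard maxRT cutoff.
  upper <- if (is.null(st.d)) maxRT else mean(rt) + st.d * sd(rt)
  rt >= minRT & rt <= upper
}

# Toy long-format data: one row per trial (values invented for the example).
trials <- data.frame(
  ID  = rep(c("s1", "s2"), each = 5),
  RT  = c(150, 420, 510, 480, 3500, 390, 450, 5000, 470, 430),
  ACC = c(1, 1, 0, 1, 1, 0, 0, 1, 0, 1)
)

# Hard cutoffs: drops the 150, 3500, and 5000 ms trials.
keep_hard <- flag_rt(trials$RT, minRT = 200, maxRT = 3000)

# SD-based cutoff: maxRT is ignored, so the 3500 ms trial survives here.
keep_sd <- flag_rt(trials$RT, minRT = 200, st.d = 2)

# maxError needs aggregated data: one error rate per participant.
error_rate <- aggregate(ACC ~ ID, data = trials, FUN = function(x) 1 - mean(x))
names(error_rate)[2] <- "error"
keep_ids <- as.character(error_rate$ID[error_rate$error <= 0.5])
```

With this toy data the two RT rules disagree on the 3500 ms trial, which is exactly the kind of parameter decision the user should stay in control of.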
If the functions only cleaned the data based on the parameters above, they would not add much ease of use. The power of the whistleR package lies in two additional features:
- By providing a list of variables (e.g., ID, session, trialtype, congruency, etc.), the user can specify at which level the data is cleaned.
- The user can easily decide whether data that do not meet the threshold values are cut or imputed. Side note: mean imputation is not provided because it is ill-suited for correlational/individual differences use.
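The two features can be sketched in base R as follows. Again, this is an illustration under assumptions rather than the package's interface: `clean_by()`, its arguments, and the toy data are hypothetical, and the imputation shown (replacing a too-slow RT with the group's upper bound) is just one stand-in, since the post does not say which imputation whistleR uses.

```r
# Hypothetical helper, NOT the whistleR API.
clean_by <- function(data, by, rt = "RT", st.d = 2,
                     action = c("cut", "impute")) {
  action <- match.arg(action)
  # One group per unique combination of the `by` variables.
  g <- interaction(data[by], drop = TRUE)
  # Per-group upper bound: group mean + st.d * group SD.
  upper <- ave(data[[rt]], g, FUN = function(x) mean(x) + st.d * sd(x))
  too_slow <- data[[rt]] > upper
  if (action == "cut") {
    data <- data[!too_slow, , drop = FALSE]
  } else {
    # Assumed imputation: pull the value back to the group's bound.
    data[[rt]][too_slow] <- upper[too_slow]
  }
  data
}

# Toy data: s1 is a fast responder, s2 a slow one (values invented).
trials <- data.frame(
  ID = rep(c("s1", "s2"), each = 6),
  RT = c(400, 420, 410, 430, 415, 900,
         800, 820, 810, 830, 815, 1500)
)

cut_data <- clean_by(trials, by = "ID", st.d = 1.5, action = "cut")
imputed  <- clean_by(trials, by = "ID", st.d = 1.5, action = "impute")

# For comparison: a single bound pooled over all participants.
pooled_upper <- mean(trials$RT) + 1.5 * sd(trials$RT)
```

In this example the 900 ms trial is an outlier for s1 when cleaning per participant, but it falls below the pooled bound, so the level at which the data are cleaned changes which trials are affected.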
Progress
Due to the requirements of finishing a Ph.D., this project moves along, but rather slowly. Recently, Sam Parsons, a postdoc at Oxford, added similar functionality to his fantastic splithalf package. I will keep working on the functions/package and keep #rstats twitter updated!