<i>Clean as a whistleR</i>
whistleR is currently a collection of pre-processing functions, and soon a package, to streamline the cleaning of data. The main goal of these functions is to save time. More ambitiously, the aim is a tool that provides standard thresholds for data cleaning while leaving the user complete autonomy over the parameter decisions. A future plan is to implement automated notifications on how the pre-processing changes important estimates (e.g., mean, variance, reliability).
Specifications
The functions are optimized for trial-level data in long format from cognitive behavioral tasks (e.g., Stroop, AX-CPT, Flanker), but future versions will accept a wider variety of data and parameter formats. Currently, the most common adjustable parameters are:
- minRT = lowest reaction time threshold
- maxRT = highest reaction time threshold (hard value cutoff)
- st.d = a threshold based on the number of standard deviations from the mean (1)
- maxErr or minACC = threshold for the percentage of errors made (2)

(1) st.d will ignore maxRT.
(2) Requires the data to be aggregated.
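
To make the interplay of these thresholds concrete, here is a minimal base R sketch applied to a toy long-format data frame. The function name clean_rt_sketch, its argument handling, and the column name rt are assumptions made for this illustration and are not the package's actual interface. The only behavior taken from the text is that a standard-deviation threshold, when given, takes precedence over maxRT.

```r
# Hypothetical helper for illustration only; not part of whistleR.
clean_rt_sketch <- function(data, minRT = NULL, maxRT = NULL, st.d = NULL) {
  keep <- rep(TRUE, nrow(data))

  if (!is.null(minRT)) {
    keep <- keep & data$rt >= minRT            # drop responses that are too fast
  }

  if (!is.null(st.d)) {
    # SD-based cutoff, computed on the trials that passed minRT.
    # Assumption: when st.d is supplied, maxRT is ignored.
    m <- mean(data$rt[keep])
    s <- sd(data$rt[keep])
    keep <- keep & abs(data$rt - m) <= st.d * s
  } else if (!is.null(maxRT)) {
    keep <- keep & data$rt <= maxRT            # hard upper cutoff
  }

  data[keep, , drop = FALSE]
}

# Toy data: two participants, five trials each (reaction times in ms).
trials <- data.frame(
  id = rep(1:2, each = 5),
  rt = c(150, 420, 510, 480, 2600, 390, 450, 90, 530, 470)
)

hard_cut <- clean_rt_sketch(trials, minRT = 200, maxRT = 2000)
sd_cut   <- clean_rt_sketch(trials, minRT = 200, st.d = 2)
```

In this toy example both calls drop the two very fast trials and the 2600 ms trial, but for different reasons: the first uses the fixed ceiling, the second uses the distance from the sample mean.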
If the functions only cleaned the data based on the above parameters, they would not add much ease of use. The power of the whistleR package lies in two additional features:
- By providing a list of variables (e.g., ID, session, trialtype, congruency, etc.), the user can specify at which level the data is cleaned.
- The user can easily decide whether data that do not meet the threshold values are cut or imputed. Side note: mean imputation is not provided because it is ill-suited for correlational/individual-differences use.
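
The two features above can be sketched in base R as follows. Everything named here (clean_by_group_sketch, the by and action arguments, the rt column) is hypothetical and chosen for illustration. The text does not say which imputation method the package uses, so the sketch clamps out-of-range values to the nearest group cutoff purely as a stand-in; it is not mean imputation, and it is not necessarily what whistleR does.

```r
# Hypothetical helper for illustration only; not part of whistleR.
# `by` is a character vector of grouping columns; the SD cutoffs are
# computed separately within each combination of those columns.
clean_by_group_sketch <- function(data, by, st.d = 2,
                                  action = c("cut", "impute")) {
  action <- match.arg(action)
  grp <- interaction(data[by], drop = TRUE)

  m <- ave(data$rt, grp, FUN = mean)   # group mean, repeated per trial
  s <- ave(data$rt, grp, FUN = sd)     # group SD, repeated per trial
  s[is.na(s)] <- Inf                   # single-trial groups: no cutoff possible

  lower <- m - st.d * s
  upper <- m + st.d * s
  out_of_range <- data$rt < lower | data$rt > upper

  if (action == "cut") {
    return(data[!out_of_range, , drop = FALSE])
  }

  # Stand-in imputation (assumption): clamp to the nearest group cutoff.
  data$rt <- pmin(pmax(data$rt, lower), upper)
  data
}

# Toy data: a fast and a slow participant, eight trials each.
trials <- data.frame(
  task = "stroop",
  id   = rep(1:2, each = 8),
  rt   = c(300, 310, 290, 305, 295, 300, 310, 600,
           700, 720, 690, 710, 705, 695, 715, 700)
)

cut_overall <- clean_by_group_sketch(trials, by = "task", action = "cut")
cut_by_id   <- clean_by_group_sketch(trials, by = c("task", "id"), action = "cut")
imputed     <- clean_by_group_sketch(trials, by = c("task", "id"), action = "impute")
```

The 600 ms trial is unremarkable relative to the whole sample, so cleaning at the task level keeps it, while cleaning within participant flags it for the fast participant. With action = "impute" the row count is unchanged and only the flagged value is altered.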
Progress
Due to the requirements of finishing a Ph.D., this project is moving along, but rather slowly. Recently, Sam Parsons, a postdoc at Oxford, added similar functionality to his fantastic splithalf package. I will keep working on the functions/package and keep #rstats twitter updated!