Difference between revisions of "HTAC Database - Cleaned Data: Cleaning Rules"

From Pheno Wiki

Revision as of 11:31, 11 June 2013

The following applies primarily to the E-Prime summary task data (not trial-by-trial data).

In the HTAC Customized Data Export section, you can request either Uncleaned or Cleaned data.

  • The Uncleaned Data will include all summary scores that have been created by the scoring of the raw trial-by-trial data, but which have not been cleaned to exclude subjects that have invalid or incomplete data. If you don't agree with the cleaning rules outlined below, or want to test out new cleaning rules, you would select this option.
  • The Cleaned Data will include all summary scores for subjects with usable task data. These data exclude subjects who failed the criteria outlined under each task description on the HTAC page. You may still conduct additional cleaning (e.g., exclude outliers after examining the distribution of a given variable). If you want to download data that is ready for analysis, select this option, although you should still go through and check the data.

Details about the cleaning rules, the variables used, and the task are provided under the description of each task. Below, I have listed each cleaning rule applied to the LA2K data for the Cleaned Data option. Listed under each rule are the subject IDs excluded by that rule, as well as the final N of usable data for each task after cleaning. Note that this was conducted using the N = 1316 Population Stratification dataset, so the numbers reflect that dataset.

ePrime: TS [see CNP_TS]

  • If TRIALCOUNT does not equal 192, exclude.
    • None
  • If TS_ACCURACY is less than or equal to 0.50, exclude.
    • 50004

1267 with usable TS summary data.
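
If you download the Uncleaned data and want to reapply or vary the two TS rules above yourself, the following is a minimal Python sketch of the logic. TRIALCOUNT and TS_ACCURACY are the variable names used in the rules; the SUBJECT_ID field name, the function names, and the sample rows are hypothetical and made up for illustration only (they are not LA2K records). The sketch also assumes both variables are present and numeric for every subject; how missing values are treated is not covered here.

```python
# Sketch of the two TS cleaning rules listed above.
# Assumptions: each subject is a dict; "SUBJECT_ID" is a hypothetical field name.
EXPECTED_TRIALS = 192   # rule 1: TRIALCOUNT must equal 192
MIN_ACCURACY = 0.50     # rule 2: TS_ACCURACY must be greater than 0.50


def ts_exclusion_reasons(row):
    """Return the list of TS cleaning rules that this subject fails."""
    reasons = []
    if row["TRIALCOUNT"] != EXPECTED_TRIALS:
        reasons.append("TRIALCOUNT != 192")
    if row["TS_ACCURACY"] <= MIN_ACCURACY:
        reasons.append("TS_ACCURACY <= 0.50")
    return reasons


def clean_ts(rows):
    """Split rows into kept subjects and a dict of excluded IDs -> failed rules."""
    kept, excluded = [], {}
    for row in rows:
        reasons = ts_exclusion_reasons(row)
        if reasons:
            excluded[row["SUBJECT_ID"]] = reasons
        else:
            kept.append(row)
    return kept, excluded


# Made-up example rows, not real subject data.
toy_rows = [
    {"SUBJECT_ID": "S001", "TRIALCOUNT": 192, "TS_ACCURACY": 0.91},
    {"SUBJECT_ID": "S002", "TRIALCOUNT": 190, "TS_ACCURACY": 0.88},
    {"SUBJECT_ID": "S003", "TRIALCOUNT": 192, "TS_ACCURACY": 0.50},
    {"SUBJECT_ID": "S004", "TRIALCOUNT": 192, "TS_ACCURACY": 0.51},
]

kept, excluded = clean_ts(toy_rows)
print("usable N:", len(kept))
print("excluded:", excluded)
```

Note that the accuracy rule is inclusive: a subject with TS_ACCURACY of exactly 0.50 is excluded. Recording which rule each excluded ID failed mirrors the per-rule listing used on this page.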