Difference between revisions of "HTAC Database - Cleaned Data: Cleaning Rules"

From Pheno Wiki

Revision as of 11:39, 11 June 2013

The following applies primarily to the E-Prime summary task data (not trial-by-trial data).

In the HTAC Customized Data Export section, you can request either Uncleaned or Cleaned data.

  • The Uncleaned Data include all summary scores produced by scoring the raw trial-by-trial data, without excluding subjects who have invalid or incomplete data. Select this option if you disagree with the cleaning rules outlined below or want to test new cleaning rules.
  • The Cleaned Data include all summary scores for subjects with usable task data. These data omit subjects who failed the criteria outlined under each task description on the HTAC page, although you may still clean the data further (e.g., exclude outliers after examining the distribution of a given variable). Subjects excluded by a cleaning rule have empty entries for that task only, so they look the same as subjects who did not complete the task. Select this option if you want data that are ready for analysis, although you should still check the data yourself.
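To illustrate what "empty entries for only that task" means, here is a minimal Python sketch. It is not the HTAC export code: the helper `blank_task`, the `subject_id` key, the subject IDs 1 and 2, and all score values are made up for illustration. Only the column names come from the rules listed on this page.

```python
# Hypothetical helper, not part of HTAC: blank one task's columns
# for subjects excluded by that task's cleaning rules.
def blank_task(rows, task_columns, excluded_ids):
    """Return copies of rows with task_columns set to None for excluded IDs."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the uncleaned rows stay untouched
        if new_row["subject_id"] in excluded_ids:
            for col in task_columns:
                new_row[col] = None  # same appearance as "task not completed"
        cleaned.append(new_row)
    return cleaned


# Made-up subjects and scores, for illustration only.
rows = [
    {"subject_id": 1, "TS_ACCURACY": 0.41, "SCAP_AVERAGE_CORR": 0.80},
    {"subject_id": 2, "TS_ACCURACY": 0.93, "SCAP_AVERAGE_CORR": 0.88},
]
cleaned = blank_task(rows, ["TS_ACCURACY"], {1})
```

In this sketch, subject 1 loses only the TS score and keeps the SCAP score, which is why an excluded subject cannot be told apart from one who never ran the task.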

Details about the cleaning rules, the variables used, and the task are provided under a description of each task. Below, I have listed each cleaning rule applied to the LA2K data for the Cleaned Data option. Listed under each rule are the subject IDs that are excluded based on that cleaning rule, as well as the final N of usable data for each task after cleaning. Note that this was conducted using the N = 1316 Population Stratification dataset, so the numbers reflect that dataset.

ePrime: TS [see CNP_TS]

  • If TRIALCOUNT does not equal 192, exclude.
    • None
  • If TS_ACCURACY is less than or equal to 0.50, exclude.
    • 50004

1267 with usable TS summary data.
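The two TS rules above can be sketched as a single check. This is an illustrative Python sketch, not the script used to produce the Cleaned Data; the function name `ts_is_usable` and the constant names are assumptions, while the cutoffs (192 trials, accuracy of 0.50) are taken from the rules as stated.

```python
TS_EXPECTED_TRIALS = 192  # from the TRIALCOUNT rule above
TS_MIN_ACCURACY = 0.50    # from the TS_ACCURACY rule above


def ts_is_usable(trialcount, ts_accuracy):
    """Return True if a subject's TS summary data pass both rules."""
    if trialcount != TS_EXPECTED_TRIALS:
        return False
    # "Less than or equal to 0.50" is excluded, so the test is strict.
    return ts_accuracy > TS_MIN_ACCURACY
```

Note the boundary: a subject with accuracy of exactly 0.50 is excluded, so `ts_is_usable(192, 0.50)` is false in this sketch.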

ePrime: SCAP [see CNP_SCAP]

  • If SCAP_TRIAL_COUNT does not equal 48, exclude.
    • 10476
  • If SCAP_AVERAGE_CORR is less than or equal to 0.50, exclude.
    • 11419, 50004, 50063
  • If SCAP1_CORRECTRT_MEAN, SCAP3_CORRECTRT_MEAN, SCAP5_CORRECTRT_MEAN, or SCAP7_CORRECTRT_MEAN is greater than 6000, exclude.
    • 10332, 11419, 50075

1263 with usable SCAP summary data.
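Because a subject can fail more than one SCAP rule (11419 is listed under both the accuracy rule and the reaction time rule), the number of unique excluded subjects is smaller than the sum of the three lists. The Python sketch below shows one way to report which rules a subject fails and to count unique exclusions. The function `scap_failed_rules`, the rule labels, and the dictionary layout are assumptions for illustration; the column names, cutoffs, and subject IDs are the ones listed above.

```python
SCAP_RT_COLUMNS = [
    "SCAP1_CORRECTRT_MEAN",
    "SCAP3_CORRECTRT_MEAN",
    "SCAP5_CORRECTRT_MEAN",
    "SCAP7_CORRECTRT_MEAN",
]


def scap_failed_rules(row):
    """Return the labels (hypothetical names) of every SCAP rule a row fails."""
    failed = []
    if row["SCAP_TRIAL_COUNT"] != 48:
        failed.append("trial_count")
    if row["SCAP_AVERAGE_CORR"] <= 0.50:
        failed.append("accuracy")
    if any(row[col] > 6000 for col in SCAP_RT_COLUMNS):
        failed.append("slow_rt")
    return failed


# Subject IDs as listed under each SCAP rule above.
excluded_by_rule = {
    "trial_count": {10476},
    "accuracy": {11419, 50004, 50063},
    "slow_rt": {10332, 11419, 50075},
}
all_excluded = set().union(*excluded_by_rule.values())
print(len(all_excluded))  # 6 unique subjects, since 11419 fails two rules
```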

ePrime: VCAP [see CNP_VCAP]

  • If VCAP_TRIAL_COUNT does not equal 48, exclude.
    • None
  • If VCAP_AVERAGE_CORR is less than or equal to 0.50, exclude.
    • 10251, 10894, 50004, 50044, 50076, 70037
  • If VCAP3_CORRECTRT_MEAN, VCAP5_CORRECTRT_MEAN, VCAP7_CORRECTRT_MEAN, or VCAP9_CORRECTRT_MEAN is greater than 6000, exclude.
    • None

1266 with usable VCAP summary data.
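All three tasks follow the same pattern: an expected trial count, an accuracy cutoff of 0.50, and (for SCAP and VCAP) a cutoff of 6000 on the mean correct reaction time at each load. If you want to test alternative cleaning rules on the Uncleaned Data, one option is to hold the rules in a small table, as in the Python sketch below. The `RULES` layout, the function `is_usable`, and the example row are assumptions for illustration and are not the procedure used to build the Cleaned Data; the variable names and cutoffs are those listed on this page.

```python
ACCURACY_CUTOFF = 0.50  # exclude if accuracy <= cutoff
RT_CUTOFF = 6000        # exclude if any listed mean correct RT > cutoff

# Hypothetical rule table; variable names are those used in the rules above.
RULES = {
    "TS": {
        "count_col": "TRIALCOUNT",
        "expected_trials": 192,
        "accuracy_col": "TS_ACCURACY",
        "rt_cols": [],  # no RT rule is listed for TS
    },
    "SCAP": {
        "count_col": "SCAP_TRIAL_COUNT",
        "expected_trials": 48,
        "accuracy_col": "SCAP_AVERAGE_CORR",
        "rt_cols": [f"SCAP{n}_CORRECTRT_MEAN" for n in (1, 3, 5, 7)],
    },
    "VCAP": {
        "count_col": "VCAP_TRIAL_COUNT",
        "expected_trials": 48,
        "accuracy_col": "VCAP_AVERAGE_CORR",
        "rt_cols": [f"VCAP{n}_CORRECTRT_MEAN" for n in (3, 5, 7, 9)],
    },
}


def is_usable(task, row):
    """Return True if row passes every cleaning rule listed for task."""
    spec = RULES[task]
    if row[spec["count_col"]] != spec["expected_trials"]:
        return False
    if row[spec["accuracy_col"]] <= ACCURACY_CUTOFF:
        return False
    return all(row[col] <= RT_CUTOFF for col in spec["rt_cols"])


# Made-up VCAP summary row, for illustration only.
vcap_row = {
    "VCAP_TRIAL_COUNT": 48,
    "VCAP_AVERAGE_CORR": 0.75,
    "VCAP3_CORRECTRT_MEAN": 1200,
    "VCAP5_CORRECTRT_MEAN": 1400,
    "VCAP7_CORRECTRT_MEAN": 1600,
    "VCAP9_CORRECTRT_MEAN": 1800,
}
usable = is_usable("VCAP", vcap_row)
```

Keeping the cutoffs in one place makes it easy to change a single value and compare the resulting N against the counts reported here. Note that the reaction time columns differ between tasks: SCAP uses loads 1, 3, 5, and 7, while VCAP uses loads 3, 5, 7, and 9.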