HTAC Database - Cleaned Data: Cleaning Rules

From Pheno Wiki
Revision as of 12:13, 11 June 2013 by Elizac (Talk | contribs)

The following applies primarily to the E-Prime summary task data (not trial-by-trial data).

In the HTAC Customized Data Export section, you can request either Cleaned or Uncleaned data.

  • The Cleaned Data include all summary scores for subjects with usable task data. Subjects that failed the criteria outlined under each task description [see here: HTAC] are not included. Subjects excluded by a cleaning rule have empty entries for that task only, so they look the same as subjects who did not complete the task. Select this option if you want data that is ready for analysis, but still check the data yourself; you may want to do additional cleaning (e.g., exclude outliers after examining the distribution of a given variable).
  • The Uncleaned Data include all summary scores created by scoring the raw trial-by-trial data, without any cleaning to exclude subjects with invalid or incomplete data. Select this option if you disagree with the cleaning rules outlined below or want to test new cleaning rules.
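
The relationship between the two exports can be sketched in Python. This is only an illustration, not the HTAC export code: the function name `blank_task_columns`, the `SUBJECT_ID` field, the one-dict-per-subject layout, and all sample values are assumptions made for the example. Only the variable names TRIALCOUNT and TS_ACCURACY are taken from this page.

```python
def blank_task_columns(records, task_columns, excluded_ids):
    """Return copies of the records with one task's columns emptied
    for every excluded subject (hypothetical helper)."""
    cleaned = []
    for rec in records:
        row = dict(rec)  # copy, so the uncleaned input is left untouched
        if row["SUBJECT_ID"] in excluded_ids:
            for col in task_columns:
                # Empty entry: indistinguishable from a subject who
                # never completed the task.
                row[col] = None
        cleaned.append(row)
    return cleaned


# Made-up subjects and scores, for illustration only.
uncleaned = [
    {"SUBJECT_ID": 1, "TRIALCOUNT": 192, "TS_ACCURACY": 0.48},
    {"SUBJECT_ID": 2, "TRIALCOUNT": 192, "TS_ACCURACY": 0.91},
]
cleaned = blank_task_columns(uncleaned, ["TRIALCOUNT", "TS_ACCURACY"], {1})
```

In this sketch only the columns of the affected task are emptied, so a subject excluded from one task can still contribute data to the others.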


Details about the cleaning rules, the variables used, and the task are provided under a description of each task. Below, I have listed each cleaning rule applied to the LA2K data for the Cleaned Data option. Listed under each rule are the subject IDs that are excluded based on that cleaning rule, as well as the final N of usable data for each task after cleaning. Note that the cleaning was conducted using the N = 1316 Population Stratification dataset, so the numbers reflect that dataset.


ePrime: TS [see CNP_TS]

  • If TRIALCOUNT does not equal 192, exclude.
    • None
  • If TS_ACCURACY is less than or equal to 0.50, exclude.
    • 50004

1267 with usable TS summary data.

ePrime: SCAP [see CNP_SCAP]

  • If SCAP_TRIAL_COUNT does not equal 48, exclude.
    • 10476
  • If SCAP_AVERAGE_CORR is less than or equal to 0.50, exclude.
    • 11419, 50004, 50063
  • If SCAP1_CORRECTRT_MEAN, SCAP3_CORRECTRT_MEAN, SCAP5_CORRECTRT_MEAN, or SCAP7_CORRECTRT_MEAN is greater than 6000, exclude.
    • 10332, 11419, 50075

1263 with usable SCAP summary data.
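
For anyone re-applying the SCAP rules to the Uncleaned Data, the three checks could be written roughly as follows. This is a sketch under assumptions: the function name, the dict-per-subject row layout, and the sample values are hypothetical, and every field is assumed to be present (this page does not say how missing SCAP values are handled).

```python
SCAP_RT_COLUMNS = [
    "SCAP1_CORRECTRT_MEAN", "SCAP3_CORRECTRT_MEAN",
    "SCAP5_CORRECTRT_MEAN", "SCAP7_CORRECTRT_MEAN",
]


def scap_exclusion_reasons(row):
    """Return the SCAP rules that a summary row fails (empty list = usable)."""
    reasons = []
    if row["SCAP_TRIAL_COUNT"] != 48:
        reasons.append("trial count")
    if row["SCAP_AVERAGE_CORR"] <= 0.50:  # exactly 0.50 is excluded
        reasons.append("accuracy")
    if any(row[col] > 6000 for col in SCAP_RT_COLUMNS):  # exactly 6000 is kept
        reasons.append("slow RT")
    return reasons


# Made-up rows, for illustration only.
usable_row = {
    "SCAP_TRIAL_COUNT": 48,
    "SCAP_AVERAGE_CORR": 0.80,
    "SCAP1_CORRECTRT_MEAN": 900.0,
    "SCAP3_CORRECTRT_MEAN": 1000.0,
    "SCAP5_CORRECTRT_MEAN": 1100.0,
    "SCAP7_CORRECTRT_MEAN": 1200.0,
}
borderline_row = dict(usable_row, SCAP_AVERAGE_CORR=0.50)
```

Returning the list of failed rules, rather than a single yes/no, makes it easy to see why a subject such as 11419 appears under more than one rule.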

ePrime: VCAP [see CNP_VCAP]

  • If VCAP_TRIAL_COUNT does not equal 48, exclude.
    • None
  • If VCAP_AVERAGE_CORR is less than or equal to 0.50, exclude.
    • 10251, 10894, 50004, 50044, 50076, 70037
  • If VCAP3_CORRECTRT_MEAN, VCAP5_CORRECTRT_MEAN, VCAP7_CORRECTRT_MEAN, or VCAP9_CORRECTRT_MEAN is greater than 6000, exclude.
    • None

1266 with usable VCAP summary data.

ePrime: DDT [see CNP_DDT]

  • If DDT_SMALL_INCON is "Y", exclude.
    • 10500, 10667, 11345, 11406
  • If DDT_MEDIUM_INCON is "Y", exclude.
    • 10311, 10580, 10756, 10832, 10889, 10932, 11406, 11512, 50004, 50007
  • If DDT_LARGE_INCON is "Y", exclude.
    • 10019, 10286, 10792, 11345, 50004, 50007, 60006

1253 with usable DDT summary data.
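
Some subjects are listed under more than one DDT rule, so adding up the three lists over-counts the excluded subjects. The short Python check below uses the IDs exactly as listed above; the variable names are just for illustration.

```python
from collections import Counter

# Subject IDs copied from the three DDT rules above.
ddt_excluded = {
    "DDT_SMALL_INCON": [10500, 10667, 11345, 11406],
    "DDT_MEDIUM_INCON": [10311, 10580, 10756, 10832, 10889, 10932,
                         11406, 11512, 50004, 50007],
    "DDT_LARGE_INCON": [10019, 10286, 10792, 11345, 50004, 50007, 60006],
}

listings = [sid for ids in ddt_excluded.values() for sid in ids]
unique_ids = set(listings)  # each subject is excluded only once
repeated = sorted(sid for sid, n in Counter(listings).items() if n > 1)

print(len(listings), len(unique_ids))  # 21 17
print(repeated)                        # [11345, 11406, 50004, 50007]
```

The three lists contain 21 entries but only 17 distinct subjects, because four subjects fail two rules each.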

ePrime: BART [see CNP_BART]

  • If BART_TRIALCOMP is less than 40, exclude.
    • None
  • If BART_CASHOUTWOPUMP is greater than or equal to 3, exclude.
    • 10194, 10228, 10245, 10390, 10432, 10579, 10679, 10702, 10723, 10813, 10843, 10942, 10963, 10998, 11109, 11176, 11257, 11334, 11401, 50004, 50044, 50047, 50057, 60057, 60077, 60079, 70010, 70086
  • If BART_REDEXPLOSIONS is greater than or equal to 19, exclude.
    • 10051, 10575, 10993, 11472, 11528

1238 with usable BART summary data.

ePrime: ANT [see CNP_ANT]

  • None of the rules listed under CNP_ANT resulted in the exclusion of a subject, so nothing to apply to the data.

1268 with usable ANT summary data.

ePrime: CPT [see CNP_CPT]

  • If CPT_HITS is less than 162, exclude.
    • 10416

1269 with usable CPT summary data.

ePrime: SCWT [see CNP_SCWT]

  • If SCWT_ACCCON is less than or equal to 0.50, exclude.
    • 10001, 10884, 10985, 11365, 11499, 50004
  • If SCWT_ACCINC is less than or equal to 0.50, exclude.
    • 10001, 10112, 10843, 10884, 10985, 11123, 11224, 11301, 11365, 50003, 50004, 50008

1258 with usable SCWT summary data.

ePrime: SST [see CNP_Stop_Signal]

  • If SST_BK1_ENDTRIAL or SST_BK2_ENDTRIAL is less than 128, exclude.
    • None
  • If SST_SES_SSRT_QUANT is missing (empty field), exclude.
    • 10327, 10579, 10855, 10861, 10932, 10986, 11035, 11198
  • If SST_SES_PERCENT_INHIB is less than 0.25 or greater than 0.75, or missing, exclude.
    • 10016, 10973, 11022, 11072, 11110, 11243, 11321, 11489, 11503, 50076

1251 with usable SST summary data.
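
The SST rules mix range checks with checks for missing values, which is easy to get wrong when re-implementing them. A possible Python version is sketched below. The function name and the use of `None` for an empty field are assumptions; how empty fields actually appear depends on how the export is loaded.

```python
def sst_is_excluded(row):
    """Apply the three SST rules to one summary row.

    None stands for a missing (empty) field in this sketch.
    """
    ssrt = row.get("SST_SES_SSRT_QUANT")
    inhib = row.get("SST_SES_PERCENT_INHIB")

    short_block = (row["SST_BK1_ENDTRIAL"] < 128
                   or row["SST_BK2_ENDTRIAL"] < 128)
    missing_ssrt = ssrt is None
    # Test for missing first: comparing None with a number raises TypeError.
    bad_inhib = inhib is None or inhib < 0.25 or inhib > 0.75
    return short_block or missing_ssrt or bad_inhib


# Made-up row, for illustration only.
example_row = {
    "SST_BK1_ENDTRIAL": 128,
    "SST_BK2_ENDTRIAL": 128,
    "SST_SES_SSRT_QUANT": 210.0,
    "SST_SES_PERCENT_INHIB": 0.25,
}
```

As the rule is worded, values of exactly 0.25 or 0.75 are kept; only values strictly outside that range, or missing values, lead to exclusion.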

ePrime: SR [see CNP_SR]

  • If SR_ACC_ENC is less than or equal to 0.50, exclude.
    • 50004
  • If SR_ACC_REC is less than or equal to 0.50, exclude.
    • 10457, 10928, 10938, 10974, 11042, 11122, 11145, 11290, 11402, 50085

1257 with usable SR summary data.

ePrime: RK [see CNP_RK]

  • If RK_RRESPONSE and RK_KRESPONSE both equal 0, exclude.
    • 112 subjects
      • Note: the initial data collection did not actually record responses, so the 0 scores for these 112 subjects reflect incomplete data collection and need to be excluded.
  • If RK_NSI does not equal 60, exclude.
    • 101 subjects (9 overlap with the above criterion, so 92 additional subjects excluded based on the RK_NSI rule).

1058 with usable RK summary data.
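
The RK counts above combine as follows; this small Python check only restates the arithmetic from the two rules (the variable names are illustrative).

```python
zero_response = 112  # RK_RRESPONSE and RK_KRESPONSE both equal 0
wrong_nsi = 101      # RK_NSI does not equal 60
overlap = 9          # subjects who meet both criteria

additional = wrong_nsi - overlap             # excluded by RK_NSI only
total_excluded = zero_response + additional  # distinct subjects excluded

print(additional, total_excluded)  # 92 204
```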

ePrime: SMNM [see CNP_SMNM]

  • If SMNM_TRIALCOUNT is less than 40, exclude.
    • 10489, 70003
  • If SMNM_MANIP_MN is less than 0.50, exclude.
    • 10014, 10017, 10019, 10044, 10083, 10245, 10311, 10334, 10348, 10382, 10396, 10421, 10462, 10479, 10503, 10683, 10724, 10731, 10769, 10887, 10903, 10986, 11064, 11172, 11243, 11289, 11329, 11344, 11348, 11404, 11407, 11414, 11444, 50033, 50052, 50065, 50069, 60010, 60019, 60022, 60084, 70001, 70014
  • If SMNM_MAIN_MN is less than 0.50, exclude.
    • 10019, 10311, 10579, 10724, 10861, 11024, 11243, 50014, 50029, 50033, 50044, 70013
  • If SMNM_MANIP_TT is less than 10, or missing, exclude.
    • 11024, 11296, 11365, 50075
  • If SMNM_MAIN_TT is less than 10, or missing, exclude.
    • 11024, 11365, 50029, 50075

1213 with usable SMNM summary data.

ePrime: VMNM [see CNP_VMNM]

  • If VMNM_TRIALCOUNT is less than 40, exclude.
    • 10396, 10537, 60006
  • If VMNM_MANIP_MN is less than 0.50, exclude.
    • 10026, 10032, 10038, 10079, 10222, 10416, 10544, 10687, 10699, 10792, 10799, 10848, 10898, 10956, 11050, 11054, 11111, 11299, 11384, 11399, 11414, 11426, 11468, 11540, 50004, 50006, 50022, 50042, 50057, 50077, 60045, 60052, 60068, 60078, 60080, 70020, 70026
  • If VMNM_MAIN_MN is less than 0.50, exclude.
    • 10014, 10026, 10079, 10251, 10331, 10479, 10687, 11111, 11289, 50004, 50016, 50038, 50044, 50057, 70001
  • If VMNM_MANIP_TT is less than 10, or missing, exclude.
    • 11003, 11306, 11316, 11551, 50029
  • If VMNM_MAIN_TT is less than 10, or missing, exclude.
    • 50016, 50029, 50032

1211 with usable VMNM summary data.

ePrime: RL [see CNP_RL]

  • No rules to apply.

1265 with usable RL summary data.

ePrime: DRLT [see CNP_DRL]

  • If DRLT_EXPERIMENTNAME equals "DRLT", "DRLT_SP_DEVI", or "DRLT_SP_SHIVA", exclude.
    • 53 subjects
  • If DRLT_POST_TRIAL_TOTAL equals 0, exclude.
    • 11019

517 with usable DRLT summary data.
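
The DRLT rules could be expressed in Python as a set-membership test plus a zero check. This is a sketch: the function name and the row layout are assumptions, and the experiment name "SOME_OTHER_VERSION" in the example is made up.

```python
EXCLUDED_EXPERIMENT_NAMES = {"DRLT", "DRLT_SP_DEVI", "DRLT_SP_SHIVA"}


def drlt_is_excluded(row):
    """Apply the two DRLT rules to one summary row."""
    return (row["DRLT_EXPERIMENTNAME"] in EXCLUDED_EXPERIMENT_NAMES
            or row["DRLT_POST_TRIAL_TOTAL"] == 0)


# Made-up rows, for illustration only.
kept = {"DRLT_EXPERIMENTNAME": "SOME_OTHER_VERSION", "DRLT_POST_TRIAL_TOTAL": 12}
old_version = {"DRLT_EXPERIMENTNAME": "DRLT_SP_DEVI", "DRLT_POST_TRIAL_TOTAL": 12}
no_post_trials = {"DRLT_EXPERIMENTNAME": "SOME_OTHER_VERSION", "DRLT_POST_TRIAL_TOTAL": 0}
```

The membership test avoids a common slip: in Python, `name == "DRLT" or "DRLT_SP_DEVI"` is always true, because the non-empty string on the right of `or` is truthy by itself.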
