HTAC Database - Data Download Guide

From Pheno Wiki
Revision as of 17:05, 6 March 2013 by Elizac

This guide assumes that you are familiar with the CNP dataset and the names of its data subsets (e.g., LA5C), and that you have access to the HTAC Database.

In the HTAC Customized Data Export section, you can request data organized by Subject Type (Step 3) or Subject Status (Step 4).

  • For Step 3, you can choose to download only a certain set of patients. To download the entire dataset, select "ALL SUBJECTS".
  • For Step 4, you have 3 options:
      • "Master List (N = 1254)": This is the option most users will choose. It includes all subjects with Status = 2 (Complete).
      • "Population Stratified Set": This includes all subjects with Status = 2 (Complete), plus 62 additional subjects with Status = 0 and Genetic Recovery Case = 1. This larger dataset (N = 1316) will be used for primary genetic analyses only. We have included the additional Genetic Recovery Cases to increase the total sample size as much as possible, but they do not necessarily meet the inclusion criteria for the Master List.
      • "Inactive/Active/Complete (N = 1839)": This includes every subject recorded in the study. Download it only for QC purposes; do not use it for analyses.
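The three Step 4 options can be read as filters on each subject's Status and Genetic Recovery Case values. The Python sketch below restates them that way, only as an illustration: the actual filtering happens inside the HTAC export, and the `Subject` record and its field names are hypothetical, not the HTAC column names. Only the Status codes 1 (Inactive) and 2 (Complete) come from this guide.

```python
from dataclasses import dataclass

# Status codes as described in this guide; Status = 0 is left unnamed here.
STATUS_INACTIVE = 1
STATUS_COMPLETE = 2


@dataclass
class Subject:
    # Hypothetical field names; the real HTAC export columns may differ.
    subject_id: str
    status: int
    genetic_recovery_case: int = 0


def master_list(subjects):
    """The "Master List" option: subjects with Status = 2 (Complete)."""
    return [s for s in subjects if s.status == STATUS_COMPLETE]


def population_stratified_set(subjects):
    """Master List subjects plus Status = 0 subjects with Genetic Recovery Case = 1."""
    return [
        s
        for s in subjects
        if s.status == STATUS_COMPLETE
        or (s.status == 0 and s.genetic_recovery_case == 1)
    ]


def all_recorded(subjects):
    """The "Inactive/Active/Complete" option: every subject (QC only, not analysis)."""
    return list(subjects)


if __name__ == "__main__":
    # Made-up example subjects, one per case.
    subjects = [
        Subject("S1", STATUS_COMPLETE),
        Subject("S2", 0, genetic_recovery_case=1),
        Subject("S3", 0),
        Subject("S4", STATUS_INACTIVE),
    ]
    print(
        len(master_list(subjects)),
        len(population_stratified_set(subjects)),
        len(all_recorded(subjects)),
    )  # → 1 2 4
```

Read this way, the Population Stratified Set is a strict superset of the Master List, which is consistent with the counts above: 1254 + 62 = 1316.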

[Image: CNP FinalSamples 030613.png]

At this point, you have a dataset containing only subjects who have been approved for inclusion in the Master List or the Population Stratified Set. The completeness of their data varies, but all of them have been judged usable.
Among the variables in the Patient Registry form, some of these subjects may have values in the DropDate, DQ, SF, or Flag fields. A value in one of these fields does not necessarily make a subject unusable.

  • Positive DropDate: Not grounds for exclusion; rather, it indicates a subject who may have stopped and then restarted the study. Subjects who were actually dropped for meeting exclusion criteria were marked Inactive (Status = 1). Because Inactive subjects should not be downloaded for analysis, a "real" disqualification should not be confused with values in the DropDate/DQ fields here.
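If your own analysis scripts check these fields, one approach consistent with this guide is to report which of them are filled in for review rather than drop the subject. The Python sketch below assumes each subject arrives as a dict keyed by the Patient Registry field names, and that empty fields come through as `None`, an empty string, or a missing key. Both are assumptions about the export format, and `fields_to_review` is a hypothetical helper.

```python
# Patient Registry fields named in this guide. How they appear in an export
# (column names, encoding of empty values) is an assumption here.
REVIEW_FIELDS = ("DropDate", "DQ", "SF", "Flag")


def fields_to_review(record: dict) -> list:
    """Return the review fields that carry a value for this subject.

    A filled-in field is a prompt to look closer, not an exclusion rule:
    subjects in the Master List / Population Stratified Set have already
    been judged usable.
    """
    return [f for f in REVIEW_FIELDS if record.get(f) not in (None, "")]


if __name__ == "__main__":
    # Hypothetical record: DropDate set, DQ empty, SF absent, Flag None.
    record = {"subject_id": "S1", "DropDate": "2010-05-01", "DQ": "", "Flag": None}
    print(fields_to_review(record))  # → ['DropDate']
```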