Difference between revisions of "HTAC Database - Data Download Guide"

Revision as of 12:09, 7 March 2013

Note: This guide assumes that you are familiar with the CNP dataset, names of the data subsets (e.g., LA5C), and have access to the HTAC Database.

In the HTAC Customized Data Export section, you can request data organized by Subject Type (Step 3) or Subject Status (Step 4).

  • For Step 3, you can choose to download only a certain set of subjects (for example, a particular set of patients). To download the entire dataset, select "ALL SUBJECTS".
  • For Step 4, you have three options:
      • "Master List (N = 1254)": This is the option most users will choose. It includes all subjects with Status = 2 (Complete).
      • "Population Stratified Set": This includes all subjects with Status = 2 (Complete), plus 62 additional subjects with Status = 0 and Genetic Recovery Case = 1. This larger dataset (N = 1316) will be used for primary genetic analyses only. The additional Genetic Recovery Cases are included to increase the total sample size as much as possible, but they do not necessarily meet inclusion criteria for the Master List.
      • "Inactive/Active/Complete (N = 1839)": This includes all subjects recorded in the study. Download it for QC purposes only; it should not be used for analyses.
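The Step 4 options amount to filters on the Status and Genetic Recovery Case fields. The following is a minimal Python sketch of that logic, not the database's own implementation. It assumes each exported row is a dict with the hypothetical keys `SubjectID`, `Status`, and `GeneticRecoveryCase`; the actual column names in your export may differ.

```python
# Hypothetical column names; check the headers in your actual export.
COMPLETE = 2  # Status = 2 (Complete)


def in_master_list(row):
    """Master List: all subjects with Status = 2 (Complete)."""
    return row["Status"] == COMPLETE


def in_population_stratified_set(row):
    """Master List plus Genetic Recovery Cases that have Status = 0."""
    return in_master_list(row) or (
        row["Status"] == 0 and row["GeneticRecoveryCase"] == 1
    )


# Toy rows for illustration only; these are not real subjects.
rows = [
    {"SubjectID": "A", "Status": 2, "GeneticRecoveryCase": 0},
    {"SubjectID": "B", "Status": 0, "GeneticRecoveryCase": 1},
    {"SubjectID": "C", "Status": 1, "GeneticRecoveryCase": 0},
    {"SubjectID": "D", "Status": 0, "GeneticRecoveryCase": 0},
]

master = [r["SubjectID"] for r in rows if in_master_list(r)]
stratified = [r["SubjectID"] for r in rows if in_population_stratified_set(r)]
```

Applied to the full "Inactive/Active/Complete" download, these two filters would be expected to return 1254 and 1316 subjects respectively (1254 + 62 = 1316), which can serve as a quick sanity check.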

[Figure: CNP FinalSamples 030713.png]

At this point, you have a complete data set with subjects that have been determined to be included in the Master or Population Stratified set. They vary in how complete the data are, but they have all been determined to be usable.

Notes about DropDate, DQ, and SF Fields
Under variables listed in the Patient Registry form, these subjects may have values entered in the DropDate, DQ, SF, or Flag fields. These do not necessarily make the subject unusable.

  • Positive DropDate: Not grounds for exclusion; rather, it indicates a subject who may have stopped and then restarted the study. Subjects who were actually dropped for meeting exclusion criteria were marked as Inactive (Status = 1). Because those data should not be downloaded for analysis, there should be no confusion between a "real" DQ and the DropDate/DQ fields here.
  • DQ_Reason: This variable was used throughout the study to record why a subject did not complete the study. It does not make their data unusable. Many of the remaining DQ codes were entered at the scanning stage and are therefore scan specific (e.g., the subject failed to show up for their scan, so this code was entered in the Registry to indicate why this portion of their data is missing). Consistent with their Status = 2 (Complete), the data are usable despite a positive DQ code.
  • SF_Reason: This was another field used in the Registry to record why complete data are not available from the subject. The presence of an SF flag does not mean that the data are unusable.

Master Set (N = 1254), all patients and controls: There are 6 subjects with DropDates, 17 subjects with a DQ_Reason, and 3 subjects with an SF_Reason. Since many of these overlap, there are 20 total subjects with any of these fields filled in. Whatever data were collected from these subjects have been determined to be usable. A table listing these subjects with either DropDate, DQ or SF is here.
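Because one subject can have more than one of these fields filled in, the per-field counts (6 + 17 + 3 = 26) exceed the number of distinct subjects (20). The sketch below shows one way to count both. It assumes rows are dicts keyed by a hypothetical `SubjectID` plus `DropDate`, `DQ_Reason`, and `SF_Reason`, and that an empty field is exported as an empty string or None; both are assumptions about the export format.

```python
FIELDS = ("DropDate", "DQ_Reason", "SF_Reason")


def is_filled(value):
    """Treat None and the empty string as 'not filled in' (an assumption)."""
    return value not in (None, "")


def subjects_with_any(rows, fields=FIELDS):
    """Return the set of SubjectIDs with at least one of `fields` filled in."""
    return {
        row["SubjectID"]
        for row in rows
        if any(is_filled(row.get(f)) for f in fields)
    }


# Toy rows for illustration only; all values are made up.
rows = [
    {"SubjectID": "A", "DropDate": "2010-01-01", "DQ_Reason": "3", "SF_Reason": ""},
    {"SubjectID": "B", "DropDate": "", "DQ_Reason": "5", "SF_Reason": ""},
    {"SubjectID": "C", "DropDate": "", "DQ_Reason": "", "SF_Reason": ""},
]

# Per-field counts can sum to more than the number of distinct subjects.
per_field = {f: sum(1 for r in rows if is_filled(r.get(f))) for f in FIELDS}
distinct = subjects_with_any(rows)
```

In the toy data, subject A contributes to two per-field counts but appears only once in `distinct`, which is the same overlap effect described for the Master Set.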

go back to HTAC

Notes about Flags
Under variables listed in the Patient Registry form, these subjects may have values entered in the Flag field. These do not necessarily make the subject unusable.
For the most part, Flags are left for investigator decision: it is up to the person conducting the analyses to decide whether or not to exclude any subject with a certain type of flag. If they do remove subjects after downloading the Master Set, it would be helpful to keep track of which subjects are excluded for communicating with other investigators and for replicating the analyses.

Master Set (N = 1254), all patients and controls: There are 70 subjects with a Flag. A table listing these subjects with a Flag is here.
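One way to keep track of flag-based exclusions is to record each excluded subject together with the flag that triggered the exclusion, so the list can be shared with other investigators or re-applied later. The following is only a sketch: the `SubjectID` and `Flag` keys and the flag values "X" and "Y" are hypothetical placeholders, not actual Registry codes.

```python
def split_by_flag(rows, flags_to_exclude):
    """Split rows into kept rows and a log of (SubjectID, Flag) exclusions."""
    kept, excluded = [], []
    for row in rows:
        if row.get("Flag") in flags_to_exclude:
            # Record who was dropped and why, for later replication.
            excluded.append((row["SubjectID"], row["Flag"]))
        else:
            kept.append(row)
    return kept, excluded


# Toy rows for illustration only; flag values are placeholders.
rows = [
    {"SubjectID": "A", "Flag": ""},
    {"SubjectID": "B", "Flag": "X"},
    {"SubjectID": "C", "Flag": "Y"},
]

kept, excluded = split_by_flag(rows, flags_to_exclude={"X"})
```

Saving the `excluded` list alongside the analysis output makes it straightforward to report exactly which subjects were removed from the Master Set.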

go back to HTAC