Notes on Downloading Imaging Workflow Data

From Pheno Wiki

An extensive amount of QC went into the imaging data, and we have attempted to document it as fully as possible, both in the HTAC database and here on the wiki. However, multiple considerations go into determining whether a subject's data are usable, which makes it difficult to identify usable imaging data by querying and downloading directly from the HTAC database.

We have prepared lists of usable subjects, for each type of scan, following QC (as documented in this wiki). Rather than having users attempt to decipher the Imaging Workflow forms in the HTAC database codebook (which we used primarily for logging), we propose the following system:

1. User submits query, stating which task, level of analysis, associated data, and group are requested (e.g., Stop-signal first-level models for all Controls).

  • Task: see LA5C page on wiki for full list of tasks
  • Level of analysis: Raw data, or completed first-level models (which have undergone complete QC)
  • Associated data: behavioral data corresponding to the task of interest; mprage; mbw
  • Group: Controls, specific Patient group, or All

2. Once the query is approved, the requested data will be copied into an Approved Analysis directory.
Approved analysis directories are located at space/raid2/data/poldrack/CNP/approved_analyses

3. We will provide the user with a file which includes the following information, for each task/scan requested:

  • PTID (LA2K ID; primary ID)
  • FUNC_ID (in most cases, agrees with LA2K ID; in a handful of cases, represents the original ID under which the scan data were collected. See LA3C ID Switches page)
  • Completed Status (Primary LA2K Status variable)
  • 5C_Status (Primary LA5C Status variable)
  • Scanner (1 = BMC; 2 = CCN)
  • BEHAV_NOTE (overall QC note)
  • A set of FLAGS which indicate whether the subject should be distributed, or whether the subject can be shared but may be flagged for moderate motion, suspicious performance, etc.
  • NOTE: Distributing the same set of subjects for a given task (following complete QC) ensures that absolutely unusable subjects are never distributed, and that the maximum number of potentially usable subjects is consistent across analyses (or at least queries). We want to ensure that subjects with excessive motion, incomplete data, or otherwise unusable data are not analyzed. The Notes field for each task/scan, however, records issues that were flagged during QC but whose handling is left to the user (e.g., moderate motion, suspicious performance). These issues may matter more for some types of analyses than others, and the flags might help to explain odd results that we weren't able to detect initially. As a result, final Ns may vary slightly across analyses depending on specific methods and goals, but this system at least ensures that as many subjects as possible with potentially usable data are made available for analysis.

If you choose to select variables directly from the Imaging Workflow form for download, we suggest the following:

  • Completed
  • ImgB_5CStatus or ImgA_5CStatus (depending on whether task of interest is in A or B scan)
  • Overall: Scanner
  • Overall: Note
  • Flag for Elimination (all fields) for your task of interest (includes Flag_Share and Flag_S (notes))

After downloading:

  • Filter on the Completed variable:
    • Remove all subjects with Completed Status OTHER than 1
  • Filter on 5C_Status:
    • Remove all subjects with 5C_Status OTHER than 2
  • Filter on Flag_Share:
    • Remove all subjects with Flag_Share (for your task) = 0
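The filtering steps above could be scripted roughly as follows. This is only a sketch, assuming the download is a CSV file: the column names (Completed, ImgA_5CStatus, Flag_Share) are placeholders that need to be matched to the headers in your own export, the subject IDs are made up, and treating a blank Flag_Share as "keep" is our assumption, since only a value of 0 is stated as grounds for removal.

```python
import csv
import io

# Assumed column names; adjust to the headers in your own download.
COMPLETED_COL = "Completed"
STATUS_5C_COL = "ImgA_5CStatus"  # or "ImgB_5CStatus" if your task is in the B scan
FLAG_SHARE_COL = "Flag_Share"    # hypothetical name; the real field is task-specific


def to_int(value):
    """Parse values such as '1' or '1.0'; return None for blanks or non-numbers."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def is_usable(row):
    """Keep a subject only if Completed == 1, 5C_Status == 2, and Flag_Share != 0."""
    return (
        to_int(row.get(COMPLETED_COL)) == 1
        and to_int(row.get(STATUS_5C_COL)) == 2
        and to_int(row.get(FLAG_SHARE_COL)) != 0  # blank flags are kept (assumption)
    )


def filter_usable(rows):
    """Return the rows that pass all three filters, in their original order."""
    return [row for row in rows if is_usable(row)]


# Made-up rows standing in for a real download.
sample = io.StringIO(
    "PTID,Completed,ImgA_5CStatus,Flag_Share\n"
    "S001,1,2,1\n"
    "S002,0,2,1\n"
    "S003,1,1,1\n"
    "S004,1,2,0\n"
    "S005,1,2,\n"
)
usable = filter_usable(csv.DictReader(sample))
usable_ids = [row["PTID"] for row in usable]
print(usable_ids)
```

To run this on a real download, replace the in-memory sample with an opened file, e.g. `csv.DictReader(open(path, newline=""))`, where `path` points to your exported file.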

Note that the date downloaded with the Imaging Workflow form is NOT the scan date; it is the date on which the form was started as part of our QC process. If you want the scan date, it is stored in MRI Scanning Notes and needs to be downloaded separately.


Link back to LA5C page.