Difference between revisions of "CNP RL"

Revision as of 00:01, 6 June 2011


The CNP "RL" task contains two tasks embedded in one: a probabilistic selection task (PST) and a probabilistic reversal learning task (PRLT). Both tasks involve reinforcement learning and were designed to assess feedback sensitivity and behavioral flexibility, respectively. Both have been used extensively to characterize reinforcement learning biases and behavioral flexibility in healthy and patient populations. The probabilistic selection task, initially designed by Michael Frank, is used to determine participants' tendencies to learn from either positive or negative feedback (e.g., Frank et al., 2004). The probabilistic reversal learning task, originally developed by Trevor Robbins and Robert Rogers (Lawrence et al., 1999; Swainson et al., 2000), examines participants' ability to adapt to changes in learned contingencies. Both tasks involve initial training periods in which participants must learn appropriate responses given probabilistic ("noisy") feedback.

Probabilistic Selection Task

In the typical implementation of the PST (Frank et al., 2004), three pairs of cards (AB, CD, and EF) are presented and participants must learn the "correct" card in each pair. Each pair is associated with different feedback probabilities. For pair AB, choosing A leads to positive feedback 80% of the time (B 20% of the time). For CD, choosing C leads to positive feedback 70% of the time, and for EF, choosing E leads to positive feedback 60% of the time. Over time, participants learn to choose the higher probability cards, choosing A, C, and E most of the time. Learning may be achieved either by choosing the card associated with positive feedback or by avoiding the card associated with negative feedback.

After training, to assess whether participants learn more from positive or negative feedback, the cards are recombined in a "probe" phase, such that each card is paired with every other card. Participants are required to choose between the cards in these novel pairs without receiving feedback. A bias towards learning from positive feedback is determined by the number of times participants choose the highest probability card (the card receiving the most positive feedback) relative to the others (A vs. B, A vs. C, A vs. D, A vs. E), and a bias towards learning from negative feedback is derived from the number of times that the lowest probability card is avoided (B vs. C, B vs. E, B vs. F).

The tendency to choose A versus avoid B is associated with several neuropsychiatric phenotypes and is most notably observed in patients with Parkinson's disease (PD) (Frank et al., 2004). When off dopamine agonist medications, PD patients are more likely to learn by avoiding negative feedback, but they rely more on positive feedback when on medications. Evidence for genetic associations with this feedback sensitivity bias has also been found (Frank et al., 2007).

Probabilistic Reversal Learning Task

The PRLT, like other reversal learning tasks, includes initial learning stages (acquisition) followed by a reversal stage in which stimulus-response contingencies change and participants must learn new associations. For example, in a concurrent discrimination task in which participants must choose between A and B, and A is initially reinforced during acquisition, participants must learn that A is no longer reinforced during the reversal stage and choose B instead. Although there are several variants of the PRLT, the most commonly used involves choosing the correct image in a pair of simultaneously presented images given probabilistic feedback (Swainson et al., 2000). Typically, one of the images is correct 80% of the time. After a learning criterion has been reached (e.g., 8 consecutive correct responses), the probabilities of receiving positive feedback reverse such that the image that was previously correct 80% of the time is now correct only 20% of the time. The errors made after the reversal are often termed "perseverative errors". These errors are considered a measure of participants' ability to adapt to contingency changes, and they are the most commonly used indices of reversal learning performance across species. Another common measure is failure or success in reaching learning criteria during acquisition and reversal stages.
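As an illustration of the consecutive-response criterion mentioned above, the following minimal sketch finds the trial on which a run of correct responses is first completed. The function name `trials_to_criterion` and the default run length of 8 are assumptions taken from the example in the text; this is not the scoring code used for the CNP task.

```python
from typing import Optional, Sequence


def trials_to_criterion(correct: Sequence[bool], run_length: int = 8) -> Optional[int]:
    """Return the 1-based trial number on which `run_length` consecutive
    correct responses are first completed, or None if that never happens."""
    streak = 0
    for trial_number, is_correct in enumerate(correct, start=1):
        # A correct response extends the current run; an error resets it.
        streak = streak + 1 if is_correct else 0
        if streak >= run_length:
            return trial_number
    return None
```

A participant who never completes such a run yields `None`, which corresponds to failing to reach the learning criterion.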

Task Procedure

For general testing procedure, please refer to LA2K General Testing Procedure [here?].

In the LA2K "RL" task, participants perform two training sessions: the first is followed by the PST probe and the second by the PRLT (see Figure 1).

Training 1 -> PST Probe -> Training 2 -> PRLT

Figure 1. Stages of the CNP "RL" Experiment

Training 1: In the first training session, participants perform a probabilistic object discrimination task in which they must select one of two simultaneously presented images (abstract visual patterns presented to the left and right of each other). One image is more likely to be "correct" than the other. Four pairs of images are presented with the following respective feedback probabilities: 100/0, 80/20, 70/30, and 60/40. For example, in the 80/20 pair, one card is correct 80% of the time. Subjects are trained to criteria (70%, 65%, 60%, and 55% correct, respectively for each card pair). Accuracy is calculated based on cumulative performance. At the start of the training session, the following instructions appear on the screen and are read to the participant by the experimenter:

Training 1 Instructions: "Learning Session: In this test you will be shown sets of two images at a time. Try to choose the one that is correct. You will learn if it was correct or not after you make your choice. At first you will have to guess until you learn which image is more likely to be correct. Press the LEFT key for the LEFT image. Press the RIGHT key for the RIGHT image. The feedback will tell if your choice was correct, but the feedback is not perfectly reliable. Sometimes, your choice may be wrong even though it was correct many times in the past. Place your hand on the table with your fingers extended and resting comfortably on the LEFT and RIGHT keys."
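The probabilistic feedback described for Training 1 can be sketched as a single biased draw per trial. This is an illustration only: the function name `sample_feedback` is hypothetical, and the actual task presented trials from pre-determined lists rather than drawing feedback at random on each trial.

```python
import random


def sample_feedback(chose_better_image: bool, p_better: float, rng: random.Random) -> bool:
    """Return True for "Correct!" feedback and False for "Incorrect!" feedback.

    p_better is the probability that the more-often-correct image is correct
    (1.0, 0.8, 0.7, or 0.6 for the four pairs). The other image in the pair
    is assumed to be correct with the complementary probability.
    """
    p_positive = p_better if chose_better_image else 1.0 - p_better
    return rng.random() < p_positive


# Example: feedback for 10 choices of the better image in the 80/20 pair.
rng = random.Random(0)
example_feedback = [sample_feedback(True, 0.8, rng) for _ in range(10)]
```

Over many trials, the proportion of positive feedback for the better image in the 80/20 pair approaches 0.8, which is why a choice can be "wrong" on a given trial even though it was correct many times before.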

PST probe: In the PST probe phase, the pairs are recombined such that each image is presented along with each of the other images from training (28 pairs in all, including the original pairs; see below). Each pair is presented once without feedback. Whether subjects learn more from positive or negative feedback is determined by how often the higher probability item in the pair is chosen (learning from positive feedback) versus how often the lower probability item in the pair is avoided (learning from negative feedback).

Recombined cards presented during PST probe:
100	0
100	20
100	30
100	40
100	60
100	70
100	80
80	0
80	20
80	30
80	40
80	60
80	70
70	0
70	20
70	30
70	40
70	60
60	0
60	20
60	30
60	40
40	0
40	20
40	30
30	0
30	20
20	0
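The 28 recombined pairs listed above are simply all two-item combinations of the eight trained probabilities. A minimal sketch that enumerates them follows; the variable names are illustrative and the presentation order in the actual task lists is not reproduced here.

```python
from itertools import combinations

# Feedback probabilities (in percent) of the eight images used in training.
PROBABILITIES = [100, 80, 70, 60, 40, 30, 20, 0]

# Every unordered pair of distinct images, written as (higher, lower).
probe_pairs = list(combinations(PROBABILITIES, 2))

# The four original training pairs are a subset of the probe pairs.
original_pairs = [(100, 0), (80, 20), (70, 30), (60, 40)]
```

With eight images, the number of pairs is 8 x 7 / 2 = 28, matching the count given for the probe phase.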

The following instructions appear on the screen and are read to the participant by the experimenter:

PST Probe Instructions: "Testing Session: Again you will be shown sets of two images. As before, try to choose the image that is most likely to be correct. Press the LEFT button for the left image. Press the RIGHT button for the right image. You will not be receiving feedback during this session. Just try to pick the correct one based on what you have learned so far."

Training 2: Training 2 is identical to Training 1 except that a fixed number of trials are presented (40 trials) without requiring participants to train to criteria. Instructions for Training 2 are:

Training 2 Instructions: "Learning Session 2: Again you will be shown sets of two images. Try to choose the image that is most likely to be correct. Press the LEFT button for the left image. Press the RIGHT button for the right image. The feedback will tell if your choice was correct, but the feedback is not perfectly reliable. Just try to pick the one that is more likely to be correct."

PRLT: The reversal phase occurs after an additional training phase (Training 2) that is identical to the first training session, using the same stimuli. During reversal, the correct image in half of the original 4 pairs is reversed. These are the 100/0 and 70/30 pairs. For these pairs, the probabilities are reversed, such that the card that was previously correct 100% of the time is never correct, and the card correct 70% of the time is now only correct 30% of the time. Each of the four pairs is presented 10 times (40 trials total). Instructions for the PRLT are similar to those for Training 2.

PRLT Instructions: "Testing Session 2: Once more you will be shown sets of two images. Try to choose the image that is most likely to be correct. Press the LEFT button for the left image. Press the RIGHT button for the right image. The feedback will tell if your choice was correct, but the feedback is not perfectly reliable. Just try to pick the one that is most likely to be correct."
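The contingency change in the PRLT stage can be summarized as a mapping from each pair to the probability that the originally better image is correct. The sketch below is a hedged illustration; the dictionary and function names are assumptions, not identifiers from the task code.

```python
# Probability (in percent) that the originally better image is correct in training.
TRAINING_PERCENT = {"100/0": 100, "80/20": 80, "70/30": 70, "60/40": 60}

# Pairs whose contingencies are reversed in the PRLT stage.
REVERSED_PAIRS = {"100/0", "70/30"}

TRIALS_PER_PAIR = 10


def prlt_percent(pair: str) -> int:
    """Percent of PRLT trials on which the originally better image is correct."""
    percent = TRAINING_PERCENT[pair]
    return 100 - percent if pair in REVERSED_PAIRS else percent


prlt_schedule = {pair: prlt_percent(pair) for pair in TRAINING_PERCENT}
total_prlt_trials = TRIALS_PER_PAIR * len(TRAINING_PERCENT)
```

Under these assumptions the 80/20 and 60/40 pairs serve as non-reversed comparison pairs within the same 40-trial block.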

Counterbalancing: In all stages of the experiment, the image pairs were presented according to a list that contained a pre-determined sequence of trials. Four groups of lists were used, and each participant was randomly assigned to one of the four groups. In each group, eight images were drawn from a set of twelve. Each image was assigned to a separate probability across groups.

Task Structure Detail

Task Structure
- Participants performed 4 blocks, each corresponding to a different stage in the PST/PRLT tasks.
  - Training 1: Participants performed as many trials as required for reaching learning criteria or until 160 trials were completed (NEED TO VERIFY IF BLOCK TERMINATES AT 160 trials).
    - Performance criteria calculation: cumulative performance accuracy was first calculated once 60 trials were completed and was then recalculated on each trial thereafter until criterion had been reached. The number of trials completed for each pair within the first 60 trials is listed by group list:
      - Group 1: 17 trials of 60/40 pair, 15 trials of 80/20 pair, 14 trials of 70/30 pair, and 14 trials of 100/0 pair
    - Learning criteria were the following: 70% accuracy on 100/0 pair, 65% accuracy on 80/20 pair, 60% accuracy on 70/30 pair, and 55% accuracy on 60/40 pair. Once criterion had been reached for a given pair, accuracy calculations were discontinued for that pair, but trial presentations continued until criterion on other pairs had been reached. Eighty trials specified in the group list were sequentially presented. If criteria were not reached at 80 trials, trial presentations began again from the beginning of the list.
  - PST Probe: 28 trials consisting of recombined pairs (including original pairs presented during Training 1)
  - Training 2: 40 trials
  - PRLT: 40 trials
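The Training 1 stopping rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: "accuracy" is taken to mean choosing the higher-probability image, a criterion counts as met when cumulative accuracy is at or above the threshold, and the function name and data layout are hypothetical. The upper limit on trials is left to the caller because the text marks it as unverified.

```python
from typing import Dict, List, Optional, Tuple

# Per-pair accuracy criteria from the task description.
CRITERIA: Dict[str, float] = {"100/0": 0.70, "80/20": 0.65, "70/30": 0.60, "60/40": 0.55}
MIN_TRIALS = 60  # accuracy is not evaluated before this many trials


def trial_when_all_criteria_met(
    trials: List[Tuple[str, bool]],
    criteria: Dict[str, float] = CRITERIA,
    min_trials: int = MIN_TRIALS,
) -> Optional[int]:
    """trials holds (pair, chose_better_image) tuples in presentation order.

    Returns the 1-based trial number on which every pair has reached its
    criterion, or None if that never happens within the supplied trials.
    """
    n_seen = {pair: 0 for pair in criteria}
    n_correct = {pair: 0 for pair in criteria}
    reached = set()
    for trial_number, (pair, chose_better) in enumerate(trials, start=1):
        n_seen[pair] += 1
        n_correct[pair] += int(chose_better)
        if trial_number < min_trials:
            continue
        for name, threshold in criteria.items():
            # Once a pair has reached criterion it is no longer re-evaluated.
            if name in reached or n_seen[name] == 0:
                continue
            if n_correct[name] / n_seen[name] >= threshold:
                reached.add(name)
        if len(reached) == len(criteria):
            return trial_number
    return None
```

Because accuracy is cumulative, early errors on a pair delay criterion for that pair even after the participant has started responding correctly.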

Timing:
- All trials began with presentation of the stimuli. Participants were not under time-pressure to respond (i.e., self-paced).
- Immediately following the participant's response, feedback was presented for 1 second.
- An inter-stimulus interval (time between off-set of feedback and onset of the subsequent trial) of 500 ms was used during which only the grey background appeared.

Stimulus Characteristics
- sensory modality: visual
- functional modality: spatial/categorical
- presentation modality: computer (E-Prime)

Performance Feedback Characteristics
- sensory modality: visual
- functional modality: verbal
- reward (e.g., none, points, money, food): none
- description: feedback appears as "Correct!" in green font or "Incorrect!" in red font above the image pairs.

Response Characteristics
- response required: yes
- effector modality: manual
- functional modality: keypress
- response options: forced choice (left/right)
- response collection: keyboard

Assessment/Control Characteristics
- Timing: self-paced
- Average Run Time: 10 mins

Task Schematic

Screenshot of LA2K RL: stimulus presentation. Participant responding is self-paced.

Screenshot of LA2K RL: positive feedback presentation. Feedback is presented for 1 second.

Screenshot of LA2K RL: negative feedback presentation. Feedback is presented for 1 second.

After feedback presentation, a grey background was displayed for 500 milliseconds prior to the next trial.

Task Parameters Table

Task parameters table to be inserted.

Stimuli

Twelve abstract computer-generated images (ArtMatic Pro, U&I Software LLC, http://uisoftware.com) were used in the tasks (see examples below). Each image was 128x128 pixels. They were centrally presented side-by-side (separated by 81 pixels) on a grey background.

Example stimulus: image A

Example stimulus: image B

In each of the four counterbalancing groups, eight images were drawn from the set of twelve.

Group 1: C,E,F,G,H,J,K,L
Group 2: A,B,C,D,G,I,K,L
Group 3: B,C,D,E,F,H,J,K
Group 4: A,B,D,F,G,H,I,J
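As a quick consistency check on the group lists above, the following sketch confirms that each group uses eight distinct images from the set of twelve and counts how many groups use each image. The variable names are illustrative; the assignment of images to probabilities within each group is not given here and is not modeled.

```python
from collections import Counter

ALL_IMAGES = set("ABCDEFGHIJKL")

GROUP_IMAGES = {
    1: "C,E,F,G,H,J,K,L".split(","),
    2: "A,B,C,D,G,I,K,L".split(","),
    3: "B,C,D,E,F,H,J,K".split(","),
    4: "A,B,D,F,G,H,I,J".split(","),
}

# Each group should contain eight distinct images drawn from the twelve.
group_is_valid = {
    group: len(images) == 8 and len(set(images)) == 8 and set(images) <= ALL_IMAGES
    for group, images in GROUP_IMAGES.items()
}

# Number of groups in which each image appears.
image_usage = Counter(image for images in GROUP_IMAGES.values() for image in images)
```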

Performance feedback during training and PRLT stages appeared as text (Font: Gill Sans MT, Size: 24 point) above the image pairs. Positive feedback was indicated by "Correct!" in green font, and negative feedback by "Incorrect." in red font.

Dependent Variables

PST Probe

The primary dependent variables for the PST Probe are:

1. The number of times subjects chose the 80% probability card in the following pairs (80/70, 80/60, 80/40, 80/30, 80/20). This is a measure of learning from positive feedback since it indicates how often subjects chose the stimulus most associated with positive feedback. (variable name: Choose_80)

2. The number of times subjects avoided the 20% probability card in the following pairs (20/80, 20/70, 20/60, 20/40, 20/30). This is a measure of how often the stimulus most often associated with negative feedback was avoided. (variable name: Avoid_20)

Finer-grained versions of measures 1 and 2 may also be computed, for example by examining how often participants choose the 70% card over the 60, 40, 30, and 20 cards, or how often they avoid the 30% card when it is paired with the 80, 70, 60, and 40 cards.

As a control measure to assess learning of the original pairs (100/0, 80/20, 70/30, 60/40) during the training period, accuracy for the original pairs presented during the PST probe can be examined. Since feedback is not presented in the PST probe period, this measure may be interesting in light of work indicating that separate memory systems are invoked during feedback learning and subsequent performance without feedback (e.g., Shohamy et al., 2004).
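The choose and avoid measures described in this section can be sketched as simple counts over the probe choices. This is a hedged illustration, not the project's scoring code: the function names and the data layout (a dictionary from a (higher, lower) pair to the probability of the chosen image) are assumptions. The set of partner cards is passed in explicitly because the pair lists given in the text and in the summary table differ slightly.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]  # (higher probability, lower probability), in percent


def _pair(a: int, b: int) -> Pair:
    return (max(a, b), min(a, b))


def choose_count(choices: Dict[Pair, int], target: int, partners: Iterable[int]) -> int:
    """Number of probe pairs in which `target` was chosen over its partner."""
    return sum(int(choices[_pair(target, other)] == target) for other in partners)


def avoid_count(choices: Dict[Pair, int], target: int, partners: Iterable[int]) -> int:
    """Number of probe pairs in which `target` was avoided (the partner was chosen)."""
    return sum(int(choices[_pair(target, other)] != target) for other in partners)
```

For example, `choose_count(choices, 80, [70, 60, 40, 30, 20])` follows the pair list given for Choose_80 in the text, and `avoid_count(choices, 20, [80, 70, 60, 40, 30])` follows the one given for Avoid_20. Dividing by the number of partners gives a proportion instead of a count.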

PRLT

The primary dependent variables for the PRLT are accuracy (proportion correct out of 10 trials), calculated separately for the 100/0 (deterministic) and 70/30 (probabilistic) trials (variable names: RL_REV_100_0R_MN and RL_REV_70_30R_MN).

Further dependent variables of interest include the number of trials required to reach criterion during Training 1 and Training 2. Response times compared across the card pairs can also serve as a measure of response conflict. For example, greater response conflict may be inferred when response times to 60/40 cards are longer than those to 80/20 cards.
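A per-pair summary of accuracy and response time can be sketched as below. The keys MN, MD, and STD mirror the suffixes used in the summary variable names; the function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics
from typing import Dict, Sequence


def summarize_pair(correct: Sequence[bool], rts_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize one card pair within one stage.

    correct: one entry per trial, True if the response was scored correct.
    rts_ms: response times in milliseconds for the same trials (at least two).
    """
    return {
        "MN": sum(correct) / len(correct),   # mean accuracy (proportion correct)
        "MD": statistics.median(rts_ms),     # median response time
        "STD": statistics.stdev(rts_ms),     # sample standard deviation of response time
    }
```

Comparing the MD values for the 60/40 and 80/20 pairs gives the response-conflict contrast mentioned above.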

Table of all summary variables:

	VARIABLE NAME	STAGE	PAIR	DESCRIPTION
1	RL_TR1_100_0_MN	Training 1	100/0 pair	Mean Accuracy
2	RL_TR1_100_0_MD	Training 1		Median RT
3	RL_TR1_100_0_STD	Training 1		Stddev RT
4	RL_TR1_60_40_MN	Training 1	60/40 pair	Mean Accuracy
5	RL_TR1_60_40_MD	Training 1		Median RT
6	RL_TR1_60_40_STD	Training 1		Stddev RT
7	RL_TR1_70_30_MN	Training 1	70/30 pair	Mean Accuracy
8	RL_TR1_70_30_MD	Training 1		Median RT
9	RL_TR1_70_30_STD	Training 1		Stddev RT
10	RL_TR1_80_20_MN	Training 1	80/20 pair	Mean Accuracy
11	RL_TR1_80_20_MD	Training 1		Median RT
12	RL_TR1_80_20_STD	Training 1		Stddev RT
13	RL_PRB_100_0_MN	PST Probe		Accuracy - selection of highest probability card
14	RL_PRB_100_0_MD	PST Probe		Median RT
16	RL_PRB_100_20_MN	PST Probe
17	RL_PRB_100_20_MD	PST Probe
19	RL_PRB_100_30_MN	PST Probe
20	RL_PRB_100_30_MD	PST Probe
22	RL_PRB_100_40_MN	PST Probe
23	RL_PRB_100_40_MD	PST Probe
25	RL_PRB_100_60_MN	PST Probe
26	RL_PRB_100_60_MD	PST Probe
28	RL_PRB_100_70_MN	PST Probe
29	RL_PRB_100_70_MD	PST Probe
31	RL_PRB_100_80_MN	PST Probe
32	RL_PRB_100_80_MD	PST Probe
34	RL_PRB_20_0_MN	PST Probe
35	RL_PRB_20_0_MD	PST Probe
37	RL_PRB_30_0_MN	PST Probe
38	RL_PRB_30_0_MD	PST Probe
40	RL_PRB_30_20_MN	PST Probe
41	RL_PRB_30_20_MD	PST Probe
43	RL_PRB_40_0_MN	PST Probe
44	RL_PRB_40_0_MD	PST Probe
46	RL_PRB_40_20_MN	PST Probe
47	RL_PRB_40_20_MD	PST Probe
49	RL_PRB_40_30_MN	PST Probe
50	RL_PRB_40_30_MD	PST Probe
52	RL_PRB_60_0_MN	PST Probe
53	RL_PRB_60_0_MD	PST Probe
55	RL_PRB_60_20_MN	PST Probe
56	RL_PRB_60_20_MD	PST Probe
58	RL_PRB_60_30_MN	PST Probe
59	RL_PRB_60_30_MD	PST Probe
61	RL_PRB_60_40_MN	PST Probe
62	RL_PRB_60_40_MD	PST Probe
64	RL_PRB_70_0_MN	PST Probe
65	RL_PRB_70_0_MD	PST Probe
67	RL_PRB_70_20_MN	PST Probe
68	RL_PRB_70_20_MD	PST Probe
70	RL_PRB_70_30_MN	PST Probe
71	RL_PRB_70_30_MD	PST Probe
73	RL_PRB_70_40_MN	PST Probe
74	RL_PRB_70_40_MD	PST Probe
76	RL_PRB_70_60_MN	PST Probe
77	RL_PRB_70_60_MD	PST Probe
79	RL_PRB_80_0_MN	PST Probe
80	RL_PRB_80_0_MD	PST Probe
82	RL_PRB_80_20_MN	PST Probe
83	RL_PRB_80_20_MD	PST Probe
85	RL_PRB_80_30_MN	PST Probe
86	RL_PRB_80_30_MD	PST Probe
88	RL_PRB_80_40_MN	PST Probe
89	RL_PRB_80_40_MD	PST Probe
91	RL_PRB_80_60_MN	PST Probe
92	RL_PRB_80_60_MD	PST Probe
94	RL_PRB_80_70_MN	PST Probe
95	RL_PRB_80_70_MD	PST Probe
97	RL_TR2_100_0_MN	Training 2	100/0 pair	Mean Accuracy
98	RL_TR2_100_0_MD	Training 2		Median RT
99	RL_TR2_100_0_STD	Training 2		Stddev RT
100	RL_TR2_60_40_MN	Training 2	60/40 pair	Mean Accuracy
101	RL_TR2_60_40_MD	Training 2		Median RT
102	RL_TR2_60_40_STD	Training 2		Stddev RT
103	RL_TR2_70_30_MN	Training 2	70/30 pair	Mean Accuracy
104	RL_TR2_70_30_MD	Training 2		Median RT
105	RL_TR2_70_30_STD	Training 2		Stddev RT
106	RL_TR2_80_20_MN	Training 2	80/20 pair	Mean Accuracy
107	RL_TR2_80_20_MD	Training 2		Median RT
108	RL_TR2_80_20_STD	Training 2		Stddev RT
112	RL_REV_60_40_MN	Reversal	60/40 pair	Mean Accuracy
113	RL_REV_60_40_MD	Reversal		Median RT
114	RL_REV_60_40_STD	Reversal		Stddev RT
118	RL_REV_80_20_MN	Reversal	80/20 pair	Mean Accuracy
119	RL_REV_80_20_MD	Reversal		Median RT
120	RL_REV_80_20_STD	Reversal		Stddev RT
121	RL_REV_100_0R_MN	Reversal	100/0 Rev pair	Mean Accuracy
122	RL_REV_100_0R_MD	Reversal		Median RT
123	RL_REV_100_0R_STD	Reversal		Stddev RT
124	RL_REV_70_30R_MN	Reversal	70/30 Rev pair	Mean Accuracy
125	RL_REV_70_30R_MD	Reversal		Median RT
126	RL_REV_70_30R_STD	Reversal		Stddev RT
143	Choose_100	PST Probe		Proportion of times 100 was selected in the following pairs: 100/80 100/70 100/60 100/40 100/30 100/20
144	Choose_80	PST Probe		Proportion of times 80 was selected in the following pairs: 80/70 80/60 80/40 80/30 80/0
145	Choose_70	PST Probe		Proportion of times 70 was selected in the following pairs: 70/60 70/40 70/20 70/0
146	Choose_60	PST Probe		Proportion of times 60 was selected in the following pairs: 60/30 60/20 60/0
147	Choose_40	PST Probe		Proportion of times 40 was selected in the following pairs: 40/30 40/20 40/0
148	Choose_30	PST Probe		Proportion of times 30 was selected in the following pairs: 30/20 30/0
149	Choose_20	PST Probe		Proportion of times 20 was selected in the following pairs: 20/0
150	Avoid_80	PST Probe		Proportion of times 80 was avoided in the following pairs: 80/100
151	Avoid_70	PST Probe		Proportion of times 70 was avoided in the following pairs: 70/100 70/80
152	Avoid_60	PST Probe		Proportion of times 60 was avoided in the following pairs: 60/100 60/80 60/70
153	Avoid_40	PST Probe		Proportion of times 40 was avoided in the following pairs: 40/100 40/80 40/70
154	Avoid_30	PST Probe		Proportion of times 30 was avoided in the following pairs: 30/100 30/80 30/60 30/40
155	Avoid_20	PST Probe		Proportion of times 20 was avoided in the following pairs: 20/100 20/70 20/60 20/40 20/30
156	Avoid_0	PST Probe		Proportion of times 0 was avoided in the following pairs: 0/80 0/70 0/60 0/40 0/30 0/20
157	Choose_HI_Prob_MN	PST Probe		Mean of the Choose variables above
158	Avoid_HI_Prob_MN	PST Probe		Mean of the Avoid variables above

Cleaning Rules

The main cleaning rule is to eliminate trials for which response times are unrealistically short (e.g., 100 ms). Since training continues until criterion has been reached, it is assumed that subjects have adequately learned the pairs before the PST probe and PRLT. Since only one trial per card pair is presented during the PST probe, it is important to confirm that responses were made on all trials containing the probabilities of interest. For example, if Choose_80 and Avoid_20 are of interest, make sure that responses were made to all card pairs with these values.
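The two checks described above can be sketched as follows. The function names, the trial layout (a dictionary with an `rt_ms` field), and the default cutoff of 100 ms are assumptions for illustration; the text gives 100 ms only as an example, and whether a trial at exactly the cutoff is kept is a choice made here, not something the text specifies.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]  # (higher probability, lower probability), in percent


def drop_fast_trials(trials: List[Dict], min_rt_ms: float = 100.0) -> List[Dict]:
    """Keep only trials whose response time is at least `min_rt_ms`."""
    return [trial for trial in trials if trial["rt_ms"] >= min_rt_ms]


def missing_probe_pairs(responded: Set[Pair], target: int, partners: Iterable[int]) -> List[Pair]:
    """List the probe pairs involving `target` that have no usable response."""
    needed = [(max(target, other), min(target, other)) for other in partners]
    return [pair for pair in needed if pair not in responded]
```

An empty list from `missing_probe_pairs` indicates that the measure for that target card can be computed from a complete set of probe trials.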

Code/Algorithms

History of Checking Scoring:

Training: need to include trials to criterion measure

PRLT: Dara Ghahremani checked summary variables with Stone for accuracy in fall of 2009.

PST: Dara worked with Stone on final scoring for Probe summary variables in fall 2010, but some adjustments are still required. Working with Stone to refine.

Data Distributions

To be inserted

References

Frank MJ, Seeberger LC, O'Reilly RC. By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science 2004;306:1940-1943.

Frank MJ, Moustafa AA, Haughey HM, Curran T, Hutchison KE. Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proc Natl Acad Sci USA 2007;104:16311-16316.

Lawrence AD, Sahakian BJ, Rogers RD, Hodges JR, Robbins TW. Discrimination, reversal, and shift learning in Huntington's disease: mechanisms of impaired response selection. Neuropsychologia 1999;37:1359-1374.

Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA, Poldrack RA. Cortico-striatal contributions to feedback-based learning: converging data from neuroimaging and neuropsychology. Brain 2004;127:851-859.

Swainson R, Rogers RD, Sahakian BJ, Summers BA, Polkey CE, Robbins TW. Probabilistic learning and reversal deficits in patients with Parkinson's disease or frontal or temporal lobe lesions: possible adverse effects of dopaminergic medication. Neuropsychologia 2000;38:596-612.

