In last week’s Science, Dosenbach et al. describe a set of sophisticated machine learning techniques they’ve used to predict age from the way that hemodynamics correlate both within and across various functional networks in the brain. As described over at the BungeLab Blog, and at Neuroskeptic, the classification is amazingly accurate, generalizes easily to two independent data sets with different acquisition parameters, and has some real potential for future use in the diagnosis of developmental disorders – made all the easier since the underlying resting-state functional connectivity data take only about 5 minutes to acquire from a given subject.
Somehow, their statistical techniques learned the characteristic features of functional change between the ages of 7 and 30 years. How exactly did they manage this?
First, they started with three data sets of resting-state BOLD activity. The first consisted of 238 resting-state scans, acquired on a 3T scanner, from 192 individuals aged 7-30 years. The second consisted of 195 scans from 183 subjects aged 7-31 years; each scan was built by extracting the “rest” blocks from a blocked fMRI design and concatenating them, and the data were acquired on a 1.5T scanner with a different pulse sequence from that of the first data set. The third data set consisted of 186 scans from 143 subjects aged 6-35 performing linguistic tasks, with task-related activity regressed out, using the same pulse sequence as the second data set.
All the data was transformed to a single atlas and sent through a standard artifact-removal pipeline; next, activity in each of 160 10-mm spherical ROIs was calculated for each image in each scan, with the ROIs determined by a series of five meta-analyses the authors undertook on data of their own (wow!). The full matrix of pairwise correlations between ROI time courses was then calculated (yielding 160 × 159 / 2 = 12,720 unique correlations for each scan) and z-transformed.
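To make the feature-construction step concrete, here is a minimal sketch in plain Python. It assumes that "z-transformed" means the usual Fisher r-to-z transform (atanh of the correlation), which the description above does not spell out; the function names and the three-ROI toy signals are made up for illustration and are not the authors' pipeline.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def connectivity_features(timeseries):
    """Fisher z-transformed correlation for every unique pair of ROIs.

    `timeseries` holds one list of samples per ROI (hypothetical layout).
    """
    features = []
    for i, j in combinations(range(len(timeseries)), 2):
        r = pearson(timeseries[i], timeseries[j])
        # Clip so atanh stays finite if two signals are perfectly correlated.
        r = max(min(r, 0.999999), -0.999999)
        features.append(math.atanh(r))  # assumed Fisher r-to-z transform
    return features

# Number of unique ROI pairs for 160 ROIs: 160 * 159 / 2
n_pairs = math.comb(160, 2)

# Toy example: 3 ROIs, 5 time points each (invented numbers)
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
toy_features = connectivity_features(toy)  # one value per ROI pair
```

Each scan thus becomes one long feature vector, with one entry per ROI pair, which is what the classifier in the next step consumes.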
Next they took this massive correlation matrix and used a support vector machine (SVM with soft margin, including a radial basis function “kernel trick”) to classify each scan as belonging to a child (7-11 years old) or an adult (24-30 years old), tested with leave-one-out cross-validation. They kept only the highest-ranked 200 features of the trained SVMs for further analyses (a process of recursive feature elimination didn’t really help, so they just stuck with 200). Across all validations, the same set of 156 features consistently ended up in the top 200, and these were used for visualization of the feature weights. In this step they could classify adults vs. children at 91% accuracy.
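The "same 156 features in every top 200" idea is just a set intersection across cross-validation folds. A rough sketch follows, with two loud assumptions: features are ranked by the absolute value of some per-feature weight (the post doesn't say how the ranking was done), and the weights themselves are invented toy numbers rather than anything from a trained SVM.

```python
def top_k_features(weights, k):
    """Indices of the k features with the largest absolute weight."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return set(ranked[:k])

def consensus_features(fold_weights, k):
    """Features that land in the top k in every cross-validation fold."""
    per_fold = [top_k_features(w, k) for w in fold_weights]
    return set.intersection(*per_fold)

# Toy example: 3 folds, 6 features, keep the top 3 per fold.
# (The paper's analogue would be one fold per left-out scan and k = 200.)
fold_weights = [
    [0.9, -0.8, 0.1, 0.7, 0.0, 0.2],
    [0.8, -0.9, 0.3, 0.1, 0.0, 0.6],
    [0.7, -0.6, 0.2, 0.5, 0.1, 0.4],
]
stable = consensus_features(fold_weights, k=3)
print(sorted(stable))  # features 0 and 1 survive in every fold
```

Features that only make the cut in some folds (3 and 5 here) drop out, which is why the consensus set is smaller than k.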
They next used support vector regression to predict, based on the retained 200 features, the age of the subject in the scanner. Predicted ages were converted into a “functional connectivity maturation index” which had a mean of 1.0 for ages 18 to 30 (we’ll come back to this), and revealed beautiful curves you’ve no doubt seen elsewhere by this point:
The best-fitting line here is actually either the Pearl-Reed (gray line – used in other contexts to model the growth of human populations in settings with limited resources) or the Von Bertalanffy (black line – used to model the growth of animals). The same basic effects were replicated on all three data sets.
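For readers who want to play with these shapes, here is a small sketch of the two pieces involved. It assumes the maturation index is obtained by dividing each predicted age by the mean predicted age of the 18-to-30 group (one simple way to get a mean of 1.0 there; the paper's exact procedure may differ), and it uses the textbook forms of the two growth curves. All parameter names and toy numbers are illustrative, not fitted values from the paper.

```python
import math

def maturation_index(predicted_ages, true_ages, lo=18, hi=30):
    """Rescale predicted ages so subjects aged lo..hi average 1.0 (assumed method)."""
    adult = [p for p, a in zip(predicted_ages, true_ages) if lo <= a <= hi]
    adult_mean = sum(adult) / len(adult)
    return [p / adult_mean for p in predicted_ages]

def pearl_reed(age, asymptote, b, rate):
    """Pearl-Reed (logistic) growth curve: S-shaped rise toward `asymptote`."""
    return asymptote / (1.0 + b * math.exp(-rate * age))

def von_bertalanffy(age, asymptote, rate, age0):
    """Von Bertalanffy growth curve: decelerating rise toward `asymptote`."""
    return asymptote * (1.0 - math.exp(-rate * (age - age0)))

# Toy data (invented): chronological ages and the ages a regressor predicted.
true_ages = [8, 12, 20, 25, 30]
predicted = [10.0, 14.0, 22.0, 24.0, 26.0]
index = maturation_index(predicted, true_ages)

# Both curves flatten out near the asymptote for large ages.
late_logistic = pearl_reed(60, asymptote=1.0, b=5.0, rate=0.3)
late_vb = von_bertalanffy(60, asymptote=1.0, rate=0.15, age0=0.0)
```

Both curves level off at an asymptote, which matches the picture of connectivity "maturing" and then plateauing in adulthood.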
The rest of the paper is mostly dedicated to visualizing what exactly the SVMs were basing their surprisingly accurate predictions on. It turns out that twice as much of the predicted age-related variance was explained by functional connectivity that decreased with advancing age as by that which increased with age. Moreover, decreasing connectivity was more common among nearby regions, whereas increasing functional connectivity tended to occur among more far-flung regions (similar to the local-to-distributed shift discussed previously). Functional connections that increased with age were more aligned in the anterior-posterior dimension than those that decreased with age; the single most age-discriminative set of ROIs was the “cingulo-opercular” network (also discussed previously), and the most age-discriminative individual ROI was the right anterior prefrontal cortex.
If all that wasn’t complicated enough, here’s a glimpse of the paper’s money shot:
Obviously, this is an incredibly impressive set of results with real-world value. But what are some of the potential pitfalls here?
One is that the classification actually took place in higher-dimensional space (>200 dimensions, as I understand it), meaning that the results are dependent on interactions of changes in functional connectivity among and within the 156 features visualized above. This kind of thing is not easily captured in the way the results have been visualized.
A second thing to be wary of is the conversion of predicted age into the brain maturity index. I’m not following why exactly this conversion was necessary, but I assume it was due to a fall-off in the accuracy of the age predictions for subjects who are, in reality, between the ages of 18 and 30. This likely indicates that functional connectivity asymptotes around that time and so loses its sensitivity to further age-related change. In other words, it’s likely not capturing whatever “wisdom” a 30-year-old might have that differentiates them from an 18-year-old.
(Assuming such a thing actually exists, it seems like it’s not “in” the functional connectivity data. On the other hand, some of their data sets may have under-sampled the older part of the age distribution – perhaps wisdom just takes statistical mega-power to detect.)
These caveats aside, it’s really beautiful work, and I believe it will really help real people really soon (TM). That’s far more than can be said about most of the work being done in this area, which is far more theoretical in nature.