Estimating the genotypic intelligence of populations and assessing the impact of socioeconomic factors and migrations

Factor analysis of allele frequencies was used to identify signals of polygenic selection on human intelligence. Four SNPs which reached genome-wide significance in previous meta-analyses were used. Allele frequencies for 26 populations were obtained from 1000 Genomes. The resulting factor scores were highly correlated with average national IQ (r=0.92). A regression of IQ differences between subcontinental groups on the 4 SNPs g factor and an index of genome-wide genetic distances showed that the former was an independent and significant predictor (Beta=1.14), whereas genome-wide distances lost all predictive power. This finding suggests that the relationship between the 4 SNPs g factor and IQ is due to natural selection on a specific phenotype and is not a spurious correlation arising from genome-wide evolutionary processes such as random drift or migrations. A regression of IQs on genetic factor scores of developed countries was used to estimate the predicted genotypic IQs of developing countries. The residuals (difference between predicted and actual scores) were negatively correlated with per capita GDP and the Human Development Index, implying that countries with low socioeconomic conditions have not yet reached their full intellectual potential.


INTRODUCTION
To date, only a few genetic variants have shown replicated associations with intelligence. The meta-analysis by Rietveld et al. (2013) found ten SNPs associated with higher educational attainment, comprising three with nominal genome-wide significance and seven with suggestive significance. A recent study has replicated the positive effect of the top three SNPs (rs9320913, rs11584700 and rs4851266) on mathematics and reading performance in an independent sample of school children (Ward et al., 2014). These SNPs were also associated with g (general intelligence) in a sub-sample of Rietveld et al.'s original study. Another SNP (rs236330), located within the gene FNBP1L, showed a significant association with general intelligence in two separate studies (Davies et al., 2011; Benyamin et al., 2013). This gene is strongly expressed in neurons, including hippocampal neurons, and in developing brains, where it regulates neuronal morphology (Davies et al., 2011). Piffer (2013) applied principal components analysis (PCA) to allele frequencies to obtain an estimate of natural selection (or deviation from random drift) on different alleles correlated with the same phenotype.
The aim of this paper is to provide updated genotypic IQ scores for populations, using the updated 1000 Genomes database (phase 3), which comprises 26 populations instead of 14, and several factor analytic methods instead of PCA. A further aim is to test the hypothesis that a detrimental environment can depress average phenotypic IQ, so that populations living in worse socioeconomic conditions would not have reached their full potential (as indicated by their genotypic score).

© Piffer. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.

METHODS
IQ/educational attainment increasing alleles: the top 3 hits from Rietveld et al. were included together with another SNP (rs236330), located within the gene FNBP1L, reported in two separate studies (Davies et al., 2011; Benyamin et al., 2013).
IQ for Tuscany was calculated as the average of the IQ estimated from PISA Creative Problem Solving (Piffer & Lynn, 2014) and the IQ estimated from PISA Math, Science and Reading. There were 3 missing cases (Chinese Dai, Gujarati Indian, Indian Telugu). Genome-wide distances (Fst) were obtained from Gedmatch K13 (2013). Piffer (2013) used principal components analysis of population data from 1000 Genomes phase 1, which had data for only 14 populations from four racial clusters, and from 50 populations contained in ALFRED. Besides ALFRED, here I use the latest 1000 Genomes phase 3 data, comprising 26 populations from five continental groups. I employed factor analysis instead of principal components analysis because it is the preferred method when the purpose is to identify a latent structure free from unique variance (i.e. error), which in the case of allele frequencies can be due to random drift (shifting frequencies randomly up or down) or inadequate (i.e. small) sampling.

FACTOR ANALYSIS OF 4 IQ INCREASING ALLELES.
A previous study found that the specific factor extraction method had little effect on results, except that principal components analysis produced inflated loadings (Kirkegaard, 2014). To further examine how the extraction method influences results, several methods were employed (minimum residuals, weighted least squares, generalized least squares, principal axis factoring, maximum likelihood), and factor scores were obtained using different methods (Thurstone, Harman, and Bartlett). These all produced nearly identical results and were averaged to create a composite vector. The composite factor had slightly higher validity, as suggested by its slightly higher correlation with Lynn and Vanhanen's national IQs (r=0.92 vs 0.91). Conversely, the component extracted with PCA had a slightly lower correlation (r=0.88). All of these correlations are high and in the expected (positive) direction. The correlation between national IQ and the composite factor scores was 0.92 (N=23, p<0.001). Together with the factor loadings, this suggests that this factor represents a signal of polygenic selection on human intelligence and can be used as an indicator of population-level "genotypic intelligence" or "intellectual potential".
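As a rough illustration of one of the extraction methods listed above, the sketch below runs a single-factor principal axis factoring and computes Bartlett factor scores in plain Python. It is not the software or the data used in this study: the "allele frequencies" are simulated from a hypothetical latent factor, and the helper names (`principal_axis_loadings`, `bartlett_scores`, and so on) are invented for this example.

```python
import math
import random

def zscores(column):
    """Standardize one variable (mean 0, sample SD 1)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(z_columns):
    """Pearson correlations between already standardized columns."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]

def leading_eigenpair(matrix, iterations=200):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
    return sum(v * p for v, p in zip(vec, prod)), vec

def principal_axis_loadings(corr, rounds=50):
    """One-factor principal axis factoring with iterated communalities."""
    k = len(corr)
    communalities = [max(abs(corr[i][j]) for j in range(k) if j != i)
                     for i in range(k)]
    loadings = [0.0] * k
    for _ in range(rounds):
        reduced = [row[:] for row in corr]
        for i in range(k):
            reduced[i][i] = communalities[i]  # communality replaces the 1 on the diagonal
        value, vec = leading_eigenpair(reduced)
        loadings = [math.sqrt(max(value, 0.0)) * v for v in vec]
        communalities = [min(l * l, 0.995) for l in loadings]  # guard against Heywood cases
    return loadings

def bartlett_scores(z_columns, loadings):
    """Bartlett scores for one factor: weights proportional to loading / uniqueness."""
    uniqueness = [max(1.0 - l * l, 0.005) for l in loadings]
    weights = [l / u for l, u in zip(loadings, uniqueness)]
    denom = sum(l * w for l, w in zip(loadings, weights))
    n = len(z_columns[0])
    return [sum(w * col[i] for w, col in zip(weights, z_columns)) / denom
            for i in range(n)]

# Simulated (hypothetical) data: 26 populations, 4 SNPs driven by one latent factor.
random.seed(0)
true_loadings = [0.9, 0.85, 0.8, 0.9]
latent = [random.gauss(0, 1) for _ in range(26)]
frequencies = [[0.5 + 0.1 * (l * f + math.sqrt(1 - l * l) * random.gauss(0, 1))
                for f in latent] for l in true_loadings]

z_columns = [zscores(col) for col in frequencies]
loadings = principal_axis_loadings(correlation_matrix(z_columns))
scores = bartlett_scores(z_columns, loadings)
```

A composite in the spirit of the text could then be formed by standardizing the score vectors from several extraction and scoring methods and averaging them population by population.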
The regression of IQ on the 4 SNPs g factor is plotted in figures 1a and 1b. Inspection of the Q-Q plot (residuals vs. theoretical quantiles) indicated that the residuals were normally distributed. Visual inspection of figure 1b shows that populations belonging to the same continent tend to cluster on the genetic factor and that the correlation is driven by racial clusters. South Asians and Hispanics, two groups genetically distant from each other, have similar scores on the genotypic intelligence factor.
A one-way ANOVA was carried out, and the difference between racial groups was significant (F(4,21)=113.16, p<0.001). Results are reported in table 3. A Tukey post-hoc test revealed that all the pairwise differences between the five groups were significant (p<0.002), with the exception of SAS-HISP (p=0.998).
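For readers who want to see how a one-way ANOVA F statistic of this kind is assembled, here is a minimal sketch. The factor scores are made up for illustration, and the group sizes (7, 4, 5, 5, 5) are an assumption chosen only so that five groups total 26 populations and the degrees of freedom come out as (4, 21), matching the reported test.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical factor scores per continental group (not the study's data).
hypothetical_groups = {
    "group_1": [-1.6, -1.5, -1.4, -1.5, -1.7, -1.3, -1.5],
    "group_2": [-0.2, -0.1, 0.0, -0.3],
    "group_3": [1.1, 1.2, 1.0, 1.3, 1.2],
    "group_4": [0.7, 0.8, 0.6, 0.9, 0.7],
    "group_5": [-0.1, -0.2, 0.0, -0.3, -0.1],
}
f_stat, df_between, df_within = one_way_anova(list(hypothetical_groups.values()))
```

The p-value would then be read from an F distribution with these degrees of freedom, and pairwise comparisons such as Tukey's test would follow as a separate step.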

PREDICTING GENOTYPIC IQ
A regression was run with the 4 SNPs g factor as the independent variable and the IQs of developed countries only (to eliminate the confounding effect of socioeconomic/environmental disparities) as the dependent variable. This left 9 cases, but the correlation between genetic factor scores and national IQ was stronger than in the entire sample (r=0.98), possibly because the average IQ of developed countries more closely mirrors their genotypic potential. Inspection of the Q-Q plot (residuals vs. theoretical quantiles) indicated that the residuals were normally distributed.
To predict the genotypic IQs of developing countries (which were excluded from the regression), the unstandardized predicted values were used. A conversion to Greenwich IQ was made by setting the British IQ to 100. These are shown in table 4a. The difference between the predicted (genotypic) and the observed (measured) IQ of developing countries was calculated (table 4b). These are not residuals in the strict sense because the regression analysis was carried out using data for developed countries only; hence they will be called "pseudoresiduals". These were correlated with indices of economic and human development. Both correlations were in the expected direction: r=-0.34 with per capita GDP (N=15, p=0.214) and r=-0.777 with HDI (N=14, p=0.001). GDP had an outlier (Puerto Rico), and the correlation strengthened after its removal (r=-0.7, N=14, p=0.005).
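The procedure in this section can be sketched as follows: fit a line on developed countries only, apply it to the remaining populations, shift the predictions so that a reference population sits at 100, and correlate the pseudoresiduals with a development index. All numbers below are hypothetical placeholders, and treating the Greenwich conversion as a simple additive shift is an assumption of this sketch rather than something the text specifies.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical developed-country data: genetic factor score and measured IQ.
developed_scores = [0.6, 0.8, 1.0, 1.2, 1.4]
developed_iq = [97.0, 98.5, 100.0, 103.0, 104.5]
reference_score = 1.0  # factor score of the reference population (British in the text)

intercept, slope = fit_line(developed_scores, developed_iq)

# Additive shift so that the reference population's predicted IQ equals 100.
shift = 100.0 - (intercept + slope * reference_score)

# Hypothetical developing-country data, excluded from the fit above.
developing_scores = [-1.5, -0.5, 0.2]
developing_observed_iq = [70.0, 85.0, 92.0]
developing_hdi = [0.45, 0.60, 0.72]

predicted_iq = [intercept + slope * s + shift for s in developing_scores]
# Pseudoresidual: predicted (genotypic) IQ minus observed IQ.
pseudoresiduals = [p - o for p, o in zip(predicted_iq, developing_observed_iq)]
r_hdi = pearson(pseudoresiduals, developing_hdi)
```

A negative `r_hdi` would correspond to the pattern described above, where larger gaps between predicted and observed IQ occur in less developed countries.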

CONTROLLING FOR THE EFFECT OF MIGRATIONS AND DRIFT
In order to control for the potential confounding effects of migrations and drift on the relationship between IQ and the 4 SNPs g factor, an index of genome-wide genetic distance (Fst) was used. To simplify the calculations, only the 5 continental groups were used, because the Gedmatch distances do not have enough resolution to accurately represent single populations. This is not a major issue because, as shown above, the correlation between national IQs and the 4 SNPs g factor is mostly driven by sub-continental (racial) clusters. As there was not a perfect overlap between Gedmatch and 1000 Genomes clusters (there were more Gedmatch clusters), if a 1000 Genomes group comprised more than one cluster, the average of the sub-clusters was used. This procedure is described in the Appendix. Three separate distance matrices were created, one for the dependent variable (IQ) and one for each independent variable (4 SNPs g factor, Gedmatch distances). These represent the absolute difference between each pair of the 5 continental groups on the three variables, giving a total of 30 distances (10 for each variable). These are reported in table 5, and the original matrices are reported in table 7 (Appendix). There was a positive correlation between the genome-wide (Gedmatch) distances and the differences in the 4 SNPs g factor: r=0.67 (N=10, p=0.032).
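The construction of the pairwise distances can be illustrated with a short sketch: five groups yield 10 unordered pairs, and each pair contributes one absolute difference per variable. The group values below are hypothetical, and the function name is invented for this example.

```python
from itertools import combinations

def pairwise_abs_differences(values_by_group):
    """Absolute difference for every unordered pair of groups."""
    return {
        (a, b): abs(values_by_group[a] - values_by_group[b])
        for a, b in combinations(sorted(values_by_group), 2)
    }

# Hypothetical group-level means (not the study's data).
hypothetical_iq = {"AFR": 70.0, "EAS": 105.0, "EUR": 99.0, "HISP": 86.0, "SAS": 84.0}
hypothetical_factor = {"AFR": -1.5, "EAS": 1.2, "EUR": 0.8, "HISP": -0.1, "SAS": -0.2}

iq_distances = pairwise_abs_differences(hypothetical_iq)
factor_distances = pairwise_abs_differences(hypothetical_factor)

# Both dictionaries share the same pair ordering, so the two distance
# vectors can be lined up for a correlation or regression.
pairs = list(iq_distances)
iq_vector = [iq_distances[p] for p in pairs]
factor_vector = [factor_distances[p] for p in pairs]
```

The genome-wide (Fst) distances are already pairwise, so they would only need to be arranged in the same pair order before being entered alongside these two vectors.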
To assess the relationship between IQ differences and the 4 SNPs g factor net of genome-wide distances, a regression was run with IQ difference as the dependent variable and the 4 SNPs g factor and Gedmatch distances as independent variables (table 6). A significant model emerged (F(2,9)=26.58, p=0.01). The 4 SNPs g factor was the only significant predictor (Beta=1.222, p=0.005). Interestingly, the sign of the genome-wide distance effect was reversed (compared to the bivariate correlation), implying that, once the 4 SNPs g factor is accounted for, greater genome-wide distances are associated with smaller IQ differences between continents.
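The sign reversal can be understood through the textbook formula for standardized coefficients with two predictors, which depends only on the three pairwise correlations. The sketch below uses the correlations reported in this paper (r=0.67 between the two predictors, r=0.26 between genome-wide distances and IQ differences, and Beta=-0.56 for genome-wide distances). The bivariate correlation between IQ differences and the 4 SNPs g factor differences is not quoted in the text, so it is back-calculated here; that value is an inference of this sketch, not a reported result.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression weights for two predictors from correlations."""
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

r_12 = 0.67     # reported: factor differences vs genome-wide distances
r_y2 = 0.26     # reported: IQ differences vs genome-wide distances
beta_2 = -0.56  # reported: Beta for genome-wide distances

# Back-calculated (not reported): IQ differences vs factor differences.
implied_r_y1 = (r_y2 - beta_2 * (1.0 - r_12 ** 2)) / r_12

beta_1, beta_2_check = standardized_betas(implied_r_y1, r_y2, r_12)
print(round(implied_r_y1, 2), round(beta_1, 2), round(beta_2_check, 2))
```

Under these inputs the implied bivariate correlation is roughly 0.85 and the resulting Beta for the 4 SNPs g factor is roughly 1.22, in line with the value reported above up to rounding. A standardized weight above 1 together with a negative weight for the correlated predictor is the usual signature of a suppression effect between two positively correlated predictors.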

DISCUSSION
Factor analysis was used to extract a factor from the frequencies of 4 alleles for 26 populations (1000 Genomes). Its interpretation as an indicator of genotypic intelligence, or of the strength of natural selection on it, was supported by a strong correlation (r=0.92) with the average phenotypic (national/ethnic) IQs of 23 populations. The four alleles loaded highly and in the expected direction on this factor, supporting its reliability. There were significant sub-continental differences between groups, with a hierarchy topped by East Asians and Europeans, with Hispanics and South Asians in the middle and Sub-Saharan Africans at the bottom. Further evidence that the factor represents selection and not genome-wide evolutionary processes, such as random drift or migrations, comes from the finding that the rank order of sub-continental genotypic intelligence scores did not perfectly match measures of genetic distance obtained from neutral markers, and that the factor was an independent predictor of IQ. The correlation between subcontinental genome-wide genetic distances and the differences in the 4 SNPs g factor was moderately strong (r=0.67), suggesting that the 4 SNPs genetic factor contains noise due to genome-wide evolutionary processes (e.g. migrations, drift) and does not reflect selection for intelligence alone. However, genetic distances were not significantly correlated with IQ differences (r=0.26), and in the regression model with the 4 SNPs g factor they predicted IQ differences in the opposite direction (Beta=-0.56). That is, after accounting for the effect of the 4 SNPs g factor, greater genome-wide genetic distances were associated with smaller IQ differences. However, this effect was not significant. On the other hand, the 4 SNPs g factor emerged as a strong (and significant) positive predictor of IQ differences (Beta=1.22).
The results also provide preliminary evidence in favor of the hypothesis that poor environmental conditions (i.e. economic and sociocultural) tend to depress national IQ scores. Countries with lower per capita GDP and a lower Human Development Index tended to have larger positive "residuals"; that is, the difference between the score predicted by the regression (of IQs for developed countries on the 4 SNPs g factor) and the measured IQ was larger in countries with lower GDP and HDI (r around -0.7 between these differences and each index). Thus, poorer and less developed countries have yet to reach their full intellectual potential.
The results of this study indicate that the gaps in intellectual performance between some populations can be narrowed by adequate improvement of environmental conditions; however, the overall pattern of intellectual scores is due to relatively stable and fixed (genetic) factors and cannot be substantially altered.

REFERENCES:
Armstrong