Meiyue Wang and Shizhong Xu

Department of Botany and Plant Sciences, University of California, Riverside

Single marker models that detecting one locus at a time are subject to many problems in genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping, which includes large matrix inversion, over-conservativeness after Bonferroni correction and difficulty in evaluation of total genetic contribution. Such problems can be solved by a multiple locus model which includes all markers in the same model with effects being estimated simultaneously. The sparse Bayesian learning method (SBL), implemented in `sbl`

package, is a multiple locus model that can handle extremely large sample size (>100,000) and outcompetes other multiple locus GWAS methods in terms of detection power and computing time.

`sbl`

package installation`sbl`

can be downloaded and installed locally. The download link is here.

`install.packages('sbl_0.1.0.tar.gz', repos=NULL, type='source')`

The usage of `sbl`

to perform QTL mapping and GWAS is very simple (Note: Please remove the markers without variation before running the program):

- Load input data
- A vector of response variables (observations of the trait).
- A design matrix for fixed effects, which represents non-genetic factors that contribute to the phenotypic variance. This can be the intercept only.
- A matrix storing genotype indicators. The original three genotypes (\(a_1a_1\), \(a_1a_2\), \(a_2a_2\)) should be coded to numeric indicator as (-1, 0, 1) or (0, 1, 2).

```
library('sbl')
# load example data
data(phe)
data(intercept)
data(gen)
```

- Call
`sblgwas()`

function to perform regression and detect significant markers.

```
# A minimal invocation of "sblgwas()" function looks like:
fit1<-sblgwas(x = intercept, y = phe, z = gen)
# Restuls of markers surrounding the second simulated QTL with non-zero effect in the example data
fit1$blup[c(17:21),]
```

```
## gamma vg wald p_wald
## 17 0.000000 0.0000000 0.00000 1.000000e+00
## 18 2.146942 0.1789244 25.76150 3.863190e-07
## 19 -2.050378 0.1539640 27.30541 1.737251e-07
## 20 0.000000 0.0000000 0.00000 1.000000e+00
## 21 0.000000 0.0000000 0.00000 1.000000e+00
```

Users can arbitrarily set the value of `t`

between [0, 2] to control the sparseness of model, default is -1.

```
# Setting t = 0 leads to the most sparse model
fit2<-sblgwas(x = intercept, y = phe, z = gen, t = 0)
fit2$parm
```

```
## iter error s2 beta df
## 1 22 5.930216e-07 2.827957 19.78626 4.612319
```

```
# Setting t = -2 leads to the least sparse model
fit3<-sblgwas(x = intercept, y = phe, z = gen, t = -2)
fit3$parm
```

```
## iter error s2 beta df
## 1 31 8.46504e-07 1.003592 20.03221 9.707904
```

`max.iter`

and `min.err`

are two thresholds to stop the program when either of them is met. `max.iter`

defines the maximum number of iterations that the program is allowed to run, default is 200; `min.err`

defines the minimum threshold of mean squared error of random effects estimated from the current and the previous iteration, default is 1e-6.

```
# Set max.iter and min.err to control the convergence of the program
fit4<-sblgwas(x = intercept, y = phe, z = gen, t = -1, max.iter = 300, min.err = 1e-8)
fit4$parm
```

```
## iter error s2 beta df
## 1 18 4.675823e-09 2.30688 19.70123 5.332911
```