About
This data comes from a soybean nested association mapping (NAM) population of more than 5000 recombinant inbred lines, provided by the SoyBase database and described in detail in the SoyNAM R package. Additional details about the data are described in Diers et al. (2018).
From the SoyNAM R package documentation:
The SoyNAM population (https://soybase.org/SoyNAM/index.php) is a nested association mapping panel that comprises more than 5000 recombinant inbred lines (RILs), including determinate, indeterminate, and semi-determinate genotypes from maturity groups (MG) ranging from late MG II to early MG IV, derived from 40 biparental populations, where progenies were not exposed to selection. Each biparental population approximately contains 140 individuals and all families share the cultivar IA3023 as the standard parent. From the other 40 founder parents, 17 lines are elite public germplasm from different regions, 15 have diverse ancestry and 8 are plant introductions. The SoyNAM population was designed to dissect the genetic architecture of complex traits and to map yield-associated quantitative trait loci (QTL) using a diverse panel.
Note. The SoyBase database provides both raw and quality-assured data. Only the quality-assured data are included here, covering 39 genetic families; family 46 was removed for data quality reasons, as described in Diers et al. (2018):
An initial analysis of the SNP-genotyped RILs was conducted to identify RILs that deviated from the expected marker segregation (Song et al. 2017), and this led to 424 RILs being discarded because they had a SNP genotype identical with the female founder (i.e., were likely inadvertent female-parent self-pollinations), or they segregated for alleles that did not match the parent alleles. Most of the RILs from family N46 (PI507618B) fell into the latter category, indicating that a line other than the intended founder PI had been used in the mating with IA3023. Therefore, all lines from the N46 family were removed from the dataset.
Data
soybase_lines.csv
There are two datasets available:
- `soybase_lines.csv` contains phenotypic data from recombinant inbred lines (RILs)
- `soybase_checks.csv` contains phenotypic data from check cultivars
These datasets share the same variables, as described in the data dictionary below.
Note. You can access a genotypic matrix associated with these datasets with the following steps:
- Load the SoyNAM R package with `library(SoyNAM)`
- Run `data(soybase)`, which loads data into your environment
- Access the genetic matrix stored in the object `gen.qa`
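
The steps above can be sketched in R as follows. This is a minimal sketch that assumes the SoyNAM package is already installed. The `requireNamespace()` guard and the `has_soynam` variable are illustrative additions so the snippet also runs where the package is missing; they are not part of the package's documented workflow.

```r
# Check whether SoyNAM is available before trying to load it
# (has_soynam is an illustrative helper variable, not part of SoyNAM).
has_soynam <- requireNamespace("SoyNAM", quietly = TRUE)

if (has_soynam) {
  library(SoyNAM)     # step 1: load the package
  data(soybase)       # step 2: load the bundled objects into the environment
  print(dim(gen.qa))  # step 3: inspect the size of the quality-assured genetic matrix
} else {
  message("SoyNAM is not installed; skipping the genotype example.")
}
```
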
Data Dictionary

| variable | description |
|---|---|
| environ | Environment (combination of year and location) |
| strain | Genetic strain |
| family | Genetic family |
| set | Microenvironment |
| height | Plant height (centimeters) |
| R8 | Number of days to maturity (stage R8 = 95% of pods are fully mature) |
| lodging | Lodging score (1 = all plants erect; 5 = all plants prostrate) |
| yield | Grain yield (kg/ha) |
| protein | Percentage of protein in the seed |
| oil | Percentage of oil in the seed |
| size | Mass of 100 seeds (grams) |
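
As a rough illustration of how these variables might be used, the sketch below reads a table with the same column layout and averages yield by family and environment. The toy rows, including the environment and strain labels, are invented placeholders rather than real measurements, and `soy_lines`, `expected_cols`, and `family_means` are hypothetical names chosen for this example.

```r
# Toy rows that mimic the column layout of soybase_lines.csv.
# All values and labels below are invented placeholders, not real data.
toy_csv <- "environ,strain,family,set,height,R8,lodging,yield,protein,oil,size
env_A,line_001,1,1,90,120,2,3500,35.1,19.2,15.0
env_A,line_002,1,1,95,122,3,3300,34.8,19.5,14.5
env_B,line_003,2,1,85,118,1,3900,36.0,18.9,16.1
env_B,line_004,2,1,88,119,2,3700,35.5,19.0,15.8"

soy_lines <- read.csv(text = toy_csv, stringsAsFactors = FALSE)
# With the real file, this would instead be:
# soy_lines <- read.csv("soybase_lines.csv")

# Confirm the columns match the data dictionary before analysing.
expected_cols <- c("environ", "strain", "family", "set", "height", "R8",
                   "lodging", "yield", "protein", "oil", "size")
stopifnot(identical(names(soy_lines), expected_cols))

# Mean grain yield (kg/ha) for each family within each environment.
family_means <- aggregate(yield ~ family + environ, data = soy_lines, FUN = mean)
print(family_means)
```

Because the lines and checks files share the same variables, the same column check should apply unchanged to `soybase_checks.csv`.
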
soyin_lines.csv
Data Dictionary

| variable | description |
|---|---|
| year | Year |
| environ | Environment (combination of year and location) |
| strain | Genetic strain |
| family | Genetic family |
| set | Microenvironment |
| BLOCK | Block (spatial coordinates of the field plots) |
| ROW | Row (spatial coordinates of the field plots) |
| COL | Column (spatial coordinates of the field plots) |
| height | Plant height (centimeters) |
| R1 | Number of days to flowering |
| R8 | Number of days to maturity (stage R8 = 95% of pods are fully mature) |
| lodging | Lodging score (1 = all plants erect; 5 = all plants prostrate) |
| yield | Grain yield (kg/ha) |
| LeafShape | Ratio of leaf length to leaf width |
| Nodes | Number of nodes on the main stem |
| Pods | Number of pods on the main stem |
| Pods.Node | Number of pods per node |
| AvgCC | Average canopy coverage |
| RateCC | Rate of canopy coverage |
| GDD_R1 | Growing degree days to flowering |
| GDD_R8 | Growing degree days to maturity |