Soybean Yields

Predict soybean yields based on phenotypic, environmental and genetic factors.

About

These data come from a soybean nested association mapping (NAM) population of more than 5000 recombinant inbred lines. They are provided by the SoyBase database and documented in the SoyNAM R package; Diers et al. (2018) describe the data in further detail.

From the SoyNAM R package documentation:

The SoyNAM population (https://soybase.org/SoyNAM/index.php) is a nested association mapping panel that comprises more than 5000 recombinant inbred lines (RILs), including determinate, indeterminate, and semi-determinate genotypes from maturity groups (MG) ranging from late MG II to early MG IV, derived from 40 biparental populations, where progenies were not exposed to selection. Each biparental population approximately contains 140 individuals and all families share the cultivar IA3023 as the standard parent. From the other 40 founder parents, 17 lines are elite public germplasm from different regions, 15 have diverse ancestry and 8 are plant introductions. The SoyNAM population was designed to dissect the genetic architecture of complex traits and to map yield-associated quantitative trait loci (QTL) using a diverse panel.

Note. The SoyBase database provides both raw and quality-assured data. We include only the quality-assured data here, which cover 39 genetic families; family 46 was removed for data quality reasons, as described in Diers et al. (2018):

An initial analysis of the SNP-genotyped RILs was conducted to identify RILs that deviated from the expected marker segregation (Song et al. 2017), and this led to 424 RILs being discarded because they had a SNP genotype identical with the female founder (i.e., were likely inadvertent female-parent self-pollinations), or they segregated for alleles that did not match the parent alleles. Most of the RILs from family N46 (PI507618B) fell into the latter category, indicating that a line other than the intended founder PI had been used in the mating with IA3023. Therefore, all lines from the N46 family were removed from the dataset.

Data

soybase_lines.csv

Note. There are two datasets available:

  • soybase_lines.csv contains phenotypic data from recombinant inbred lines (RILs)
  • soybase_checks.csv contains phenotypic data from check cultivars

These datasets share the same variables, as described in the data dictionary below.
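Because the two files share the same columns, they can be stacked into a single table for modeling. The following is a minimal sketch using only the Python standard library; the `source` column and the helper names `read_rows` and `combine` are illustrative assumptions, and the demo rows are made-up stand-ins rather than real records.

```python
import csv
import io


def read_rows(handle, source):
    """Read one CSV into a list of dicts, tagging each row with its origin."""
    rows = []
    for row in csv.DictReader(handle):
        row["source"] = source  # hypothetical extra column: "lines" or "checks"
        rows.append(row)
    return rows


def combine(lines_handle, checks_handle):
    """Stack RIL rows and check-cultivar rows into one list."""
    return read_rows(lines_handle, "lines") + read_rows(checks_handle, "checks")


# Tiny made-up stand-ins for the two files (illustrative values only).
demo_lines = io.StringIO("environ,strain,family,yield\nENV_A,STRAIN_1,1,3500\n")
demo_checks = io.StringIO("environ,strain,family,yield\nENV_A,CHECK_1,NA,3700\n")
combined = combine(demo_lines, demo_checks)
```

With the real files, the same functions can be called on handles opened with `open("soybase_lines.csv", newline="")` and `open("soybase_checks.csv", newline="")`.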

Note. You can access a genotypic matrix associated with these datasets with the following steps:

  1. Load the SoyNAM R package with library(SoyNAM)
  2. Run data(soybase), which loads data into your environment
  3. Access the genetic matrix stored in the object gen.qa

Data Dictionary

variable    description
environ     Environment (combination of year and location)
strain      Genetic strain
family      Genetic family
set         Microenvironment
height      Plant height (centimeters)
R8          Number of days to maturity (stage R8 = 95% of pods are fully mature)
lodging     Lodging score (1 = all plants erect; 5 = all plants prostrate)
yield       Grain yield (kg/ha)
protein     Percentage of protein in the seed
oil         Percentage of oil in the seed
size        Mass of 100 seeds (grams)
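CSV readers return every field as text, so the measured variables in the data dictionary above need converting before analysis. The sketch below shows one way to do that with the Python standard library. It assumes that missing values appear as an empty field or the string `NA`, which the documentation does not confirm; `NUMERIC`, `MISSING`, and `parse_row` are hypothetical names, and the demo row is made up.

```python
# Columns from the data dictionary that hold measurements.
NUMERIC = ("height", "R8", "lodging", "yield", "protein", "oil", "size")

# Assumption: missing values are encoded as an empty field or "NA".
MISSING = {"", "NA"}


def parse_row(raw):
    """Return a copy of a CSV row with measurement columns converted to float."""
    parsed = dict(raw)
    for name in NUMERIC:
        value = (raw.get(name) or "").strip()
        parsed[name] = None if value in MISSING else float(value)
    return parsed


# Made-up row in the shape produced by csv.DictReader (illustrative values only).
raw = {
    "environ": "ENV_A", "strain": "STRAIN_1", "family": "1", "set": "1",
    "height": "95", "R8": "NA", "lodging": "2", "yield": "3500.5",
    "protein": "", "oil": "19.1", "size": "15.2",
}
parsed = parse_row(raw)
```

The identifier columns (environ, strain, family, set) are left as text here, since they label groups rather than quantities.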

soyin_lines.csv

The data dictionary below applies to both soyin_lines and soyin_checks.

Data Dictionary

variable    description
year        Year
environ     Environment (combination of year and location)
strain      Genetic strain
family      Genetic family
set         Microenvironment
BLOCK       Block (spatial coordinates of the field plots)
ROW         Row (spatial coordinates of the field plots)
COL         Column (spatial coordinates of the field plots)
height      Plant height (centimeters)
R1          Number of days to flowering
R8          Number of days to maturity (stage R8 = 95% of pods are fully mature)
lodging     Lodging score (1 = all plants erect; 5 = all plants prostrate)
yield       Grain yield (kg/ha)
LeafShape   Ratio of leaf length to leaf width
Nodes       Number of nodes on the main stem
Pods        Number of pods on the main stem
Pods.Node   Number of pods per node
AvgCC       Average canopy coverage
RateCC      Rate of canopy coverage
GDD_R1      Growing degree days to flowering
GDD_R8      Growing degree days to maturity
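For readers unfamiliar with growing degree days, the sketch below illustrates the common averaging method: each day contributes its mean temperature above a base temperature, floored at zero, and the daily values are summed up to the growth stage of interest. The documentation does not say how GDD_R1 and GDD_R8 were computed, so the formula, the 10 °C base temperature, and the function names are all assumptions made for illustration, and the temperatures are made up.

```python
def daily_gdd(t_max, t_min, t_base=10.0):
    """Degree days for one day: mean temperature above the base, floored at zero."""
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


def accumulated_gdd(daily_temps, t_base=10.0):
    """Sum daily degree days over (t_max, t_min) pairs in degrees Celsius."""
    return sum(daily_gdd(t_max, t_min, t_base) for t_max, t_min in daily_temps)


# Three made-up days: two warm days and one day below the assumed base temperature.
temps = [(25.0, 15.0), (30.0, 18.0), (12.0, 4.0)]
total = accumulated_gdd(temps)  # 10.0 + 14.0 + 0.0 = 24.0
```

Under this method a cold day adds nothing rather than subtracting from the running total, which is why the third day contributes zero.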