Introduction

OneMap can be used to construct genetic maps in F1 full-sib populations with any marker segregation types, following the algorithm proposed by Wu et al., 2002. This tutorial demonstrates some of the functionality of OneMap by making use of an F1 apple marker dataset from Gardner et al., 2014. The unfiltered VCF file can be downloaded from DataDryad: http://doi.org/10.5061/dryad.55t54. Here, a filtered and converted version of this VCF file is used at the input file to OneMap.

library(onemap) #version 2.1.3
library(ggplot2)
source("LGbarplots.R")

apple <- read_onemap(dir = "./input_files", inputfile = "apple_onemap.txt")
##  Working...
## 
##  --Read the following data:
##  Type of cross:           outcross 
##  Number of individuals:   87 
##  Number of markers:       1309 
##  Chromosome information:  no 
##  Position information:    no 
##  Number of traits:        0

OneMap can create marker bins not based on LD, but simply whether markers have precisely identical genotypes, including or excluding NA values depending on whether exact is set to TRUE or FALSE, respectively. This is not critical in this example case, since the input file is already filtered by LD.

apple.bins <- find_bins(apple, exact = FALSE)
apple.binned <- create_data_bins(apple, apple.bins)

Plotting by segregation type gives a visual representation of the relative number of different segregation types in the input file:

plot_by_segreg_type(apple.binned)

OneMap has diagnostic tools for testing for and visualizing segregation distortion using a Chi-square test, and identifying only those markers which do not exhibit segregation distortion. For this tutorial, we will utilize only the 65% of markers which fall below a Bonferroni-adjusted threshold.

apple.segreg_test <- test_segregation(apple.binned)
plot(apple.segreg_test)

apple.no_dist <- select_segreg(apple.segreg_test, distorted = FALSE, numbers = TRUE) 
# numbers = TRUE returns the indices of the markers without segregation distortion

Two-point recombination fractions are calculated using the EM algorithm to solve the log likelihood equation appropriate to any particular combination of segregation types. LOD and max.rf values can be set now to be used in later function calls, but do not play a role in rf estimation, and can be overwritten later. apple.twopts$analysis$ is a list containing 4 matrices corresponding to the 4 possible linkage phase configurations.

apple.twopts <- rf_2pts(apple.binned, LOD = 4, max.rf = 0.4)

OneMap can suggest a LOD threshold with which to form linage groups using the function suggest_lod, however a more useful approach in my opinion is to use the barplots generated by the function LG in MapRTools. LGbarplots uses the OneMap function group to construct LGs, and then uses the ggplot command from LG to generate barcharts. This can take quite a bit of time to run depending on the number of markers, range of thresholds, and type of segregation types (maximizing the likelihood function using the EM algorithm is more computationally demanding for A-type markers).

thresholds <- seq(1:15)
rfs <- c(0.1,0.2,0.3,0.4,0.5)
for(r in rfs){
  p <- LGbarplots(RFmat = apple.twopts, markers = apple.no_dist, thresholds = thresholds, max.rf = r)
  filename <- paste("barplots_all/barplot_maxRF-", r, ".pdf", sep="")
  ggsave(filename, p, width = 10, height = 5, units = "in")
}

Many OneMap functions require as their input an object of class sequence, which can contain varying attributes depending on what object is passed to make_seq. In this case, the sequence object returned by make_seq simply identifies the elements of 2-point RF matrices that will be used to form linkage groups.

apple.mrks.no_dist <- make_seq(apple.twopts, c(apple.no_dist))

At a LOD of 5 and max.rf of 0.4, there appear to be 16 main LGs, and the largest LG appears to contain two chromosomes:

Indeed, comparing LG- and chromosome-assignments based on the reference genome, LG8 appears to contain most of both chromosomes 8 and chromosome 9:

apple.LGs_LOD5 <- group(apple.mrks.no_dist, LOD = 5, max.rf = 0.4 )
chroms <- sapply(strsplit(apple.LGs_LOD5$marnames,split="-",fixed=T),"[",1)
chr_vs_LG <- data.frame(Marker = apple.LGs_LOD5$marnames, LG = apple.LGs_LOD5$groups, Chr = chroms)
chr_vs_LG <- chr_vs_LG[chr_vs_LG$LG<=17,]
tabLOD5 <- table(chr_vs_LG$Chr,chr_vs_LG$LG)
tabLOD5[,order(apply(tabLOD5,2,which.max))]
##     
##      13  1  6 14 15 16  9  0  2 11 17 12 10  3  4  7  5  8
##   1  30  0  1  0  0  1  1  2  3  1  0  0  0  1  0  0  3  1
##   10  0 36  1  0  0  0  1  2  1  0  0  0  1  1  5  1  1  3
##   11  1  0 49  0  0  0  1  4  0  1  0  1  2  0  1  1  3  0
##   12  0  1  0 34  3  0  0  1  1  0  0  0  0  0  0  0  1  0
##   13  1  1  0  3  0 34  1  1  1  9  0  0  2  0  0  0  0  5
##   14  3  3  0  2  0  0 34  2  1  0  0  0  0  0  3  1  0  1
##   15  3  1  0  0  0  0  0  5 35  0  0  1  0  3  2  3  1  1
##   16  2  4  0  0  0  0  0  0  0 28  0  0  0  1  0  0  0  0
##   17  0  0  0  0  0  0  1  0  1  2 26  0  0  0  0  0  0  0
##   2   0  2  0  0  0  0  1  3  2  0  0 41  0  1  1  0  1  4
##   3   0  0  2  1  0  1  0  0  0  0  0  0 43  3  3  0  0  0
##   4   0  0  0  1  0  2  0  1  0  2  0  0  0 26  0  0  0  0
##   5   0  2  0  0  0  1  0  2  0  1  0  1  0  1 26  0  0  0
##   6   2  0  0  0  0  0  0  1  0  1  0  0  0  0  3 21  0  0
##   7   0  0  0  0  0  2  0  0  0  0  0  0  0  0  1  0 32  1
##   8   2  1  0  4  0  0  0  0  3  0  1  1  1  0  6  0  0 40
##   9   0  2  0  0  0  1  0  1  0  1  0  0  1  1  2  0  0 51

At a LOD of 6, this is no longer the case:

apple.LGs <- group(apple.mrks.no_dist, LOD = 6, max.rf = 0.4 )
chroms <- sapply(strsplit(apple.LGs$marnames,split="-",fixed=T),"[",1)
chr_vs_LG <- data.frame(Marker = apple.LGs$marnames, LG = apple.LGs$groups, Chr = chroms)
chr_vs_LG <- chr_vs_LG[chr_vs_LG$LG<=17,]
tab <- table(chr_vs_LG$Chr,chr_vs_LG$LG)
tab[,order(apply(tab,2,which.max))]
##     
##      13  1  6 14 15 16 17  9  0  2 11 12 10  3  4  7  5  8
##   1  30  0  1  0  0  0  1  1  2  3  1  0  0  1  0  0  3  0
##   10  0 36  1  0  0  0  0  1  2  1  0  0  1  1  5  1  1  3
##   11  1  0 49  0  0  0  0  1  4  0  1  1  2  0  1  1  3  0
##   12  0  1  0 34  1  3  0  0  1  0  0  0  0  0  0  0  1  0
##   13  1  1  0  3  0  0 34  1  1  1  8  0  2  0  0  0  0  1
##   14  3  3  0  2  0  0  0 34  2  1  0  0  0  0  3  1  0  1
##   15  3  1  0  0  0  0  0  0  5 35  0  1  0  3  2  3  1  0
##   16  2  4  0  0  0  0  0  0  0  0 22  0  0  1  0  0  0  0
##   17  0  0  0  0  0  0  0  1  0  1  2  0  0  0  0  0  0  0
##   2   0  2  0  0  0  0  0  1  3  2  0 41  0  1  1  0  1  4
##   3   0  0  2  1  0  0  1  0  1  0  0  0 42  3  3  0  0  0
##   4   0  0  0  1  0  0  2  0  1  0  2  0  0 26  0  0  0  0
##   5   0  2  0  0  0  0  1  0  2  0  1  1  0  1 26  0  0  0
##   6   2  0  0  0  0  0  0  0  1  0  1  0  0  0  3 21  0  0
##   7   0  0  0  0  0  0  2  0  0  0  0  0  0  0  1  0 32  1
##   8   2  1  0  4  1  0  0  0  1  2  0  1  1  0  5  0  0 40
##   9   0  2  0  0  0  0  1  0  1  0  1  0  1  1  2  0  0  2

However, LGs 15 and 16 now contain very few markers. These LGs likely correspond to chromosomes 17 and 9, which indeed contain very few markers in the 17 largest linkage groups:

table(chr_vs_LG$Chr)
## 
##  1 10 11 12 13 14 15 16 17  2  3  4  5  6  7  8  9 
## 43 53 64 41 53 50 54 29  4 56 53 32 34 28 36 58 11

These chromosomes consisted of 51 and 83 markers in the filtered VCF file:

allMarkers <- sapply(strsplit(rownames(apple.twopts$analysis[[1]]),split="-",fixed=T),"[",1)
table(allMarkers)
## allMarkers
##   1  10  11  12  13  14  15  16  17   2   3   4   5   6   7   8   9 
##  68  87  94  71  90  69  92  48  51 100  75  58  77  53  60  85  83

This does not appear unusual for this dataset (where chromosomes on average contain only 75 markers), it is possible that all of the markers on these chromosomes may be relatively far apart, and located at the ends of the two arms of these two chromosomes. In any event, constructing maps for these two linkage groups would require a higher marker density.