This vignette summarizes the workflow for simulating a host and symbiont tree using the
This model has a number of parameters that are input into the
sim_cophylo_bdp function. Let us assume that we just want to simulate to a time of 2.0 units. We can then use
calculate_expected_leaves_locustree to figure out the average number of extant tips that will be on our host and symbiont trees respectively.
library(treeducken) #> Loading required package: ape set.seed(42) calculate_expected_leaves_sptree(1.0, 0.5, 2.0) exp_tips_host <-# on average we will get about 5 tips # maybe we want something more like 8, so let's decrease the extinction rate calculate_expected_leaves_sptree(1.0, 0.3, 2.0) exp_tips_host <-# that looks about right, let's assume there are no host speciations that are # not cospeciations 0.0 h_lambda <- 0.3 h_mu <- 1.0 c_lambda <-# now we need to worry about the symbiont tree since only cospeciations are # occurring # we can use the same math as for the locus tree simulator calculate_expected_leaves_locustree(t = 2.0, exp_tips_symb <-dup_rate = 0.2, loss_rate = 0.1, 8) # a little less than 10 symbiont tips on average 0.2 s_lambda <- 0.1 s_mu <-# let's assume that when symbionts speciate that they always inherit their # ancestor's host repertoire 0.0s_her <-
Now we have all the necessary parameters set with some idea of what they mean.
We will now use the main function,
sim_cophylo_bdp with the parameters we set in the previous section. Let’s simulate ten sets and then we can print out a summary for each set.
# simulate 10 paired trees with our set parameters. sim_cophylo_bdp(hbr = h_lambda, host_symb_sets <-hdr = h_mu, sbr = s_lambda, cosp_rate = c_lambda, sdr = s_mu, host_exp_rate = s_her, time_to_sim = 2.0, numbsim = 10) print(host_symb_sets, details = TRUE) #> 10 cophylogenetic sets. #> Cophylogenetic set 1 : #> Symbiont tree has 20 tips. #> Host tree has 18 tips. #> Cophylogenetic set 2 : #> Symbiont tree has 16 tips. #> Host tree has 14 tips. #> Cophylogenetic set 3 : #> Symbiont tree has 3 tips. #> Host tree has 3 tips. #> Cophylogenetic set 4 : #> Symbiont tree has 15 tips. #> Host tree has 12 tips. #> Cophylogenetic set 5 : #> Symbiont tree has 8 tips. #> Host tree has 6 tips. #> Cophylogenetic set 6 : #> Symbiont tree has 2 tips. #> Host tree has 2 tips. #> Cophylogenetic set 7 : #> Symbiont tree has 4 tips. #> Host tree has 3 tips. #> Cophylogenetic set 8 : #> Symbiont tree has 14 tips. #> Host tree has 11 tips. #> Cophylogenetic set 9 : #> Symbiont tree has 5 tips. #> Host tree has 4 tips. #> Cophylogenetic set 10 : #> Symbiont tree has 4 tips. #> Host tree has 4 tips.
Notice that some of these are not particularly interesting with only 2 host or symbiont tips. This is expected behavior, and occurs since there is a nonzero probability that no events occur in the time we simulated.
Let’s check out one set.
# # plot one or two of them host_symb_sets[] tree_set_of_interest <-print(tree_set_of_interest) #> #> Host Tree: #> #> #> Phylogenetic tree with 18 tips and 17 internal nodes. #> #> Tip labels: #> H1, H2, X3, H4, H5, X6, ... #> #> Rooted; includes branch lengths. #> #> #> Symbiont Tree: #> #> #> Phylogenetic tree with 20 tips and 19 internal nodes. #> #> Tip labels: #> X1, X2, S3, X4, X5, S6, ... #> #> Rooted; includes branch lengths. #> #> Association Matrix: #> #> #> There are 15 rows (i.e. extant symbionts). #> There are 14 cols (i.e. extant hosts).
plot(tree_set_of_interest, col = "red", lty = "dotted")
Notice that the output of
sim_cophylo_bdp is a list of
cophy objects. Each
cophy object is composed of 4 elements:
symb_tree are both of the
phylo object (from the ape package). The
association_mat which is a matrix with number of rows equal to the number of extant tips in the symbiont tree, and number of columns equal to the number of extant tips in the host tree. The
event_history is a data frame which shows the full history of the simulation. It contains which events occur on which symbiont and/or host and at which time. The following event code is used: “C”- cospeciation, “HG” - host gain (a host speciation), “HL” - host less (host extinction), “SG” - symbiont gain (symbiont speciation), “SL” - symbiont loss (symbiont extinction), “AG” - association gain, and “AL” - association loss. This bit is a mess at present so please let me know if you think of ways I could trim this down.
We can also perform the parafit test using the results.
tree_set_of_interest$host_tree host_tree <- tree_set_of_interest$symb_tree symb_tree <- tree_set_of_interest$association_mat a <- parafit_stat(host_tr = host_tree, symb_tr = symb_tree, assoc_mat = a) d <-parafit_test(host_tr = host_tree, symb_tr = symb_tree, assoc_mat = a, D = d, reps = 99) #>  0.01
We can also do a full summary of the results with the following function. This tabulates the number of each event in the
event_history data frame, and then performs the Parafit test of Legendre et al. 2004.
# of course that bit is maybe a little too verbose # we may be interested in a lot of trees # we can instead use cophylo_summary_stat cophy_summary_stat(host_symb_sets) host_symb_summary_df <-#> Calculating for replicate 1 #> Calculating for replicate 2 #> Calculating for replicate 3 #> Calculating for replicate 4 #> Calculating for replicate 5 #> Calculating for replicate 6 #> Warning in treeducken::parafit_stat(cophy_obj[[cophy_obj_indx]]$host_tree, : 'host_tr' must be a tree with more than 2 extant tips to calculate the parafit stat returning NA. #> Calculating for replicate 7 #> Warning in treeducken::parafit_stat(cophy_obj[[cophy_obj_indx]]$host_tree, : 'host_tr' must be a tree with more than 2 extant tips to calculate the parafit stat returning NA. #> Calculating for replicate 8 #> Calculating for replicate 9 #> Calculating for replicate 10 host_symb_summary_df#> Cospeciations Host_Speciations Host_Extinctions Symbiont_Speciations #> 1 17 NA 4 2 #> 2 13 NA NA 2 #> 3 2 NA NA NA #> 4 11 NA 3 3 #> 5 5 NA NA 2 #> 6 1 NA NA NA #> 7 2 NA 1 1 #> 8 10 NA NA 3 #> 9 3 NA NA 1 #> 10 3 NA 1 NA #> Symbiont_Extinctions Parafit_Stat Parafit_P-value #> 1 5 1467.6208709 0.01 #> 2 1 844.7867082 0.01 #> 3 NA 0.2159458 0.06 #> 4 4 431.3475091 0.01 #> 5 NA 78.3393946 0.01 #> 6 NA NA NA #> 7 1 NA NA #> 8 1 650.1323319 0.01 #> 9 1 121.7864797 0.03 #> 10 1 3.7077994 0.04
To model gene tree-species tree discordance on these trees we use the three-tree model (Rasmussen and Kellis 2012). This model has three levels: species trees, locus trees, and gene trees. The species tree models speciation and extinction, the locus tree models gene birth, death and transfers, and the gene tree models incomplete lineage sorting. Transfers can occur randomly to any recipient throughout a tree or to species that are more closely related to the transfer donor species.
We can use either symbiont or host trees to simulate locus trees. First we will use the host tree:
sim_locustree_bdp(host_tree, host_tree_locus_trees <-gbr = 0.4, gdr = 0.2, lgtr = 0.0, num_loci = 10) host_tree_locus_trees#> 10 phylogenetic trees
Now we will use the symbiont tree:
sim_locustree_bdp(symb_tree, symb_tree_locus_trees <-gbr = 0.2, gdr = 0.1, lgtr = 0.1, num_loci = 10) str(symb_tree_locus_trees[]) #> List of 6 #> $ edge : num [1:38, 1:2] 21 21 23 23 25 25 27 27 26 26 ... #> $ edge.length: num [1:38] 0.391 0.5276 0.9907 0.1069 0.0804 ... #> $ Nnode : int 19 #> $ tip.label : chr [1:20] "S20_1" "S19_1" "X4_1" "S14_1" ... #> $ root.edge : num 0.0557 #> $ node.label : chr [1:19] "" "" "" "" ... #> - attr(*, "class")= chr "phylo"
Using these locus trees we can simulate under the multi-locus coalescent process (see Rasmussen and Kellis 2012). We can specify an effective population size and a generation time measured in generations per unit time. For now we will assume that the time of our trees is in units of millions of years. The default for generation time is then 1 time per year (1e-6). And we will randomly draw an effective population size from a Lognormal with mean 14 and standard deviation 0.4. Note that the only reason these values chosen is to mirror the validation test of Mallo et al. (2015) for the SimPhy program.
host_tree_locus_trees[] host_locus_tree <-# randomly choose an effective popsize 1e6 popsize <- sim_multilocus_coal(host_locus_tree, host_loci_gene_trees <-effective_pop_size = popsize, num_reps = 100)
And we can calculate some tree summary statistics on all of genes that were simulated on one of the locus trees (in this case the first one).
You can also combine a host tree, symbiont tree, and association matrix into a
cophy object that can be input into functions in
treeducken. The example below reads in the classic gopher and lice dataset from Hafner et al. 1994. The more common association table format with two columns with hosts in the first and symbionts (or parasites) in the second column is converted into an association matrix. These are all then converted into
cophy class. You can then print out a summary of these and then calculate summary statistics on these. Currently, this only calculates the parafit statistic and conducts the permutation test for that statistic if the
event_history element of
cophy is not set. If that is set it will count the numbers of each event.
read.table(system.file("extdata", gopher_lice_map <-"gopher_lice_mapping.txt", package = "treeducken"), stringsAsFactors = FALSE, header = TRUE) convert_assoc_table_to_matrix(gopher_lice_map) gopher_lice_assoc_matrix <- ape::read.nexus(system.file("extdata", gopher_tree <-"gophers_bd.tre", package = "treeducken")) ape::read.nexus(system.file("extdata", lice_tree <-"lice_bd.tre", package = "treeducken")) convert_to_cophy(hostTree = gopher_tree, gopher_lice_cophylo <-symbTree = lice_tree, assocMat = gopher_lice_assoc_matrix) print(gopher_lice_cophylo) #> #> Host Tree: #> #> #> Phylogenetic tree with 15 tips and 14 internal nodes. #> #> Tip labels: #> Cratogeomys_castanops_L32685, Cratogeomys_merriami_L32688, Geomys_breviceps_L32683, Geomys_bursarius_L32693, Geomys_bursarius_L32694, Geomys_personatus_L32689, ... #> #> Rooted; includes branch lengths. #> #> #> Symbiont Tree: #> #> #> Phylogenetic tree with 17 tips and 16 internal nodes. #> #> Tip labels: #> Geomydoecus_actuosi_L32665, Geomydoecus_chapini_L32667, Geomydoecus_cherriei_L32668, Geomydoecus_costaricensis_L32669, Geomydoecus_ewingi_L32671, Geomydoecus_expansus_L32670, ... #> #> Rooted; includes branch lengths. #> #> Association Matrix: #> #> #> There are 17 rows (i.e. extant symbionts). #> There are 15 cols (i.e. extant hosts). cophy_summary_stat(gopher_lice_cophylo) #> Cospeciations Host_Speciations Host_Extinctions Symbiont_Speciations #> 1 0 0 0 0 #> Symbiont_Extinctions Parafit_Stat Parafit_P-value #> 1 0 2.720485 0.1 plot(gopher_lice_cophylo, fsize = 0.5, show_tip_label = FALSE, gap = 1, col = "purple", lty = "dashed")