Chapter 22 Visualizing phylogenetic networks
We will be working with both the DensiTree application from BEAST and the phangorn package in R.
22.1 Using DensiTree
When you ran your Bayesian analysis, you created a distribution of trees. You summarized this distribution when you looked at the posterior estimates of support for the nodes and clades, but there is a lot more information we can get from this distribution.
Open DensiTree on your personal computer. (DensiTree should have been installed when you downloaded BEAST.) Click on File, then Load and choose your .trees file that was created when you ran BEAST. All the trees in the distribution will be drawn in a fuzzy-looking tree. The darker the line, the more consensus there is among your tree distribution that the branch exists.
In the grass example, we see branches in three separate colors - blue, orange, and green. (I changed the default colors by clicking on “Line Color” in the menu on the right side of the screen. If you open it and then open the “Line Colors” option, you can change your colors as well.) The blue branches are branches supported by the majority of the tree distribution, while the orange and green branches represent alternate branching patterns.
The orange and green colors are very light because those alternate branching patterns show up in only a small percentage of the tree distribution. We can see the percentage if we click on “Help” and then “View Clades.”
There’s not much disagreement among the trees in the posterior distribution for our grass example. The main disagreement is with the placement of the Asiatic grass sample; 85% of the trees place Asiatic grass with the crested wheatgrass/medusahead rye/mosquito grass clade, while 15% place it in a different position.
I have been using the words “probability” and “percentage” interchangeably when talking about the Bayesian posterior tree distribution. Although probability and percentage aren’t usually the same thing, in this case they are. We are estimating the probability of topologies by calculating the percentage of the tree distribution with the desired topology.
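The idea that a clade's posterior probability is simply the share of sampled trees containing it can be sketched in a few lines of R. This is a minimal illustration only: the four taxa (A to D) and the 17/3 mix of topologies are made-up toy data, not the grass dataset, and toy_trees, has_clade, and clade_support are names chosen for this example.

```r
library(ape)

# Hypothetical toy "posterior": 20 rooted trees on four made-up taxa.
# 17 trees group A with B; the other 3 group A with C instead.
toy_trees <- read.tree(text = rep(c("((A,B),(C,D));", "((A,C),(B,D));"),
                                  times = c(17, 3)))

# For each tree, ask whether A and B form a clade (TRUE or FALSE).
has_clade <- vapply(
  seq_along(toy_trees),
  function(i) is.monophyletic(toy_trees[[i]], tips = c("A", "B")),
  logical(1)
)

# The proportion of trees containing the clade is its estimated
# posterior probability.
clade_support <- mean(has_clade)
clade_support
```

The same counting would apply to a real posterior sample once it has been read into R as a list of trees.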
22.2 Networks in R
phangorn offers an algorithm called consensusNet to visualize conflicting topological relationships from our data. This algorithm builds networks similar to the Huson example above using previously generated analysis files.
We have two options for our input. consensusNet takes a list of trees as the input (which R sees as an object of class multiPhylo), so we can either load the bootstrap trees from the ML analysis or the posterior distribution of trees from the Bayesian analysis. Here we will do both with the grass dataset so we can compare the networks. (Notice that we need to use two different commands to load the trees into R, because the two tree files are saved in two separate formats. The ML trees are saved in Newick format, while the Bayesian trees are saved in Nexus format.)
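library(phangorn)
# ML bootstrap trees are stored in Newick format
grass.ml <- read.tree("grass_ml_bootstrap.tre")
# Bayesian posterior trees are stored in Nexus format
grass.bayes <- read.nexus("grass_bayes.trees")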
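To see how the two readers pair with the two file formats, here is a small self-contained sketch that does not need the grass files. It writes a pair of made-up toy trees to temporary Newick and Nexus files and reads each back with the matching function. The taxa and object names (toy, newick_file, nexus_file, from_newick, from_nexus) are assumptions for illustration only.

```r
library(ape)

# Hypothetical toy trees standing in for real analysis output.
toy <- read.tree(text = c("((A,B),(C,D));", "((A,C),(B,D));"))

# Write the same trees once in each format, to temporary files.
newick_file <- tempfile(fileext = ".tre")
nexus_file  <- tempfile(fileext = ".trees")
write.tree(toy, file = newick_file)
write.nexus(toy, file = nexus_file)

# Read each file back with the reader that matches its format.
from_newick <- read.tree(newick_file)
from_nexus  <- read.nexus(nexus_file)

# Both readers return a list of trees that R sees as class multiPhylo.
class(from_newick)
class(from_nexus)
length(from_newick)
length(from_nexus)
```

Checking the class and length of the loaded object is a quick way to confirm that a file was read as a set of trees before passing it on.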
After we load our trees, we run the consensusNet command. This command takes two arguments. The first is the list of trees, while the second is the support threshold. For this example, we’ll set the threshold at 0.1, meaning we will see all the possible splits with a bootstrap support or posterior probability of at least 0.1.
cnet.ml <- consensusNet(grass.ml, .1)
cnet.bayes <- consensusNet(grass.bayes, .1)
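To get a feel for what the threshold does, the sketch below tabulates how often each grouping appears in a made-up set of trees and then filters by two different thresholds. It is only an illustration of the thresholding idea, using ape's prop.part on rooted toy trees. It is not phangorn's internal implementation: consensusNet works on splits rather than rooted clades, and the use of a strict "greater than" comparison here is an assumption. The helper keep_clades and the toy data are hypothetical.

```r
library(ape)

# Hypothetical toy tree set: 17 trees with one topology, 3 with another.
toy_trees <- read.tree(text = rep(c("((A,B),(C,D));", "((A,C),(B,D));"),
                                  times = c(17, 3)))

# prop.part() tabulates every clade found in the set of trees;
# its "number" attribute counts how many trees contain each clade.
clades <- prop.part(toy_trees)
clade_freq <- attr(clades, "number") / length(toy_trees)

# Hypothetical helper: flag the clades whose frequency exceeds a threshold.
# (Strict ">" is an assumption made for this sketch.)
keep_clades <- function(freq, threshold) freq > threshold

# A low threshold keeps the rare alternative groupings (frequency 0.15)...
n_kept_low <- sum(keep_clades(clade_freq, 0.1))
# ...while a higher threshold drops them.
n_kept_high <- sum(keep_clades(clade_freq, 0.2))

n_kept_low
n_kept_high
```

Lowering the threshold therefore adds more of the rare, conflicting groupings to the picture, which is why a low value such as 0.1 is useful for exploring disagreement among trees.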
plot(cnet.ml, show.edge.label=TRUE)
We can see there are very few alternate topologies in the bootstrap trees (which we already knew would be the case, given the high bootstrap support in the original ML tree!), but there are a couple of splits. The first suggests that crested wheatgrass is sometimes in a polytomy with medusahead rye and mosquito grass, instead of splitting from that clade earlier. The second split suggests that in some trees Asiatic grass splits from the main tree as the basal node, while in others it clusters with the crested wheatgrass/medusahead rye/mosquito grass clade. Finally, the last split changes the position of rye.
plot(cnet.bayes, show.edge.label=TRUE)
There is only one split in the Bayesian network: the placement of the Asiatic grass taxon. We have seen this already in the DensiTree diagram, but it does look a bit different in network form. This particular split shows up in both the ML bootstrap network and the Bayesian posterior distribution network.
The support values for each split (the bootstrap values for the ML network, or the posterior probabilities for the Bayesian network) are displayed by setting the show.edge.label argument to TRUE, as in the plot calls above. Be careful, though, as this can make the network very difficult to read.
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] phangorn_2.5.5 ape_5.4-1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.10 bslib_0.4.2 compiler_4.0.2 pillar_1.9.0
## [5] jquerylib_0.1.4 tools_4.0.2 digest_0.6.25 jsonlite_1.7.1
## [9] evaluate_0.20 lifecycle_1.0.3 tibble_3.2.1 nlme_3.1-149
## [13] lattice_0.20-41 pkgconfig_2.0.3 rlang_1.1.0 igraph_1.2.6
## [17] fastmatch_1.1-0 Matrix_1.2-18 cli_3.6.1 yaml_2.2.1
## [21] parallel_4.0.2 xfun_0.26 fastmap_1.1.1 stringr_1.4.0
## [25] knitr_1.33 fs_1.5.0 vctrs_0.6.1 sass_0.4.5
## [29] hms_0.5.3 grid_4.0.2 glue_1.4.2 R6_2.4.1
## [33] fansi_0.4.1 ottrpal_1.0.1 rmarkdown_2.10 bookdown_0.24
## [37] readr_1.4.0 magrittr_2.0.3 htmltools_0.5.5 quadprog_1.5-8
## [41] utf8_1.1.4 stringi_1.5.3 cachem_1.0.7