Data, time & money: what is the best strategy to resolve the molecular phylogeny of non-model organisms?
Paul Zaharias, Mark Phuong, Nicolas Puillandre
Increasing the number of taxa and characters can affect the robustness of phylogenetic inferences. With the advent of phylogenomics, transcriptomes and (reduced) genomes are now widely used, but sequencing, assembling and comparing them can be expensive, time-consuming and complex for non-model organisms. Our goal was to identify the strategy that represents the best compromise between cost, time and robustness of the resulting tree. We sampled 32 transcriptomes of marine molluscs of the family Turridae. From these data, we extracted the genes most commonly used in gastropod phylogenies (COX1, 12S, 16S, 28S, H3 and 18S), full mitogenomes, and a reduced exome. With each dataset, we reconstructed phylogenies and compared their robustness and accuracy. We evaluated the impact of missing data, the use of supertree and supermatrix methods, and the cost (time and money) in order to identify the best compromise for phylogenetic data sampling in non-model organisms.
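
To make the supermatrix approach and the notion of missing data concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the pipeline used in this study: the function names `build_supermatrix` and `missing_fraction`, the `?` padding character, and the toy sequences are all hypothetical. Per-gene alignments are concatenated per taxon, and a taxon absent from a gene is padded with missing-data characters.

```python
def build_supermatrix(alignments, taxa, missing="?"):
    """Concatenate per-gene alignments into one sequence per taxon.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    A taxon absent from a gene is padded with `missing` characters
    (hypothetical convention chosen for this sketch).
    """
    supermatrix = {taxon: "" for taxon in taxa}
    for gene in alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment needs sequences of one equal length")
        length = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += gene.get(taxon, missing * length)
    return supermatrix


def missing_fraction(supermatrix, missing_chars="?-"):
    """Return the proportion of cells that are missing data or gaps."""
    total = sum(len(seq) for seq in supermatrix.values())
    absent = sum(seq.count(ch) for seq in supermatrix.values() for ch in missing_chars)
    return absent / total if total else 0.0


# Toy data (invented for illustration): taxon C lacks the first gene.
taxa = ["A", "B", "C"]
gene_1 = {"A": "ACGT", "B": "ACGA"}
gene_2 = {"A": "GG", "B": "GC", "C": "GA"}

matrix = build_supermatrix([gene_1, gene_2], taxa)
print(matrix)
print(round(missing_fraction(matrix), 3))
```

A supertree approach would instead infer one tree per gene and combine the resulting trees, so missing taxa affect individual gene trees rather than cells of a single concatenated matrix.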