Colin J. Curry, Joel F. Gibson, Shadi Shokralla, Mehrdad Hajibabaei, and Donald J. Baird
Abstract: We reviewed the availability of cytochrome c oxidase subunit I (COI) sequences for 2534 North American freshwater invertebrate genera in public databases (GenBank and Barcode of Life Data Systems) and assessed representation of genera commonly encountered in the Canadian Aquatic Biomonitoring Network (CABIN) database. COI sequence records were available for 61.2% of North American genera and 72.4% of Insecta genera in public databases. Mollusca (73.9%) and Nematoda (15.4%) were the best and worst represented groups, respectively. In CABIN, 85.4% of genera had COI sequence records, and 95.2% of genera occurring in >1% of samples were represented. Genera absent from CABIN tended to be uncommon or members of groups not routinely used for biomonitoring purposes. On average, 94.1% of genera in well-identified samples had associated sequence data. To leverage the full potential of genomics approaches, we must expand DNA-barcode reference libraries for poorly described components of freshwater food webs. Some genera appear to be well represented (e.g., Eukiefferiella), but their deposited sequences represent few sampling localities or few species, which leads to underestimation of sequence diversity at the genus level and reduces confidence in identifications. Public COI libraries are sufficiently populated to permit routine application of genomics tools in biomonitoring, and ongoing quality assurance/quality control should include re-evaluation as new COI reference sequences are added or taxonomic hierarchies change. Next, we must understand whether and how established biomonitoring approaches can capitalize on high-throughput sequencing tools. Biomonitoring approaches that use genomics data to facilitate structural and functional assessments are fertile ground for future investigation and will benefit from continued improvement of publicly available sequence libraries.
Key words: COI, invertebrates, biomonitoring, high-throughput sequencing, DNA metabarcoding, identification, genus, Biomonitoring 2.0