Benchmarking Read-Based Virome Profilers for Human Virus Detection and Community Discovery

Na Rae Choi; Jung Hwa Park; Tae Sung Kim; Hee Sam Na

doi:10.4167/jbv.2025.55.4.350

Original Article

JOURNAL OF BACTERIOLOGY AND VIROLOGY. 31 December 2025. 350-359
https://doi.org/10.4167/jbv.2025.55.4.350

Na Rae Choi¹

Jung Hwa Park²

Tae Sung Kim²

Hee Sam Na²,*

¹Department of Oral and Maxillofacial Surgery, Pusan National University, Yangsan 50612, Republic of Korea

²Department of Oral Microbiology, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea

*Correspondence: heesamy@pusan.ac.kr

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/).

ABSTRACT

Background: Shotgun metagenomics enables human virome profiling, yet read-based platforms differ in databases and mapping strategies, potentially altering biological conclusions, especially when goals diverge between human virus detection and broader virome discovery. Methods: We benchmarked Kraken2, FastViromeExplorer, and ViromeScan on whole-genome shotgun datasets from gut, nasal, oral, and vaginal sites. Reads were trimmed, human-depleted, and profiled; counts were normalized to CPM. We compared alpha diversity, beta diversity, and taxonomic composition. Results: Platform choice strongly affected viral composition. FastViromeExplorer produced the highest CPM and alpha diversity, Kraken2 was intermediate, and ViromeScan was lowest. Beta diversity showed clear site clustering for all platforms, strongest with FastViromeExplorer, moderate with Kraken2, and weakest with ViromeScan. At the phylum level, Uroviricota dominated Kraken2 and FastViromeExplorer profiles, whereas ViromeScan emphasized Nucleocytoviricota and Peploviricota. Genus and species calls were highly platform-dependent. FastViromeExplorer detected the most taxa overall, while ViromeScan preferentially reported eukaryotic and human viruses. Conclusions: Read-based profilers yield divergent virome portraits driven by database scope and mapping stringency. For clinical human-virus detection, curated human and eukaryotic references with coverage-based confirmation are essential. Our results provide practical guidance for aligning pipelines to study goals and underscore the need to report parameters, database versions, and cross-platform validations when interpreting the human virome.

Keywords

Virome profile

Benchmark

Next generation sequencing

Whole genome sequencing

INTRODUCTION

Viruses are a pervasive component of the human holobiont; even asymptomatic hosts harbor rich viral communities whose interactions do not always end with the death of the virus-infected cells (1, 2, 3). Recognizing the virome as a significant facet of human biology has sharpened interest in profiling its taxonomic and phylogenetic structure to elucidate roles in health and complex disease and to inform new diagnostic and therapeutic avenues (2, 4). Classical virus discovery via isolation and culture is slow and often infeasible because many viruses are recalcitrant to cultivation (5). Unlike bacterial communities, viral communities are also hard to characterize because no single gene is shared by all viral genomes, which precludes the ribosomal DNA profiling commonly used in bacterial studies. Consequently, unbiased shotgun metagenomics has become the primary route for virome characterization, enabled by next-generation sequencing that delivers large data volumes at reduced cost (6).

Despite its promise, routine clinical adoption is limited by the need for standardized, validated wet-lab procedures and accreditation-compatible quality systems (7). A second barrier is computational: billions of short reads must be assigned to highly diverse, rapidly evolving viral genomes. There are two major approaches to this computational challenge. First, assembly-based pipelines reconstruct contigs and then annotate them against reference databases (8, 9, 10). Constructing contigs yields longer coding regions that improve annotation and host/context inference, at the cost of time, memory, and potential chimeras (11). Second, read-based profilers map reads directly to reference genome databases, trading contiguity for speed and scalability (12, 13, 14).

Given these contrasting strategies and their distinct assumptions, benchmarking is essential to quantify how platform choice shapes virome inferences across body sites. In this study, we benchmarked three widely used read-based profilers (Kraken2, FastViromeExplorer, and ViromeScan) on whole-genome shotgun (WGS) datasets from the Human Microbiome Project (15). We compared read retention and contamination filtering, alpha and beta diversity, and taxonomic composition. We also summarized statistical separations between sites, providing practical guidance for tool selection in human virome studies.

MATERIALS AND METHODS

Data retrieval and processing

Human Microbiome Project (PRJNA275349) data were downloaded from the European Nucleotide Archive. Paired-end libraries were quality-trimmed with Trimmomatic v0.36, and trimmed reads were mapped to the human genome (hg38) with Bowtie 2 v2.5.4 to remove human-derived sequences. Taxonomic profiling was conducted using ViromeScan (13), Kraken2 v2.1.3 (12), and FastViromeExplorer (14) to generate relative abundances of the viral species identified in each sample.

Microbiome analysis was conducted with phyloseq and related packages in R v4.3.1. Alpha diversity was measured with the Chao1 and Shannon indices. Principal coordinate analysis (PCoA) of Bray-Curtis distances was performed with the vegan package to determine community structure. The Kruskal-Wallis test and permutational multivariate analysis of variance (PERMANOVA) were used to assess the statistical significance of differences in alpha and beta diversity, respectively. To test for differential abundance of viral species among groups, linear discriminant analysis effect size (LEfSe) (16) was applied with default settings.
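To make the two alpha-diversity indices concrete, the following is a minimal Python sketch of how they can be computed from a single sample's per-taxon read counts. It is an illustration only and not the phyloseq code used in this study; the function names are hypothetical, and the Chao1 formula shown is one common bias-corrected form, which is an assumption about the variant.

```python
import math
from collections.abc import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1_index(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa seen exactly once and exactly twice,
    so the input must be raw integer counts, not normalized abundances.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical sample: six taxa, one of them undetected.
example_counts = [10, 5, 1, 1, 2, 0]
print(shannon_index(example_counts), chao1_index(example_counts))
```

Because Chao1 depends on singleton and doubleton counts, it is sensitive to whether a profiler reports low-count taxa at all, which is one way filtering stringency can feed directly into richness estimates.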

RESULTS

Read Counts During Preprocessing

Raw WGS reads from four body sites were quality/adapter-trimmed with Trimmomatic and then mapped to the human genome with Bowtie 2 to remove host reads. Trimming retained 86.0 ± 10.3% of gut reads, 11.5 ± 12.3% of nasal reads, 52.5 ± 25.0% of oral reads, and 7.1 ± 2.4% of vaginal reads. After human decontamination, the proportion of trimmed reads retained (unmapped reads/trimmed reads) was 100.0% (gut), 67.7 ± 19.4% (nasal), 95.0 ± 8.7% (oral), and 52.7 ± 17.9% (vaginal) (Table 1). These results indicate minimal human contamination in gut datasets, moderate human contamination in oral datasets, and substantially higher host content in nasal and vaginal datasets.

Table 1.

Summary of data preprocessing

Site (sample number)	Input reads	Trimmed reads	Unmapped reads
Gut (n=40)	1.86 × 10⁷ ± 6.73 × 10⁶	1.60 × 10⁷ ± 5.92 × 10⁶	1.60 × 10⁷ ± 5.92 × 10⁶
Nasal (n=16)	1.89 × 10⁷ ± 7.77 × 10⁶	2.18 × 10⁶ ± 1.48 × 10⁶	1.55 × 10⁶ ± 1.37 × 10⁶
Oral (n=42)	2.17 × 10⁷ ± 6.55 × 10⁶	1.14 × 10⁷ ± 6.73 × 10⁶	1.12 × 10⁷ ± 6.85 × 10⁶
Vaginal (n=16)	2.35 × 10⁷ ± 6.21 × 10⁶	1.68 × 10⁶ ± 7.39 × 10⁵	8.71 × 10⁵ ± 4.78 × 10⁵
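As a worked check of the retention arithmetic, the sketch below recomputes the two retention percentages from the site-level means in Table 1. The function name is hypothetical. A ratio of site means need not equal the mean of per-sample ratios, which may be why the host-depletion fractions computed this way differ slightly from the values quoted in the text.

```python
# Site-level mean read counts copied from Table 1: (input, trimmed, unmapped).
TABLE1_MEANS = {
    "gut": (1.86e7, 1.60e7, 1.60e7),
    "nasal": (1.89e7, 2.18e6, 1.55e6),
    "oral": (2.17e7, 1.14e7, 1.12e7),
    "vaginal": (2.35e7, 1.68e6, 8.71e5),
}


def retention_from_means(means):
    """Return {site: (% of input kept after trimming, % of trimmed kept after host removal)}."""
    result = {}
    for site, (n_input, n_trimmed, n_unmapped) in means.items():
        trimmed_pct = 100 * n_trimmed / n_input
        unmapped_pct = 100 * n_unmapped / n_trimmed
        result[site] = (trimmed_pct, unmapped_pct)
    return result


for site, (trimmed_pct, unmapped_pct) in retention_from_means(TABLE1_MEANS).items():
    print(f"{site}: trimmed {trimmed_pct:.1f}%, host-depleted {unmapped_pct:.1f}%")
```

The trimming fractions obtained from the means (about 86.0%, 11.5%, 52.5%, and 7.1%) agree with the reported values, while the host-depletion fractions from the means (about 71.1% nasal, 98.2% oral, 51.8% vaginal) are close to, but not identical with, the reported per-site figures.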

Unmapped reads were profiled for viral content using each platform and normalized to counts per million (CPM). Across sites, FastViromeExplorer generally reported the highest viral CPM, ViromeScan the lowest, and Kraken2 was intermediate. FastViromeExplorer yielded substantially higher viral signal in nasal, oral, and vaginal samples, while ViromeScan reported markedly lower CPM at all sites. In gut samples, Kraken2 and FastViromeExplorer did not differ, but both exceeded ViromeScan (Table 2). Thus, platform choice strongly affected the total viral signal.

Table 2.

Virus counts depending on platforms

Site (sample number)	Kraken2	FastViromeExplorer	ViromeScan
Gut (n=40)	3.79 × 10⁵ ± 8.43 × 10⁵	3.29 × 10⁵ ± 6.17 × 10⁵	2.37 × 10² ± 2.03 × 10²
Nasal (n=16)	1.31 × 10⁴ ± 1.03 × 10⁴	5.25 × 10⁴ ± 3.97 × 10⁴	2.11 × 10³ ± 1.52 × 10³
Oral (n=42)	5.07 × 10³ ± 5.87 × 10³	5.00 × 10⁴ ± 4.54 × 10⁴	7.48 × 10² ± 4.39 × 10²
Vaginal (n=16)	7.94 × 10³ ± 8.72 × 10³	3.02 × 10⁴ ± 1.89 × 10⁴	1.77 × 10³ ± 1.42 × 10³
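The CPM normalization mentioned above can be sketched in a few lines of Python. This is an illustrative assumption rather than the study's script: the text does not state the denominator, so the sketch assumes it is the number of host-depleted reads submitted to the profiler, and the function and taxon names are hypothetical.

```python
def counts_per_million(viral_counts: dict[str, int], total_reads: int) -> dict[str, float]:
    """Scale per-taxon read counts to counts per million (CPM).

    total_reads is assumed to be the number of host-depleted reads given to
    the profiler; the article does not specify which denominator was used.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {taxon: count * 1_000_000 / total_reads for taxon, count in viral_counts.items()}


# Hypothetical sample: 50 reads assigned to one taxon out of 2,000,000 reads.
print(counts_per_million({"taxonA": 50, "taxonB": 0}, 2_000_000))
```

Under this assumption, CPM makes samples with very different sequencing depth comparable, but it does not correct for differences in genome length or in how many reads each profiler leaves unassigned.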

Microbial Diversity Differences depending on Platforms

To evaluate differences in viral diversity between platforms, alpha diversity was assessed using the Chao1 and Shannon indices. Within-sample viral diversity differed markedly between platforms. FastViromeExplorer yielded the highest Chao1 and Shannon indices at most sites (nasal, oral, vaginal), Kraken2 was intermediate, and ViromeScan was consistently lowest (Fig. 1). In gut samples, Kraken2 and FastViromeExplorer were comparable, both exceeding ViromeScan. Across sites, Chao1 patterns varied by platform. Vaginal samples had the lowest richness with Kraken2 and FastViromeExplorer, whereas gut samples were lowest with ViromeScan. Nasal samples had the highest Chao1 on all platforms. Overall, platform choice strongly influenced alpha diversity estimates, mirroring the CPM trends.

https://cdn.apub.kr/journalsite/sites/jbv/2025-055-04/N0290550406/images/JBV_2025_v55n4_350_f001.jpg

Fig. 1

Alpha diversity of sampling sites depending on platforms. (A) Kraken2, (B) FastViromeExplorer, and (C) ViromeScan. Alpha diversity was used to describe the microbial richness and evenness within samples using the Chao1 and Shannon indices.

Beta diversity was analyzed for each platform to compare community composition between sites. Ordinations revealed clear site-wise clustering, with FastViromeExplorer showing the strongest separation among sites, Kraken2 showing moderate separation, and ViromeScan showing the weakest separation with partially overlapping clusters (Fig. 2). With Kraken2, all pairwise site comparisons were significant after adjustment. With FastViromeExplorer, all pairs were significant except nasal–vaginal, which was borderline. With ViromeScan, all pairs were significant except nasal–vaginal. Effect sizes were comparable, with the largest for the gut–vaginal contrast. Overall, site explained a substantial share of community differences, but the magnitude of separation depended on the platform, and nasal–vaginal consistently showed the weakest contrast.

https://cdn.apub.kr/journalsite/sites/jbv/2025-055-04/N0290550406/images/JBV_2025_v55n4_350_f002.jpg

Fig. 2

Beta diversity of sampling sites depending on platforms. (A) Kraken2, (B) FastViromeExplorer, and (C) ViromeScan. Principal coordinate analysis (PCoA) of the Bray-Curtis distance was performed to determine the viral community structure.
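For readers less familiar with the beta-diversity statistics, the sketch below shows in plain Python how a Bray-Curtis distance and a PERMANOVA pseudo-F with a permutation p-value can be computed. It is a simplified stand-in for the vegan routines used in the study, not a reimplementation of them; the function names, the toy abundance vectors, and the default of 999 permutations are all assumptions made for illustration.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = bray_curtis(samples[i], samples[j])
    return dist


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F; assumes at least two groups and more samples than groups."""
    n = len(labels)
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    k = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    if ss_within == 0:
        return float("inf")
    return ((ss_total - ss_within) / (k - 1)) / (ss_within / (n - k))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical abundance vectors for two sites with three samples each.
toy_samples = [[10, 0, 1], [9, 1, 0], [8, 0, 2], [0, 10, 1], [1, 9, 0], [0, 8, 2]]
toy_labels = ["siteA"] * 3 + ["siteB"] * 3
print(permanova(distance_matrix(toy_samples), toy_labels))
```

Because the distance is computed on whatever taxa a profiler reports, a platform that omits an entire group of viruses changes both the numerator and the denominator of every pairwise distance, which is one route by which database scope can alter apparent site separation.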

Taxonomic Composition at the Phylum Level Across Platforms

With Kraken2, most sites were dominated by Uroviricota. With FastViromeExplorer, Uroviricota was also dominant in nasal and vaginal samples, while oral and gut samples showed notable contributions from Lenarviricota, Negarnaviricota, and Cossaviricota. With ViromeScan, Uroviricota was not profiled, and communities were led by Nucleocytoviricota, followed by Peploviricota, Artverviricota, Cossaviricota, and Negarnaviricota. By site, gut and oral profiles were concordant between Kraken2 and FastViromeExplorer, whereas nasal and vaginal profiles showed greater divergence (Fig. 3). Overall, platform choice materially changed which phyla appeared among the most abundant and their relative weights.

https://cdn.apub.kr/journalsite/sites/jbv/2025-055-04/N0290550406/images/JBV_2025_v55n4_350_f003.jpg

Fig. 3

Relative abundance of phyla depending on platforms. (A) Kraken2, (B) FastViromeExplorer, and (C) ViromeScan. The relative abundances of the 10 most abundant phyla were plotted.

Taxonomic Composition at the Genus Level Across Platforms

At the genus-level, a total of 481 genera were identified through Kraken2, 855 genera through FastViromeExplorer and 108 genera through ViromeScan. Among these, 321 genera were commonly detected between Kraken2 and FastViromeExplorer, while 92 genera were commonly found between FastViromeExplorer and ViromeScan and 63 genera between Kraken2 and ViromeScan.
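The genus counts above can be turned into a single overlap measure per platform pair. The short sketch below computes the Jaccard index from the reported totals and shared counts; the choice of Jaccard as the summary statistic and the function names are assumptions for illustration, not part of the original analysis.

```python
from itertools import combinations

# Genus totals and pairwise shared counts as reported in the text.
GENUS_TOTALS = {"Kraken2": 481, "FastViromeExplorer": 855, "ViromeScan": 108}
GENUS_SHARED = {
    frozenset({"Kraken2", "FastViromeExplorer"}): 321,
    frozenset({"FastViromeExplorer", "ViromeScan"}): 92,
    frozenset({"Kraken2", "ViromeScan"}): 63,
}


def jaccard_from_counts(size_a: int, size_b: int, shared: int) -> float:
    """Jaccard index |A and B| / |A or B| computed from set sizes alone."""
    union = size_a + size_b - shared
    return shared / union if union else 0.0


def pairwise_jaccard(totals, shared):
    """Jaccard index for every pair of platforms, keyed by (platform_a, platform_b)."""
    return {
        (a, b): jaccard_from_counts(totals[a], totals[b], shared[frozenset({a, b})])
        for a, b in combinations(totals, 2)
    }


for pair, value in pairwise_jaccard(GENUS_TOTALS, GENUS_SHARED).items():
    print(pair, round(value, 3))
```

With these numbers, Kraken2 and FastViromeExplorer share about 32% of their combined genera, whereas either tool shares only about 11 to 12% with ViromeScan. The asymmetry is also informative: 92 of ViromeScan's 108 genera (about 85%) were also reported by FastViromeExplorer, so the low Jaccard value there mainly reflects the much larger FastViromeExplorer set.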

With Kraken2, Peduovirus and Jouyvirus were among the most abundant genera across all sites, alongside several site-distinctive genera. In gut samples, Carjivirus was most abundant, followed by Peduovirus, Jouyvirus, Afonbuvirus, and Birpovirus. In nasal samples, Pahexavirus was most abundant. In oral samples, Peduovirus dominated alongside an unclassified Caudoviricetes genus and Jouyvirus. In vaginal samples, Peduovirus and Jouyvirus were dominant, together with unclassified Caudoviricetes, Blohavirus, and Birpovirus (Fig. 4A).

With FastViromeExplorer, Caudoviricetes-associated genera including Quadragintavirus, Peduovirus, and Jouyvirus were detected across all sites. In gut samples, the five most abundant genera were Chaphamaparvovirus, Carjivirus, Quadragintavirus, Citricivirus, and Jouyvirus. In nasal samples, Quadragintavirus and Jouyvirus were most abundant, followed by Evevirus, Peduovirus, and Moineauvirus. In oral samples, Quadragintavirus and Citricivirus were most abundant, followed by Jouyvirus and Evevirus. In vaginal samples, the profile resembled those of nasal and oral samples, with Quadragintavirus and Jouyvirus most abundant, followed by Evevirus, Moineauvirus, and Peduovirus (Fig. 4B).

With ViromeScan, giant-virus genera including Pandoravirus, Orthopoxvirus, and Megavirus and parasitoid-associated groups including Ichnoviriform and Bracoviriform were frequently noted. In gut samples, the top 5 genera were Orthopoxvirus, Pandoravirus, Whispovirus, Megavirus, and Cyvirus. Pandoravirus and Cyvirus were the most abundant genera in nasal, oral, and vaginal samples. In nasal samples, Ichnoviriform, unclassified Retroviridae, and Bracoviriform were also abundant. In oral samples, Chlorovirus, Alphabaculovirus, and Ichnoviriform were also abundant. In vaginal samples, Ichnoviriform, unclassified Retroviridae, and Aurivirus were also abundant (Fig. 4C). These contrasts underscore strong platform effects on genus-level calls and suggest that conclusions about site-enriched genera are method-dependent; when biologically critical, key taxa should be validated across tools and reference databases.

https://cdn.apub.kr/journalsite/sites/jbv/2025-055-04/N0290550406/images/JBV_2025_v55n4_350_f004.jpg

Fig. 4

Relative abundance of genera depending on platforms. (A) Kraken2, (B) FastViromeExplorer, and (C) ViromeScan. The relative abundances of the 15 most abundant genera were plotted.

Taxonomic Composition at the Species Level Across Platforms

At the species level, Kraken2 detected 982 species, FastViromeExplorer detected 2,272 species, and ViromeScan detected 254 species. Pairwise overlaps were limited; 1 species was common between Kraken2 and FastViromeExplorer, 182 species were common between FastViromeExplorer and ViromeScan, and there were no common species between Kraken2 and ViromeScan.

With Kraken2, Peduovirus P22H1 and Jouyvirus ev207 were recurrently the most abundant species across sites. In gut samples, Carjivirus communis was most abundant, followed by Peduovirus P22H1, Jouyvirus ev207, Carjivirus hominis, and Junduvirus communis. In nasal samples, Pahexavirus P105, PHL082M03, and PHL041M10 were notable nasal-enriched species. In oral samples, Streptococcus phage PH10, Streptococcus phage SpSL1, and Rothia phage Spartoi were also prominent. In vaginal samples, Lactobacillus phage Lv-1, Blohavirus americanus, and Blohavirus faecalis were also among the top 5 species (Fig. 5A).

With FastViromeExplorer, Caudoviricetes-linked species belonging to Quadragintavirus, Peduovirus, Evevirus, and Jouyvirus were widespread, with site-specific highlights. In gut samples, Chaphamaparvovirus galliform3 and Quadragintavirus ev129 were also among the most abundant species. In oral samples, Citricivirus chongqinense and Gihfavirus pelohabitans were also noted (Fig. 5B).

With ViromeScan, species from giant-virus and parasitoid-associated groups were dominant. In gut samples, Orthopoxvirus, Pandoravirus salinus/dulcis, Whispovirus, Megavirus chilense, and Cyvirus cyprinidallo1 were the top 5 species. In nasal, oral, and vaginal samples, Pandoravirus dulcis, Cyvirus cyprinidallo1, and Human endogenous retrovirus K were among the top 5 most abundant species (Fig. 5C). Taken together, species-level calls are strongly platform-dependent in both coverage (detected set size) and composition (top species by site).

https://cdn.apub.kr/journalsite/sites/jbv/2025-055-04/N0290550406/images/JBV_2025_v55n4_350_f005.jpg

Fig. 5

Relative abundance of species depending on platforms. (A) Kraken2, (B) FastViromeExplorer, and (C) ViromeScan. The relative abundances of the 15 most abundant species were plotted.

Platform-Dependent Differences in Microbial Profiles

Finally, LEfSe identified platform-specific species biomarkers (Fig. 6). Kraken2 yielded the largest set of significant species, while FastViromeExplorer yielded the fewest. With Kraken2 (Fig. 6A), multiple Streptococcus phages were enriched in oral samples, and several Pahexavirus species were enriched in nasal samples. With FastViromeExplorer (Fig. 6B), Streptococcus phages were again oral-enriched, whereas Lactobacillus phages were characteristic of vaginal samples. With ViromeScan (Fig. 6C), Gammapapillomaviruses were prominent in nasal samples; in oral samples, Varicellovirus, Simplexvirus, and Omegapapillomavirus were significant. In vaginal samples, Alphapapillomavirus, Simplexvirus, Molluscum contagiosum virus, Cytomegalovirus, and Varicellovirus were among the significant species. Taken together, the biomarker set varies strongly by platform, mirroring database coverage and mapping strategy.

https://cdn.apub.kr/journalsite/sites/jbv/2025-055-04/N0290550406/images/JBV_2025_v55n4_350_f006.jpg

Fig. 6

Comparisons of the viral abundance depending on sampling sites showing significant differences. (A) Kraken2, (B) FastViromeExplorer, and (C) ViromeScan. The analysis was performed using linear discriminant analysis (LDA) and effect size analysis (LEfSe). Significant viral species with an LDA score >3.0 at species level were plotted.

DISCUSSION

The human virome is a pervasive, biologically consequential component of the holobiont. However, it remains hard to measure because viruses lack a universal marker gene and many are uncultivable. Shotgun metagenomics has become the primary route for virome characterization, but clinical translation is constrained by the computational challenge of assigning billions of reads. There are two major analytic approaches: assembly-based pipelines and read-based profilers. Assembly-based pipelines construct contigs that improve annotation and host/context inference. However, contig assembly has several challenges and limitations (11). High-abundance taxa assemble well while rare members fragment, biasing downstream diversity and function estimates (17). Large, complex metagenomes require substantial RAM and CPU, and results vary with assembler, k-mer choice, and version (18). Mapping reads back to assembled contigs can shift abundance estimates relative to direct read profiling, complicating cross-study comparisons (19). We therefore benchmarked read-based profilers to understand how tool choice alters inferred diversity and composition across body sites.

Across four body sites, read-based profilers produced markedly different viral signals. FastViromeExplorer generally yielded higher CPM and greater alpha diversity than Kraken2, with ViromeScan lowest overall. Several factors may explain these discrepancies. First, reference database design differs. Kraken2 and FastViromeExplorer rely heavily on phage/RefSeq-centric catalogs, whereas ViromeScan emphasizes curated human and eukaryotic-virus genomes and applies hierarchical filters. A database weighted toward bacteriophages favors overall virome discovery, while a human-virus-focused catalog prioritizes clinically relevant viruses but may under-call phages and novel lineages. Second, mapping strategy and stringency differ. Kraken2 uses exact k-mer matches with lowest common ancestor (LCA) assignment (12), FastViromeExplorer uses pseudoalignment (kallisto) plus coverage, ratio, and read filters (14), and ViromeScan uses staged mapping after host and bacterial pre-filters (13). These choices could shift sensitivity–specificity trade-offs. Third, site biology and host carryover could have influenced the profiles. Nasal and vaginal libraries carried more host reads and had lower retained fractions after decontamination, amplifying differences among tools. Viral communities are also niche-specific, and database emphasis may have interacted with site biology.
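To illustrate the idea of exact k-mer matching with LCA assignment described above, here is a deliberately simplified Python toy. It is not Kraken2's algorithm: Kraken2 uses minimizers in a compact hash table and scores root-to-leaf paths, whereas this sketch stores every k-mer in a dictionary, ignores reverse complements, and classifies by simple majority vote. The taxonomy, reference sequences, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent; "root" is its own parent.
PARENT = {
    "root": "root",
    "Caudoviricetes": "root",
    "Peduovirus": "Caudoviricetes",
    "Jouyvirus": "Caudoviricetes",
}


def lineage(taxon):
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while PARENT[path[-1]] != path[-1]:
        path.append(PARENT[path[-1]])
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for node in lineage(b):
        if node in ancestors:
            return node
    return "root"


def build_index(references, k):
    """Map each k-mer to the LCA of all reference taxa that contain it."""
    index = {}
    for taxon, sequence in references.items():
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


def classify(read, index, k):
    """Assign a read to the taxon hit by most of its k-mers (None if no k-mer matches)."""
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    hits = Counter(index[kmer] for kmer in kmers if kmer in index)
    return hits.most_common(1)[0][0] if hits else None


# Hypothetical reference sequences that share the k-mers ACGT, CGTA, and GTAC.
toy_references = {"Peduovirus": "ACGTACGGA", "Jouyvirus": "TTGACGTAC"}
toy_index = build_index(toy_references, k=4)
print(classify("TACGGA", toy_index, 4), classify("ACGTAC", toy_index, 4))
```

In this toy, a read made only of k-mers unique to one genus is assigned to that genus, a read made of shared k-mers is pushed up to the common ancestor, and a read with no matching k-mer stays unclassified. The last case shows why a virus missing from the reference database cannot be reported by this kind of method, however abundant it is in the sample.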

Our cross-platform comparison shows that database scope and matching strategy reshape the virome picture. At the phylum level, Kraken2 and FastViromeExplorer yielded Uroviricota dominance at most sites. Uroviricota is a phylum of non-enveloped dsDNA viruses that includes the class Caudoviricetes, which is well known for its distinctive shape: the virion has an icosahedral head that contains the viral genome and is attached to a flexible tail by a connector protein (20). ViromeScan reported communities led by Nucleocytoviricota, followed by Peploviricota, Artverviricota, Cossaviricota, and Negarnaviricota. Nucleocytoviricota is a phylum of large dsDNA viruses that includes the families Poxviridae and Pandoraviridae (21, 22). These viruses are referred to as nucleocytoplasmic because most of them replicate in both the host's nucleus and cytoplasm. Peploviricota is a phylum of dsDNA viruses that includes the order Herpesvirales, characterised by a common morphology consisting of an icosahedral capsid enclosed in a glycoprotein-containing lipid envelope. Families of Herpesvirales include Alloherpesviridae, Malacoherpesviridae, and Orthoherpesviridae (23). Cossaviricota is a phylum of viruses named after Yvonne Cossart, who discovered parvovirus B19, the causative agent of fifth disease (24). Cossaviricota includes the classes Mouviricetes, Papovaviricetes, and Quintoviricetes (25).

LEfSe highlighted platform-specific viral biomarkers when each site was contrasted against the others, reinforcing that database scope and mapping strategy shape biological conclusions. Kraken2 produced the largest biomarker set and ViromeScan emphasized eukaryotic viruses, consistent with their reference emphases. In oral samples profiled by Kraken2 and FastViromeExplorer, the repeated enrichment of Streptococcus phages aligns with the Streptococcus-rich oral microbiome and frequent oral phage activity. In nasal samples profiled by Kraken2, Pahexavirus enrichment fits a phage-leaning nasal community, while ViromeScan highlighted Gammapapillomaviruses, plausibly reflecting the mucosal epithelia of the anterior nares. In vaginal samples profiled by FastViromeExplorer, Lactobacillus phages were coherent with a Lactobacillus-dominated niche, suggesting active phage–host dynamics.

Taken together, distinct goals may require different analytical strategies. For human virus detection, such as clinical or targeted use, the primary objective is sensitive yet highly specific detection of pathogenic or clinically actionable human viruses (e.g., herpesviruses, papillomaviruses), with interpretable evidence at the genome level. In this study, ViromeScan reported many eukaryotic and giant-virus calls and human endogenous retrovirus hits, while FastViromeExplorer and Kraken2 mainly emphasized phages. For clinical questions, a database that prioritizes human and eukaryotic viruses, combined with stringent filters, should be preferable. For overall virome discovery, such as ecological or evolutionary studies, the primary objective is comprehensive cataloging of viral diversity, especially phages and novel lineages, together with inference of community structure, dynamics, and putative functions. FastViromeExplorer and Kraken2 produced higher richness and stronger between-site structure, which is useful for ecological contrasts. In contrast, ViromeScan under-called phages and compressed beta diversity, limiting discovery power. Platform-dependent shifts at the phylum and genus levels show that database breadth steers ecological narratives.
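One common form of genome-level evidence is breadth of coverage, the fraction of a reference genome covered by at least one aligned read. The sketch below is a minimal Python illustration under stated assumptions: alignments are given as 0-based, half-open (start, end) intervals, the function name is hypothetical, and no particular acceptance threshold is implied by this study.

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of the genome covered by at least one aligned read.

    intervals: iterable of 0-based, half-open (start, end) alignment positions.
    Overlapping or adjacent intervals are merged so no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip each alignment to the genome boundaries and skip empty ones.
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical 1,000 bp genome: 100 reads stacked on one region vs. 3 spread-out reads.
stacked = [(0, 100)] * 100
spread = [(0, 100), (50, 150), (900, 1000)]
print(breadth_of_coverage(stacked, 1000), breadth_of_coverage(spread, 1000))
```

In the toy example, one hundred reads piled onto the same 100 bp give a breadth of only 0.1, while three reads spread along the genome give 0.25. Read count alone can therefore overstate support for a call when reads concentrate in a short, possibly conserved or low-complexity region, which is why breadth is often checked before reporting a clinically relevant virus.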

There are several limitations and caveats. Read-based profilers cannot recover novel viruses absent from the reference. Databases evolve rapidly, so our conclusions reflect specific versions and may shift as catalogs expand. Host depletion and library preparation differ across sites, and residual host reads can suppress viral detection and bias tool comparisons (26).

CONCLUSION

Human virus detection and overall virome discovery are related but different problems. The former prioritizes specificity, interpretability, and confirmability, while the latter prioritizes breadth, sensitivity, and novelty capture. Our benchmark shows that platform choice materially alters both the amount of viral signal and the story one tells about virome composition. Choosing a pipeline aligned to the biological question is therefore essential.

AUTHOR CONTRIBUTIONS

Na Rae Choi: Writing - original draft, Methodology; Jung Hwa Park: Visualization; Tae Sung Kim: Supervision; Hee Sam Na: Conceptualization, Software, Writing - review & editing.

FUNDING

This work was supported by a 2-Year Research Grant of Pusan National University.

ETHICS STATEMENT

Not applicable.

CONFLICT OF INTEREST

The authors have no financial conflicts of interest.

References

1. Lathakumari RH, Vajravelu LK, Gopinathan A, Vimala PB, Panneerselvam V, Ravi SSS, et al. The gut virome and human health: From diversity to personalized medicine. Eng Microbiol. 2025;5(1):100191. doi:10.1016/j.engmic.2025.100191; PMID 40538711; PMCID PMC12173812

2. Khokhar RK, Nashwan AJ. Gut virome and its emerging role in inflammatory bowel disease. World J Methodol. 2025;15(3):100534. doi:10.5662/wjm.v15.i3.100534; PMID 40881220; PMCID PMC11948198

3. Virgin HW. The virome in mammalian physiology and disease. Cell. 2014;157(1):142-150. doi:10.1016/j.cell.2014.02.032; PMID 24679532; PMCID PMC3977141

4. Norman JM, Handley SA, Baldridge MT, Droit L, Liu CY, Keller BC, et al. Disease-specific alterations in the enteric virome in inflammatory bowel disease. Cell. 2015;160(3):447-460. doi:10.1016/j.cell.2015.01.002; PMID 25619688; PMCID PMC4312520

5. Sulek K. [Nobel prizes for John F. Enders, Frederick Ch, Robbins and Thomas H. Weller in 1954 for discovery of the possibility of growing poliomyelitis virus on various tissue media]. Wiad Lek. 1968;21(24):2301-2303.

6. Bibby K. Metagenomic identification of viral pathogens. Trends Biotechnol. 2013;31(5):275-279. doi:10.1016/j.tibtech.2013.01.016

7. Hall RJ, Draper JL, Nielsen FG, Dutilh BE. Beyond research: a primer for considerations on using viral metagenomics in the field and clinic. Front Microbiol. 2015;6:224. doi:10.3389/fmicb.2015.00224; PMID 25859244; PMCID PMC4373370

8. Roux S, Enault F, Hurwitz BL, Sullivan MB. VirSorter: mining viral signal from microbial genomic data. PeerJ. 2015;3:e985. doi:10.7717/peerj.985; PMID 26038737; PMCID PMC4451026

9. Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 2017;5(1):69. doi:10.1186/s40168-017-0283-5; PMID 28683828; PMCID PMC5501583

10. Roux S, Tournayre J, Mahul A, Debroas D, Enault F. Metavir 2: new tools for viral metagenome comparison and assembled virome analysis. BMC Bioinformatics. 2014;15:76. doi:10.1186/1471-2105-15-76; PMID 24646187; PMCID PMC4002922

11. Yang C, Chowdhury D, Zhang Z, Cheung WK, Lu A, Bian Z, et al. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J. 2021;19:6301-6314. doi:10.1016/j.csbj.2021.11.028; PMID 34900140; PMCID PMC8640167

12. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20(1):257. doi:10.1186/s13059-019-1891-0; PMID 31779668; PMCID PMC6883579

13. Rampelli S, Soverini M, Turroni S, Quercia S, Biagi E, Brigidi P, et al. ViromeScan: a new tool for metagenomic viral community profiling. BMC Genomics. 2016;17:165. doi:10.1186/s12864-016-2446-3; PMID 26932765; PMCID PMC4774116

14. Tithi SS, Aylward FO, Jensen RV, Zhang L. FastViromeExplorer-Novel: Recovering Draft Genomes of Novel Viruses and Phages in Metagenomic Data. J Comput Biol. 2023;30(4):391-408. doi:10.1089/cmb.2022.0397

15. Peterson J, Garges S, Giovanni M, McInnes P, Wang L, Schloss JA, et al. The NIH Human Microbiome Project. Genome Res. 2009;19(12):2317-2323. doi:10.1101/gr.096651.109; PMID 19819907; PMCID PMC2792171

16. Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, et al. Metagenomic biomarker discovery and explanation. Genome Biol. 2011;12(6):R60. doi:10.1186/gb-2011-12-6-r60; PMID 21702898; PMCID PMC3218848

17. Ghurye JS, Cepeda-Espinoza V, Pop M. Metagenomic Assembly: Overview, Challenges and Applications. Yale J Biol Med. 2016;89(3):353-362.

18. Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674-1676. doi:10.1093/bioinformatics/btv033

19. Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, et al. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods. 2017;14(11):1063-1071. doi:10.1038/nmeth.4458; PMID 28967888; PMCID PMC5903868

20. Liu Y, Demina TA, Roux S, Aiewsakun P, Kazlauskas D, Simmonds P, et al. Diversity, taxonomy, and evolution of archaeal viruses of the class Caudoviricetes. PLoS Biol. 2021;19(11):e3001442. doi:10.1371/journal.pbio.3001442; PMID 34752450; PMCID PMC8651126

21. Aylward FO, Moniruzzaman M, Ha AD, Koonin EV. A phylogenomic framework for charting the diversity and evolution of giant viruses. PLoS Biol. 2021;19(10):e3001430. doi:10.1371/journal.pbio.3001430; PMID 34705818; PMCID PMC8575486

22. Colson P, De Lamballerie X, Yutin N, Asgari S, Bigot Y, Bideshi DK, et al. “Megavirales”, a proposed new order for eukaryotic nucleocytoplasmic large DNA viruses. Arch Virol. 2013;158(12):2517-2521. doi:10.1007/s00705-013-1768-6; PMID 23812617; PMCID PMC4066373

23. Dotto-Maurel A, Arzul I, Morga B, Chevignon G. Herpesviruses: overview of systematics, genomic complexity and life cycle. Virol J. 2025;22(1):155. doi:10.1186/s12985-025-02779-7; PMID 40399963; PMCID PMC12096621

24. Cossart YE. The rise and fall of infectious diseases: Australian perspectives, 1914-2014. Med J Aust. 2014;201(1 Suppl):S11-4. doi:10.5694/mja14.00112; PMID 25047768; PMCID PMC7168456

25. Van Doorslaer K, Chen Z, Bernard HU, Chan PKS, DeSalle R, Dillner J, et al. ICTV Virus Taxonomy Profile: Papillomaviridae. J Gen Virol. 2018;99(8):989-990. doi:10.1099/jgv.0.001105; PMID 29927370; PMCID PMC6171710

26. Kim M, Parrish RC 2nd, Tisza MJ, Shah VS, Tran T, Ross M, et al. Host DNA depletion on frozen human respiratory samples enables successful metagenomic sequencing for microbiome studies. Commun Biol. 2024;7(1):1590. doi:10.1038/s42003-024-07290-3; PMID 39609616; PMCID PMC11604929

JOURNAL OF BACTERIOLOGY AND VIROLOGY ISSN:1598-2467(Print) 2093-0429(Online)
