INTRODUCTION
Viruses are the most abundant and diverse biological entities in aquatic environments, with an estimated 10⁷ viral particles per milliliter of ocean water; these viruses are predominantly phages that infect bacteria and archaea [1,2]. In recent decades, marine viruses have been shown to be critical players in marine biogeochemical cycles: they affect marine biomass and substantially influence both marine communities and the environment [3]. Because marine viruses can infect hosts as diverse as bacteria and whales, they pose substantial risks of disease and mortality [4]. Notable viral families, including Nodaviridae, Birnaviridae, and Rhabdoviridae, are recognized for their ability to infect marine organisms. Certain viruses within these families have triggered epidemic outbreaks, resulting in extensive financial losses to the aquaculture industry [5,6].
Virology research has been transformed by advances in high-throughput sequencing and bioinformatics, which have enabled the discovery of previously unknown viruses that lack universally conserved genes and are resistant to standard culturing techniques [2,7]. Despite these advances, systematic comparative studies of the viral spectra of marine organisms in the South China Sea remain scarce. This study addresses part of this gap through sampling, high-throughput sequencing, and evolutionary analyses of marine organisms in the South China Sea. Viromes encompassing both background and microorganism-associated viruses were obtained from three phyla (Chordata, Arthropoda, and Mollusca). Phylogenetic analysis facilitated the classification of the identified viruses, with particular emphasis on those affecting the health and aquaculture of Mollusca. Additionally, a cross-sectional comparison of viral family diversity among the three phyla revealed the complexity of the potential viral community in the South China Sea. These findings provide insights into the viral spectrum of marine organisms in the region and have implications regarding potential epidemics within the local aquaculture industry.
MATERIALS AND METHODS
A total of 38 samples of aquatic organisms belonging to Chordata, Arthropoda, and Mollusca were collected from the coast of the South China Sea in Hainan Province, China, in November and December of 2020 (S1 Fig; sampling sites marked with red triangles). Samples were collected with sterile gloves, stored at −80°C in separate sterile bags by species, and transported to our laboratory for further analysis. The gills and viscera of the organisms were dissected and homogenized in phosphate-buffered saline in a biosafety cabinet for RNA extraction. Total RNA was extracted with a Takara MiniBEST Viral RNA extraction kit (cat. 9766) and sequenced on the Illumina HiSeq 3000 platform. The resulting high-throughput sequencing data have been deposited in the China National GeneBank database under open access (CNGBdb, project ID: CNP0005008). Low-quality reads, including those with lengths <50 bp or low complexity, were excluded. Adaptors were removed with Trimmomatic V0.32 [8], and assembly was performed with MEGAHIT V1.2.9 [9]. To identify viral sequences among the assembled contigs, we compared the contigs against the non-redundant nucleotide (NT) and non-redundant protein (NR) databases with BLASTN and DIAMOND BLASTX, respectively, with an E-value cutoff of 1e-5. The open reading frames of viral proteins were predicted with NCBI’s ORF Finder. Sequences identified as viral were further validated manually, and potential contaminant sequences were filtered out. Visualization was performed in R 3.4.0.
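The E-value filtering step described above can be illustrated with a minimal Python sketch. It assumes that the BLASTN or DIAMOND BLASTX results were exported in the standard 12-column tabular format (E-value in column 11, bit score in column 12) and that the best hit per contig is chosen by bit score; neither detail is stated in the text. The contig and subject names in the example are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the Methods


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Assumes the standard 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # hit does not pass the E-value cutoff
        # Assumption: keep the hit with the highest bit score for each query
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits, for illustration only
example = "\n".join([
    "contig_1\tvirus_A\t63.1\t300\t110\t2\t1\t900\t5\t304\t1e-40\t150.0",
    "contig_1\tvirus_B\t45.0\t120\t66\t1\t10\t370\t2\t121\t1e-3\t40.0",
    "contig_2\tvirus_C\t29.1\t250\t177\t4\t1\t750\t3\t252\t2e-8\t60.5",
    "contig_3\tvirus_D\t30.0\t80\t56\t0\t1\t240\t1\t80\t0.5\t25.0",
])
hits = best_hits(example)
```

In this example, only contig_1 and contig_2 retain a hit; contig_3 is dropped because its only hit exceeds the cutoff. Manual validation and contaminant removal, as described above, would still follow.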
The constructed evolutionary tree was subjected to p-value testing with a threshold of p < 0.8 in MEGA 11, based on homologous sequences [10]. Saturation analysis was performed with DAMBE v7 to ensure that the aligned sequences were free of substitution saturation, a prerequisite for reliable tree construction [11]. Multiple sequence alignment was performed with MAFFT v7.471 [12], and the evolutionary model for tree construction was selected with ModelFinder [13]. The selected model was then applied in IQ-TREE v2.0 to construct the evolutionary tree [14]. To evaluate the robustness of the tree topology, ultrafast bootstrap (UFBoot) analysis was conducted with 5000 replicates [15]. All analyses were performed in PhyloSuite v1.2.2 [16].
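For readers who prefer the command line, the alignment and tree-building steps can be approximated outside PhyloSuite. The Python sketch below only assembles the commands. It assumes that the `mafft` and `iqtree2` executables are installed, that MAFFT's `--auto` strategy is acceptable, and that ModelFinder is invoked through IQ-TREE's `-m MFP` option; the file names are hypothetical, and the actual PhyloSuite settings used in this study are not specified in the text.

```python
import subprocess


def build_phylogeny_commands(sequences_fasta, alignment_fasta, bootstrap_replicates=5000):
    """Build command lines for alignment (MAFFT) and tree inference (IQ-TREE 2)."""
    # MAFFT writes the alignment to standard output
    mafft_cmd = ["mafft", "--auto", sequences_fasta]
    # -m MFP: ModelFinder model selection followed by tree inference
    # -B: number of ultrafast bootstrap (UFBoot) replicates
    iqtree_cmd = [
        "iqtree2",
        "-s", alignment_fasta,
        "-m", "MFP",
        "-B", str(bootstrap_replicates),
    ]
    return mafft_cmd, iqtree_cmd


def run_pipeline(sequences_fasta, alignment_fasta):
    """Run both steps; requires the external tools to be on the PATH."""
    mafft_cmd, iqtree_cmd = build_phylogeny_commands(sequences_fasta, alignment_fasta)
    with open(alignment_fasta, "w") as handle:
        subprocess.run(mafft_cmd, stdout=handle, check=True)
    subprocess.run(iqtree_cmd, check=True)


# Hypothetical file names, for illustration only
mafft_cmd, iqtree_cmd = build_phylogeny_commands("terminase.fasta", "terminase.aln.fasta")
```

Calling `run_pipeline("terminase.fasta", "terminase.aln.fasta")` would execute both steps. The saturation test in DAMBE is not covered by this sketch.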
RESULTS
Among the 38 collected aquatic samples, 14 belonged to Chordata, six belonged to Arthropoda, and 18 belonged to Mollusca (S1 Table).
Fig 1A shows the viral spectrum, with the abundance of each viral family expressed in transcripts per million (TPM). Annotation identified 31 viral families, including unclassified Caudovirales and Picornavirales. Among these viral families, 13 were classified as double-stranded DNA (dsDNA) viruses, three were classified as single-stranded DNA (ssDNA) viruses, three were classified as double-stranded RNA (dsRNA) viruses, and nine were classified as single-stranded RNA (ssRNA) viruses. Gills from Chordata yielded annotations of five families (Myoviridae, Iridoviridae, Adintoviridae, Schitoviridae, and Siphoviridae); those from Arthropoda yielded 20 families (Aggregaviridae, Assiviridae, Autographiviridae, Casjensviridae, Caulimoviridae, Chaseviridae, Myoviridae, Podoviridae, Steigviridae, Zobellviridae, Circoviridae, Microviridae, Fiersviridae, Herpesviridae, Nimaviridae, Schitoviridae, Siphoviridae, Parvoviridae, Reoviridae, and Picornaviridae); and those from Mollusca yielded a single family (Herpesviridae). The viscera samples contained Orthomyxoviridae and Retroviridae in Chordata; Nimaviridae, Parvoviridae, Reoviridae, and Sedoreoviridae in Arthropoda; and a diverse group of 11 families in Mollusca (Astroviridae, Chuviridae, Marnaviridae, Tombusviridae, Picobirnaviridae, Adintoviridae, Herpesviridae, Schitoviridae, Siphoviridae, Dicistroviridae, and Picornaviridae).
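As an illustration of the abundance measure, the following Python sketch computes TPM per contig and sums it by viral family. It uses the standard TPM definition (length-normalized read counts scaled to sum to one million); the exact procedure and normalization set used in this study are not described, so this is an assumption, and the contig names, counts, and lengths are hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million for each contig, from read counts and contig lengths."""
    # Reads per base for each contig
    rates = {contig: read_counts[contig] / lengths_bp[contig] for contig in read_counts}
    total = sum(rates.values())
    # Scale so that the values sum to one million
    return {contig: rate / total * 1e6 for contig, rate in rates.items()}


def family_tpm(contig_tpm, contig_family):
    """Sum contig-level TPM values for each viral family."""
    totals = {}
    for contig, value in contig_tpm.items():
        family = contig_family[contig]
        totals[family] = totals.get(family, 0.0) + value
    return totals


# Hypothetical values, for illustration only
counts = {"c1": 100, "c2": 300, "c3": 100}
lengths = {"c1": 1000, "c2": 3000, "c3": 500}
families = {"c1": "Herpesviridae", "c2": "Herpesviridae", "c3": "Dicistroviridae"}

contig_values = tpm(counts, lengths)
family_values = family_tpm(contig_values, families)
```

In a real virome, the normalization would typically include every assembled transcript in the sample, not only the viral contigs, so the viral families would account for a fraction of the one million total.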

Fig 1. Viral Composition and Diversity Patterns in Chordata, Arthropoda, and Mollusca.
A. Distribution of viral families across the three phyla, quantified as transcripts per million (TPM). Bar colors represent different viral families; blue indicates gills, and green indicates viscera. B. Viral diversity, measured by the Shannon diversity index (SHDI). The left panel shows the SHDI for all samples, whereas the right panel differentiates between gill (g) and viscera (v) samples. An asterisk (*) indicates statistical significance, while ** and **** represent p < 0.01 and p < 0.0001, respectively. C. Viral identity distribution, where each point represents a sequence read associated with a specific viral family. The identity value on the Y-axis represents the similarity between the contigs of the detected viral families and those in the database.
In Fig 1B, the left panel shows the Shannon diversity index (SHDI) across samples from the three phyla, revealing substantial differences in viral diversity among Mollusca, Chordata, and Arthropoda, with a distinct difference between Mollusca and Chordata. The right panel compares viral compositional diversity in the gills of Chordata, the viscera of Arthropoda, and the viscera of Mollusca, and the observed differences were consistent with those in the left panel. Sequences with lower identity to known viruses in existing databases, which may represent relatively novel viruses, were detected in most viral families (Fig 1C).
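For reference, the Shannon diversity index for a sample can be computed from family-level abundances as H = −Σ p_i ln p_i, where p_i is the proportion of the total abundance assigned to family i. The Python sketch below assumes the natural logarithm and TPM values as the abundance input; the text does not specify either choice, and the example values are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity index H = -sum(p_i * ln(p_i)) over families with abundance > 0."""
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    if total == 0:
        return 0.0  # no viral families detected
    return -sum((a / total) * math.log(a / total) for a in positive)


# Hypothetical family-level abundances, for illustration only
even_sample = [250.0, 250.0, 250.0, 250.0]  # four equally abundant families
skewed_sample = [970.0, 10.0, 10.0, 10.0]   # one dominant family
single_family = [1000.0]

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
h_single = shannon_index(single_family)
```

The index is highest when families are evenly represented and falls toward zero when a single family dominates, so it reflects both richness and evenness.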
Herpesviridae were detected in eight samples, predominantly from Mollusca. These detections included a terminase sequence (Mv_2126) from the viscera of Cyclina sinensis that was most closely related to Soft-shell clam Herpesviridae 1, with an identity of 63.14%. Fig 2A shows the clustering of this terminase sequence with unclassified Herpesviridae from various hosts. Another viscera sample of Mollusca (Mv_0245) contained a sequence with 29.08% similarity to the non-structural protein sequences of Dicistroviridae, and a sequence with 27.64% similarity to the structural protein sequences of Dicistroviridae. Both sequences clustered within Aparavirus (Figs 2B and 2C). These results suggest potential links between the sequences identified in Mollusca and the Dicistroviridae, and warrant further investigation of their functional implications.
DISCUSSION
With the rapid growth of marine and freshwater aquaculture, the occurrence of zoonotic diseases in aquatic animals has emerged as a major threat to both the sustainable development of the aquaculture industry and human health. Viral infections contribute to annual mass mortality events in aquaculture and have substantial economic and societal repercussions [17]. These viral outbreaks can disrupt marine food webs and decrease biodiversity, and consequently might affect the broader ecosystem. Thus, a comprehensive investigation of the viruses carried by offshore marine organisms is essential to understand viral composition, diversity, and epidemic potential.
Herein, we examined the viral composition and diversity, at the family level, of the sampled organisms from the three phyla. Certain prominent viral families, including unclassified Caudovirales, Adintoviridae, and Herpesviridae, exhibited both high abundance and a wide distribution range (Fig 1A), in agreement with previous research on viruses affecting marine organisms [18,19]. Compared with the viral families frequently annotated in other aquatic environments [1,2], our study revealed a higher diversity of viruses. The diverse viruses annotated here might indicate risks to aquatic biota that exceed those posed by other environmental factors, potentially affecting both biodiversity and ecosystem stability. Notably, Adintoviridae were identified in the gills of Chordata but not Arthropoda, a finding that differs from previous reports of their presence in Arthropoda [19]. This discrepancy might have originated from variations in sample collection methods and environmental conditions, and underscores the need for further research. Some outliers with notably high or low SHDI values were also observed (Fig 1B); these might be attributable to specific ecological factors or sample conditions affecting viral diversity and species richness.
Comparisons with existing databases revealed relatively novel viruses within the Adintoviridae in Chordata and Mollusca, characterized by lower sequence identity to known viruses, which indicates the presence of potentially unique viral species. The identification of a terminase gene associated with Herpesviridae in the viscera of Mollusca provided insight into the genetic relatedness of the viral sequences found in the Mollusca sample. Sequences encoding the structural and non-structural proteins of Dicistroviridae were successfully assembled and placed in the phylogenetic trees (Figs 2B and 2C). Because this family includes Taura syndrome virus, a species known to infect shrimp and cause mass epidemics in shrimp farms [20], this result suggests a potential risk of epidemic outbreaks and impacts on the aquaculture economy. Our findings indicate that evolutionary analysis can contribute to an understanding of the genetic diversity within these viral families, and further emphasize the importance of mollusks in marine virus research.
The diversity indices (SHDI; Fig 1B) revealed significant differences in viral diversity between Mollusca and both Chordata and Arthropoda. This finding suggests that viral diversity varies across aquatic species and is potentially influenced by host physiology, habitat, and interactions. The overall viral abundance and diversity associated with Mollusca were significantly higher than those associated with the other two phyla, possibly because mollusks’ filter-feeding behavior exposes them to a higher influx of viruses from the marine environment. These findings highlight the importance of Mollusca, and particularly their viscera, in studying the enrichment of marine viruses; contribute to a systematic understanding of the baseline viral spectrum within the bivalve ecological niche; and expand knowledge of viruses across marine habitats. In addition, our findings suggest a potentially higher risk associated with the consumption of Mollusca than with that of Chordata or Arthropoda.
Advances in sequencing technologies have expanded the discovery of marine viruses. For example, high-throughput sequencing has enabled the analysis of viruses in marine organisms [21]. However, it is also crucial to consider the viruses associated with the microorganisms in an organism’s microenvironment, in addition to organism-specific viruses [18,22]. Accordingly, our study provides a detailed viral composition of three phyla in the South China Sea, including both the background viruses infecting the organisms and the viruses in the organisms’ microenvironment. This approach facilitates the investigation of potential viral causes of future epidemic outbreaks, improves understanding of the realistic viral spectrum carried by marine organisms, and expands the map of marine viruses in the South China Sea.
However, this study has both sampling and technical limitations. Given the high diversity and large populations of marine organisms, a more comprehensive dataset is needed to strengthen the robustness of our conclusions. Challenges in transporting biological samples over long distances, and potential RNA degradation during sampling and subsequent processing, might have resulted in incomplete data. Although we collected marine organisms representing three distinct phyla and analyzed viral composition with consistent experimental techniques and methods within a relatively short timeframe, the small sample size remains a limitation. Future studies with larger datasets are therefore needed to enable more robust comparisons and a comprehensive understanding of viral dynamics in the South China Sea. Such efforts would help advance knowledge of marine viral ecology and inform strategies for the sustainable management of aquatic ecosystems.
CONCLUSION
In summary, this study explored viral composition and diversity among offshore marine organisms in the South China Sea, and revealed higher diversity than reported in previous studies. Key viral groups identified included unclassified Caudovirales, Adintoviridae, and Herpesviridae. We observed marked variation in viral diversity among phyla, as well as potentially novel viruses within the Herpesviridae and Dicistroviridae from Mollusca, indicating the presence of unique viral species in the marine ecosystem. These findings have implications for the aquaculture industry and highlight the potential zoonotic risks associated with marine organisms. The identification and characterization of previously unknown viruses within key marine organisms underscores potential risks to human health and other species, emphasizes the importance of thorough surveillance, and indicates the value of understanding marine viral communities for ecosystem management and proactive control of epidemic outbreaks.