Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey


Abstract

Background

Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA.

Results

From randomly sheared soybean DNA, we generated 78 million base pairs of sequence that passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. We used the survey sequence to determine copy number across regions of large genomic clones or contigs and to discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The overall composition of the survey data agrees well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis), demonstrating that pyrosequencing has low bias against repeat sequences.
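The idea behind using a low-coverage survey to estimate copy number can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes survey reads have already been aligned to a clone or contig with some aligner and summarized as a per-base depth list. The expected depth at a single-copy locus is survey bases divided by genome size, and a window's copy number is approximately its observed mean depth divided by that expected depth. The function names and the 1,000 bp default window are hypothetical. The soybean genome size of about 1.1 Gbp is a commonly cited estimate, used here only as an assumed input; with 78 Mbp of survey sequence it gives roughly 0.07-fold coverage.

```python
def survey_coverage(survey_bases: float, genome_size: float) -> float:
    """Expected read depth over a single-copy region (fold coverage of the survey)."""
    return survey_bases / genome_size


def windowed_copy_number(depth, coverage, window=1000):
    """Estimate copy number for consecutive windows along a clone or contig.

    depth:    per-base count of aligned survey-read bases at each clone position
              (assumed to be precomputed by some alignment step)
    coverage: expected depth at a single-copy locus, e.g. survey_coverage(...)
    window:   window length in bases (hypothetical default)
    Returns a list of (window_start, estimated_copy_number) tuples.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    estimates = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]  # last window may be shorter
        mean_depth = sum(chunk) / len(chunk)
        # Observed depth relative to single-copy expectation approximates copy number.
        estimates.append((start, mean_depth / coverage))
    return estimates


if __name__ == "__main__":
    # Assumed genome size of ~1.1 Gbp; 78 Mbp of survey sequence.
    c = survey_coverage(78e6, 1.1e9)
    print(f"single-copy expected depth: {c:.4f}")
    # Toy depth profile: a single-copy-like window followed by a repeat-like window.
    toy_depth = [0] * 930 + [1] * 70 + [7] * 1000
    for start, cn in windowed_copy_number(toy_depth, 0.07):
        print(start, round(cn, 1))
```

Because single-copy windows at such low coverage have mean depths well below one read, window size trades resolution for noise: larger windows give steadier estimates for low-copy regions, while high-copy repeats stand out even in small windows.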

Conclusion

This approach can aid conventional or shotgun genome assembly by allowing rapid assessment of copy number in any clone or clone-end sequence. We also show that partial sequencing can provide access to partial protein-coding sequences.

Author and article information

Journal

Journal ID (nlm-ta): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2164

Publication date (collection): 2007

Publication date (Electronic): 24 May 2007

Volume: 8

Page: 132

Affiliations

[1] Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA

Article

Publisher ID: 1471-2164-8-132

DOI: 10.1186/1471-2164-8-132

PMC ID: 1894642

PubMed ID: 17524145

SO-VID: 9d2eaae5-4299-4aa6-b728-c98db553749c

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 22 December 2006

Date accepted: 24 May 2007

Cited by 39
