Sequence Assembly with CAFTOOLS
- Simon Dear¹,
- Richard Durbin¹,
- LaDeana Hillier²,
- Gabor Marth²,
- Jean Thierry-Mieg³ and
- Richard Mott¹,⁴,⁵
- ¹Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK; ²Genome Sequencing Center, Washington University, St. Louis, Missouri 63108 USA; ³CRBM du Centre National de la Recherche Scientifique (CNRS), Route de Mende, Montpellier, France; ⁴SmithKline Beecham Pharmaceuticals, New Frontiers Science Park (North), Harlow, Essex, CM19 5AW, UK
Abstract
Large-scale genomic sequencing requires a software infrastructure to support and integrate applications that are not directly compatible. We describe a suite of software tools built around the Common Assembly Format (CAF), a comprehensive representation of a sequence assembly as a text file. These tools form the backbone of sequencing informatics at the Sanger Centre and the Genome Sequencing Center. CAF is intentionally flexible, and our Perl and C libraries, which parse and manipulate it, provide powerful tools for creating new applications as well as wrappers to incorporate other software. The tools are freely available by anonymous FTP from ftp://ftp.sanger.ac.uk/pub/badger/.
- Received December 1, 1997.
- Accepted January 29, 1998.
- Cold Spring Harbor Laboratory Press