Sequence Assembly with CAFTOOLS

Abstract

Large-scale genomic sequencing requires a software infrastructure to support and integrate applications that are not directly compatible with one another. We describe a suite of software tools built around the Common Assembly Format (CAF), a comprehensive representation of a sequence assembly as a text file. These tools form the backbone of sequencing informatics at the Sanger Centre and the Genome Sequencing Center. CAF is intentionally flexible, and our Perl and C libraries, which parse and manipulate it, provide powerful tools for creating new applications as well as wrappers for incorporating other software. The tools are freely available by anonymous FTP from ftp://ftp.sanger.ac.uk/pub/badger/.

  • Received December 1, 1997.
  • Accepted January 29, 1998.