Physical mapping data were combined with public draft and finished sequences to derive subtelomeric sequence assemblies for each of the 41 genetically distinct human telomere regions. Sequence gaps that remain on the reference telomeres are generally small, well-defined, and, for the most part, restricted to regions directly adjacent to the terminal (TTAGGG)n tract. Of the 20.66 Mb of subtelomeric DNA analyzed, 3.01 Mb are subtelomeric repeat sequences (Srpt), and an additional 2.11 Mb are segmental duplications. The subtelomeric sequence assemblies are enriched >25-fold in short, internal (TTAGGG)n-like sequences relative to the rest of the genome; a total of 114 (TTAGGG)n-like islands were found: 55 within Srpt regions, 35 within one-copy regions, 11 at one-copy/Srpt or Srpt/segmental duplication boundaries, and 13 at the telomeric ends of assemblies. Transcripts were annotated in each assembly, noting their mapping coordinates relative to their respective telomere and whether they originate in duplicated or single-copy DNA. A total of 697 transcripts were found in 15.53 Mb of one-copy DNA, 76 transcripts in 2.11 Mb of segmentally duplicated DNA, and 168 transcripts in 3.01 Mb of Srpt sequence. This overall transcript density is similar (within ∼10%) to that found genome-wide. Zinc finger-containing genes and olfactory receptor genes are duplicated within and between multiple telomere regions.
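The per-category transcript densities implied by these counts can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names `CATEGORIES`, `ISLANDS`, and `transcripts_per_mb` are hypothetical, and the only inputs are the figures quoted in the abstract. The three category sizes sum to 20.65 Mb, not the stated 20.66 Mb; this is assumed here to reflect rounding of the individual values.

```python
# Figures copied from the abstract: (transcript count, size in Mb) per DNA category.
# All names below are hypothetical and used for illustration only.
CATEGORIES = {
    "one-copy": (697, 15.53),
    "segmental duplication": (76, 2.11),
    "Srpt": (168, 3.01),
}

# (TTAGGG)n-like island counts by location, as listed in the abstract.
ISLANDS = {
    "within Srpt": 55,
    "within one-copy": 35,
    "at boundaries": 11,
    "at telomeric ends": 13,
}


def transcripts_per_mb(count, size_mb):
    """Return transcript density in transcripts per megabase."""
    return count / size_mb


# Density for each category, then the pooled density over all three.
densities = {
    name: transcripts_per_mb(count, size_mb)
    for name, (count, size_mb) in CATEGORIES.items()
}
total_transcripts = sum(count for count, _ in CATEGORIES.values())
total_mb = sum(size_mb for _, size_mb in CATEGORIES.values())
overall_density = transcripts_per_mb(total_transcripts, total_mb)
total_islands = sum(ISLANDS.values())

if __name__ == "__main__":
    for name, value in densities.items():
        print(f"{name}: {value:.1f} transcripts/Mb")
    print(f"overall: {overall_density:.1f} transcripts/Mb "
          f"({total_transcripts} transcripts in {total_mb:.2f} Mb)")
    print(f"(TTAGGG)n-like islands: {total_islands}")
```

Run as a script, this prints the density for each category alongside the pooled value, and confirms that the four island locations account for all 114 islands.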