Logarithmically Larger Deletion Codes of All Distances

Noga Alon, Gabriela Bourla, Ben Graham, Xiaoyu He, Noah Kravitz

Research output: Contribution to journal › Article › peer-review

Abstract

The deletion distance between two binary words $u, v \in \{0,1\}^n$ is the smallest $k$ such that $u$ and $v$ share a common subsequence of length $n-k$. A set $C$ of binary words of length $n$ is called a $k$-deletion code if every pair of distinct words in $C$ has deletion distance greater than $k$. In 1965, Levenshtein initiated the study of deletion codes by showing that, for $k \ge 1$ fixed and $n$ going to infinity, a $k$-deletion code $C \subseteq \{0,1\}^n$ of maximum size satisfies $\Omega_k(2^n/n^{2k}) \le |C| \le O_k(2^n/n^k)$. We make the first asymptotic improvement to these bounds by showing that there exist $k$-deletion codes with size at least $\Omega_k(2^n \log n/n^{2k})$. Our proof is inspired by Jiang and Vardy’s improvement to the classical Gilbert–Varshamov bounds. We also establish several related results on the number of longest common subsequences and shortest common supersequences of a pair of words with given length and deletion distance.
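The two definitions in the abstract can be made concrete with a small sketch. It computes the deletion distance as n minus the length of a longest common subsequence (using the textbook dynamic program, not any method from the paper) and checks the k-deletion code condition by brute force over pairs. The function names `lcs_length`, `deletion_distance`, and `is_k_deletion_code` are illustrative assumptions, and words are assumed to be distinct equal-length strings over the characters 0 and 1.

```python
from itertools import combinations


def lcs_length(u: str, v: str) -> int:
    """Length of a longest common subsequence of u and v (standard DP)."""
    prev = [0] * (len(v) + 1)
    for a in u:
        curr = [0]
        for j, b in enumerate(v, start=1):
            if a == b:
                # Extend the best common subsequence of the two prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def deletion_distance(u: str, v: str) -> int:
    """Smallest k such that u and v share a common subsequence of length n - k."""
    if len(u) != len(v):
        raise ValueError("words must have the same length n")
    return len(u) - lcs_length(u, v)


def is_k_deletion_code(words, k: int) -> bool:
    """True if every pair of distinct words has deletion distance greater than k.

    Assumption: `words` contains no repeated entries.
    """
    return all(deletion_distance(u, v) > k for u, v in combinations(words, 2))


print(deletion_distance("0101", "1010"))       # words sharing the subsequence 101
print(is_k_deletion_code(["0000", "1111"], 3))  # no common symbol at all
```

The pairwise check costs O(|C|^2 n^2) time, so it is only useful for verifying small examples, not for constructing codes of the sizes discussed above.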

Original language: English (US)
Pages (from-to): 125-130
Number of pages: 6
Journal: IEEE Transactions on Information Theory
Volume: 70
Issue number: 1
DOIs
State: Published - Jan 1 2024

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Library and Information Sciences
  • Computer Science Applications

Keywords

  • Deletion codes
  • longest common subsequence
  • probabilistic combinatorics
