Large-scale detection of repetitions

W. F. Smyth

doi:10.1098/rsta.2013.0138

Back

Journal article

Peer reviewed

Large-scale detection of repetitions

W. F. Smyth

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol.372(2016)

2014

DOI: https://doi.org/10.1098/rsta.2013.0138

Files and links (1)

url

Free to Read *No subscription requiredView

Abstract

Combinatorics on words began more than a century ago with a demonstration that an infinitely long string with no repetitions could be constructed on an alphabet of only three letters. Computing all the repetitions (such as ⋯TTT⋯ or ⋯CGACGA⋯ ) in a given string x of length n is one of the oldest and most important problems of computational stringology, requiring Embedded Image time in the worst case. About a dozen years ago, it was discovered that repetitions can be computed as a by-product of the Θ(n)-time computation of all the maximal periodicities or runs in x. However, even though the computation is linear, it is also brute force: global data structures, such as the suffix array, the longest common prefix array and the Lempel–Ziv factorization, need to be computed in a preprocessing phase. Furthermore, all of this effort is required despite the fact that the expected number of runs in a string is generally a small fraction of the string length. In this paper, I explore the possibility that repetitions (perhaps also other regularities in strings) can be computed in a manner commensurate with the size of the output.

Details

Title: Large-scale detection of repetitions
Authors/Creators: W. F. Smyth (Author/Creator) - McMaster University
Publication Details: Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol.372(2016)
Publisher: Royal Society Publishing
Identifiers: 991005540382307891
Murdoch Affiliation: Murdoch University
Language: English
Resource Type: Journal article

Metrics

24 Record Views

1 Times Cited - Web of Science

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Citation topics: 4 Electrical Engineering, Electronics & Computer Science; 4.182 Data Structures, Algorithms & Complexity; 4.182.1103 Efficient Algorithms
Web Of Science research areas: Computer Science, Theory & Methods
ESI research areas: Physics