Talk:FM-index

This article is rated Stub-class on Wikipedia's content assessment scale.
It is of interest to the following WikiProjects:

Computer science

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has not yet received a rating on the project's importance scale.

This article has been automatically rated by a bot or other tool because one or more other projects use this class. Please ensure the assessment is correct before removing the |auto= parameter.

Articles for creation

This article was reviewed by member(s) of WikiProject Articles for creation. The project works to allow users to contribute quality articles and media files to the encyclopedia and track their progress as they are developed. To participate, please visit the project page for more information.
This article was accepted on 31 March 2009 by reviewer Raven1977 (talk · contribs).

What is an FM Index?

I'm not convinced that this article addresses the "what" question. The last sentence of the section "FM-index data structure" has:

The FM-index itself is a compression of the string L together with C and Occ in some form, as well as information that maps a selection of indices in L to positions in the original string T.

This really needs expansion. It seems strange to talk about the LF mapping in so much detail and then not describe how the index is structured. gringer (talk) 06:14, 16 June 2013 (UTC)

Agreed. Of particular note: how does the compression work? Per the article's description, L is the same size as the original text, so we need to be compressing L in order for the FM-index as a whole to be smaller than the original text, but there's no indication of how that compression is done, or of how the count and locate operations access data within the compressed data structure. ExplodingCabbage (talk) 12:54, 16 March 2022 (UTC)
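
To make the pieces named in the quoted sentence concrete, here is a minimal sketch that builds L and C for the article's example string and then run-length encodes L. The helper names (`bwt`, `build_c`, `run_length_encode`) are made up for this illustration, and run-length encoding is only a stand-in assumption to show why L tends to be compressible; it is not a claim about how any real FM-index stores L.

```python
from itertools import groupby

def bwt(text):
    """Return the last column L of the sorted rotation matrix (text must end in '$')."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)

def build_c(last_column):
    """C[c] = number of characters in the text that are lexically smaller than c."""
    c_table, total = {}, 0
    for ch in sorted(set(last_column)):
        c_table[ch] = total
        total += last_column.count(ch)
    return c_table

def run_length_encode(s):
    """Illustrative compression only (an assumption, not the article's method)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]

T = "abracadabra$"
L = bwt(T)
C = build_c(L)
runs = run_length_encode(L)

print(L)     # last column of the sorted rotations
print(C)     # per-character offsets into the first column
print(runs)  # equal characters cluster, so there are fewer runs than characters
```

Even on this tiny input the twelve characters of L collapse into eight runs, which hints at why the transformed string is a better compression target than T itself. How count and locate would read from a genuinely compressed L is not covered by this sketch.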

Poor description of "FM-index data structure"

Resolved

– Possibly there was an error here in 2014 when this was posted, but now the article is clear that T ends with a $ and so L[i] is NOT the last character of T. ExplodingCabbage (talk) 13:11, 16 March 2022 (UTC)

The article is confusing. For example it says: "For any row i of the matrix, the character in the last column L[i] precedes the character in the first column F[i] also in T." What does it mean? It is clearly not true: If we let i be the first row of the matrix, then L[i] is the last letter in the string T, so it must come after everything else in T. Also if i represents the first row, F[i] must be the letter in T which comes first in alphabetic ordering ($ in the example). So no other letter can come before it.

Perhaps after "the matrix" we should insert "M". Perhaps before "T" we should insert "the original string". 128.16.7.220 (talk) 15:47, 7 June 2014 (UTC) Bill

The particular sentence you quote is correct. You say that

"If we let i be the first row of the matrix, then L[i] is the last letter in the string T"

but this is not true. Note that T = "abracadabra$", that L[i] is 'a' and F[i] is '$'. ExplodingCabbage (talk) 13:06, 16 March 2022 (UTC)
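
For anyone who wants to check the claim row by row, the following sketch lists every sorted rotation of T = "abracadabra$" with its first and last characters. The variable names are chosen for this illustration only. "Precedes" is read cyclically here, so the character before T's first character is taken to be the trailing '$'; that reading is an assumption about what the article's sentence means.

```python
T = "abracadabra$"
n = len(T)

# 0-based start offsets of the rotations of T, in sorted order.
starts = sorted(range(n), key=lambda i: T[i:] + T[:i])

F = [T[s] for s in starts]      # first column of the matrix
L = [T[s - 1] for s in starts]  # last column; T[-1] wraps around to '$'

for row, s in enumerate(starts, start=1):
    print(row, T[s:] + T[:s], "F =", F[row - 1], "L =", L[row - 1])
```

Row 1 comes out as the rotation starting with '$', with F = '$' and L = 'a', which matches the values given in the reply above.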

Count list item 1.

Resolved

– The article does not currently suggest that "bra" starts with 'a'. ExplodingCabbage (talk) 13:13, 16 March 2022 (UTC)

Surely suffix "bra" ends with a. It starts with b. 128.16.7.220 (talk) 16:05, 7 June 2014 (UTC) Bill

Count example off by one error?

Could someone else check the example in the "Count" section? In most places in the article the indexes start at one. Assuming that, the start and end values given in the example appear to be out by one in most cases. 128.16.7.220 (talk) 16:58, 7 June 2014 (UTC) Bill

The start and end values all look correct to me. Can you give an example of one you think is incorrect and walk through why? ExplodingCabbage (talk) 14:23, 16 March 2022 (UTC)
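
One way to settle an off-by-one question is to run the backward search and print the range after each step. This is a minimal sketch under stated assumptions: rows are numbered from 1, ranges are inclusive, and Occ(c, k) is taken to be the number of occurrences of c in the first k characters of L. The function names and the naive O(k) Occ are illustrative only, and the output is not claimed to match the article's example text.

```python
def bwt(text):
    """Last column of the sorted rotation matrix (text must end in '$')."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)

def count(pattern, text):
    """Return (number of occurrences, list of (char, start, end) per step)."""
    L = bwt(text)
    # C[c] = number of characters in text lexically smaller than c.
    C = {c: sum(ch < c for ch in text) for c in set(text)}

    def occ(c, k):
        # Naive Occ: occurrences of c in L[1..k] (1-based, inclusive).
        return L[:k].count(c)

    start, end = 1, len(L)  # assumed convention: 1-based inclusive row range
    steps = []
    for c in reversed(pattern):
        if c not in C:
            return 0, steps
        start = C[c] + occ(c, start - 1) + 1
        end = C[c] + occ(c, end)
        steps.append((c, start, end))
        if start > end:
            return 0, steps
    return end - start + 1, steps

matches, steps = count("bra", "abracadabra$")
for c, start, end in steps:
    print(c, start, end)
print("occurrences:", matches)
```

Under these conventions the search for "bra" narrows the range to rows 2 to 6, then 11 to 12, then 7 to 8, giving two occurrences. If the article's numbers differ by one, the likely cause is a different indexing convention rather than a different algorithm.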

Locate section needs examples. Existing example needs checking

Existing applications of FM-index use it for looking up strings, rather than counting how many times the string occurs. Hence it is more important that the "Locate" section makes sense than the "Count" section. Would it therefore be possible to add more explanation and/or examples to the "Locate" section?

The existing text, "For instance locate(7) = 8", appears to be wrong. "locate(7)" appears to mean the 7th character in L, which is "c". "c" occurs only once in "abracadabra" and that is position 5 not position 7.

128.16.7.220 (talk) 17:29, 7 June 2014 (UTC) Bill

"locate(7)" appears to mean the 7th character in L, which is "c"

No, it's "a", not "c". The example looks correct to me. ExplodingCabbage (talk) 14:22, 16 March 2022 (UTC)
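
A worked locate may help here. The sketch below assumes rows are numbered from 1 and text positions from 0; that mixed convention is a guess that happens to reproduce locate(7) = 8, not something confirmed by the article. It keeps suffix-array samples only for text positions that are multiples of a hypothetical `SAMPLE_EVERY`, and walks the LF mapping until it reaches a sampled row.

```python
T = "abracadabra$"
n = len(T)

# 0-based start offsets of the suffixes in sorted order (a plain suffix array).
sa = sorted(range(n), key=lambda i: T[i:])
L = "".join(T[s - 1] for s in sa)  # last column; T[-1] wraps to '$'
C = {c: sum(ch < c for ch in T) for c in set(T)}

def occ(c, k):
    """Naive Occ: occurrences of c in the first k characters of L."""
    return L[:k].count(c)

def lf(row):
    """Map a 1-based row to the row whose suffix starts one character earlier in T."""
    c = L[row - 1]
    return C[c] + occ(c, row)

SAMPLE_EVERY = 4  # hypothetical sampling rate, chosen for this example
samples = {row: pos for row, pos in enumerate(sa, start=1)
           if pos % SAMPLE_EVERY == 0}

def locate(row):
    """Return the 0-based text position of the suffix in the given 1-based row."""
    steps = 0
    while row not in samples:
        row = lf(row)
        steps += 1
    return samples[row] + steps

print(L[7 - 1])   # the 7th character of L
print(locate(7))  # text position of row 7 under the assumed conventions
print(locate(1))  # needs several LF steps before hitting a sample
```

Row 7 is the rotation beginning "bra$", its last-column character is 'a', and its suffix starts at offset 8 when counting from zero (position 9 when counting from one).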

What is ε?

The "Locate" section uses the formulae $O(p + occ \log^{\epsilon} u)$ and $O\left(H_{k}(T)+{{\log \log u} \over {\log ^{\epsilon }u}}\right)$ without ever defining ε.

ExplodingCabbage (talk) 14:28, 16 March 2022 (UTC)

What is $H_{k}$?

The "Locate" section uses the formula $O\left(H_{k}(T)+{{\log \log u} \over {\log ^{\epsilon }u}}\right)$ without ever defining $H_{k}$, which as far as I can spot isn't defined anywhere else in the article either.

ExplodingCabbage (talk) 14:28, 16 March 2022 (UTC)
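
If $H_{k}$ is meant in the usual sense of k-th order empirical entropy (an assumption, since the article does not say), it can be computed as in the sketch below. The function names `h0` and `hk` are made up for this illustration. The context of each character is taken to be the k characters before it; some texts define it with the k characters after it instead, so treat that choice as an assumption too.

```python
from collections import Counter, defaultdict
from math import log2

def h0(symbols):
    """Zeroth-order empirical entropy, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return -sum((cnt / n) * log2(cnt / n) for cnt in Counter(symbols).values())

def hk(text, k):
    """k-th order empirical entropy: H0 of the followers of each length-k context,
    weighted by how many followers that context has, divided by len(text)."""
    if k == 0:
        return h0(text)
    followers = defaultdict(list)
    for i in range(len(text) - k):
        followers[text[i:i + k]].append(text[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / len(text)

print(h0("abracadabra"))
print(hk("abracadabra", 1))
print(hk("abababab", 1))
```

The more predictable each character is from its preceding k characters, the smaller the value; in "abababab" every character is fully determined by the one before it, so the first-order value is zero.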

How does Occ work? How can it POSSIBLY be computed in constant time?

The function Occ(c, k) is a necessary part of both the "count" and "locate" algorithms, yet it is skipped over with no explanation. Indeed, the article makes it sound rather magical: apparently "it is possible to compute Occ(c, k) in constant time", despite the fact that the obvious dumb approach to implementing Occ (iterate over the first k characters of L and count the occurrences of c) of course takes O(k) time. Nowhere are we told what algorithm is used to achieve this magic, nor what data structures within the FM-index it uses.

ExplodingCabbage (talk) 14:29, 16 March 2022 (UTC)
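
For what it's worth, the simplest way to get constant-time Occ is to precompute it, and the sketch below shows that along with a sampled variant that trades time for space. It assumes Occ(c, k) means the number of occurrences of c in the first k characters of L. All names here are hypothetical, and neither variant is claimed to be what the article's sources use; the full table in particular is uncompressed, so it does not by itself explain how a compressed index achieves the same bound.

```python
def build_occ_table(L):
    """table[c][k] = occurrences of c in L[:k]; O(1) lookup, O(alphabet * n) space."""
    alphabet = sorted(set(L))
    table = {c: [0] * (len(L) + 1) for c in alphabet}
    for k, ch in enumerate(L, start=1):
        for c in alphabet:
            table[c][k] = table[c][k - 1] + (1 if c == ch else 0)
    return table

def occ(table, c, k):
    """Constant-time lookup into the precomputed table."""
    return table[c][k]

def build_checkpoints(L, step):
    """Store cumulative counts only after every `step` characters of L."""
    counts = dict.fromkeys(sorted(set(L)), 0)
    checkpoints = [dict(counts)]
    for k, ch in enumerate(L, start=1):
        counts[ch] += 1
        if k % step == 0:
            checkpoints.append(dict(counts))
    return checkpoints

def occ_sampled(L, checkpoints, step, c, k):
    """Nearest checkpoint at or before k, plus a scan of at most step - 1 characters."""
    base = k // step
    return checkpoints[base].get(c, 0) + L[base * step:k].count(c)

L = "ard$rcaaaabb"
table = build_occ_table(L)
checkpoints = build_checkpoints(L, step=4)

print(occ(table, "a", 7))
print(occ_sampled(L, checkpoints, 4, "a", 7))
```

With a fixed checkpoint spacing the scan length is bounded by a constant, so the sampled lookup is also O(1) per query while storing far fewer counts than the full table.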
