To protect performance and limit disk space usage, all Mac OS applications that use find by content index only the first 2000 unique terms of each file. This applies to all versions of Sherlock, Mac Help, Mail, and the Mac OS X 10.2 Find dialog.
"Find by content" indexing allows you to locate a file by the words it contains. You could, for example, locate your meeting notes by typing a phrase contained in them. All Mac OS applications that feature find by content indexing use the same underlying engine. These applications include Sherlock, Mac Help, Mail, and the Find dialog in Mac OS X 10.2.
Note: In Mac OS X 10.2 and later, the Find dialog of the Finder assumes the find by content function that was previously part of Sherlock. To learn more about it, see technical document 107005, "Mac OS X 10.2: How to Find Items on the Hard Disk".
Only the first 2000 unique terms of each file are indexed. During indexing, a file's text is processed into "terms": each word is converted to lower case, and it may be "stemmed" by removing certain grammatical endings. Thus "BROTHERS", "brother", "Brother's", and "Brother" would all be considered the same term, "brother."
"Unique" means that for purposes of the 2000-term limit, each term is counted only the first time it is found. The indexing process therefore does not stop after the first 2000 words of a file, but rather when the file's vocabulary exceeds 2000 terms.
Very few natural language documents have a vocabulary in excess of 2000 words, so this limitation should not hinder you in normal usage. If you were to index a dictionary of the English language, however, most of it would not be indexed, because its vocabulary would pass 2000 terms early in the file. The same is true of documents containing long lists of names.
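The term processing and the per-file limit described above can be sketched in a few lines of Python. This is only an illustration, not the actual indexing engine: the function names normalize and index_terms, the word-splitting pattern, and the two-entry suffix list are all assumptions made for the example, and the real stemming rules are not documented here.

```python
import re

TERM_LIMIT = 2000  # per-file limit described in this document

# Hypothetical, deliberately naive suffix list (an assumption for this sketch).
_SUFFIXES = ("'s", "s")


def normalize(word):
    """Lower-case a word and strip at most one simple grammatical ending."""
    term = word.lower()
    for suffix in _SUFFIXES:
        # Leave very short words alone so that "is" does not become "i".
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def index_terms(text, limit=TERM_LIMIT):
    """Return the unique terms of text in first-seen order, up to limit."""
    seen = set()
    terms = []
    for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text):
        term = normalize(word)
        if term in seen:
            continue  # a repeated term does not count toward the limit
        if len(terms) >= limit:
            break  # the vocabulary would exceed the limit: stop indexing
        seen.add(term)
        terms.append(term)
    return terms


print(index_terms("BROTHERS brother Brother's Brother"))
print(index_terms("alpha beta alpha gamma delta", limit=3))
```

In this sketch, a file that repeats a small vocabulary is indexed in full no matter how long it is, while a file that keeps introducing new terms is cut off at the first term past the limit, and everything after that point is ignored.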
The limit is necessary to control the size of index files and the time required to create them. An index of the entire hypothetical English dictionary would be huge. The limit also helps prevent an excess of indexed terms from files that are not natural language documents, and that you would therefore not want to index. Such files may be computer-generated and may contain many thousands of strings that look like words but are not.