We are indexing approximately 80,000 documents, and we know the collection contains some duplicates. By "duplicate" we usually do not mean two documents whose contents are completely identical: one document may differ from another only by very minor edits, or even just by punctuation. Are there any features in SearchUnit that would help identify pairs of documents that we would potentially call duplicates?
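
To make concrete what we mean by a duplicate, here is a rough sketch in plain Python (standard library only, nothing from SearchUnit). The function names and the 0.85 threshold are our own assumptions for illustration, not anything we have tuned: it strips punctuation and case, then compares the two documents word by word.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a).split(), normalize(b).split()).ratio()


def is_near_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    # The threshold is a hypothetical cut-off chosen only for this example.
    return similarity(a, b) >= threshold


original = "The quick brown fox jumps over the lazy dog."
punctuation_only = "The quick, brown fox jumps over the lazy dog!"
minor_edit = "The quick brown fox leaps over the lazy dog."
unrelated = "Completely different text about search engines."

print(is_near_duplicate(original, punctuation_only))
print(is_near_duplicate(original, minor_edit))
print(is_near_duplicate(original, unrelated))
```

The first two pairs are the kind we would want flagged and the third is not. Comparing every pair this way would mean roughly 3.2 billion comparisons for 80,000 documents, so this only illustrates the definition; it is not something we expect to run over the whole collection.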