It’s a fascinating and possibly pointless exercise, trying to work out how search engines work. Although this article was inspired by a news story on beating (so-called) plagiarism detectors, I found myself more interested in what the story told us about Google and (presumably) other search engines.
The story starts with an article in Hoax-Alert: “Forget Russian Bots: Fake Native Americans Are Using Russian Characters To Avoid Fake News and Plagiarism Detectors”. The article relates how a number of websites which appear to be promoted by Native Americans are in fact sites originating in Kosovo and other countries. It seems that they are stealing content, disguising it to escape similarity detectors, and getting away with it. The disguise works by replacing Latin letters in the text with Cyrillic characters that look the same, so that the page reads normally to a human while text-matching software sees different characters. The Hoax-Alert story includes an illustration of the substitution.
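To make the trick concrete, here is a minimal Python sketch of the substitution and of one way it might be spotted. It is not taken from the Hoax-Alert story: the six-letter mapping and the function names (`disguise`, `undo_disguise`, `mixed_script_words`) are my own assumptions, chosen only for illustration, and real similarity detectors may well work quite differently.

```python
import unicodedata

# Illustrative, far from exhaustive: Latin letters and Cyrillic look-alikes.
# This table is an assumption for the example, not the sites' actual mapping.
LATIN_TO_CYRILLIC = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}
CYRILLIC_TO_LATIN = {cyr: lat for lat, cyr in LATIN_TO_CYRILLIC.items()}


def disguise(text):
    """Swap Latin letters for look-alike Cyrillic ones."""
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in text)


def undo_disguise(text):
    """Map the look-alikes back to Latin before comparing texts.

    Caution: this would also alter genuine Cyrillic text.
    """
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)


def scripts_in_word(word):
    """Collect the first word of each letter's Unicode name, e.g. LATIN."""
    scripts = set()
    for ch in word:
        name = unicodedata.name(ch, "")
        if ch.isalpha() and name:
            scripts.add(name.split()[0])
    return scripts


def mixed_script_words(text):
    """Return words that mix scripts, a hint that letters were swapped."""
    return [w for w in text.split() if len(scripts_in_word(w)) > 1]


original = "search engines compare text"
disguised = disguise(original)

print(disguised)                         # looks the same on screen
print(disguised == original)             # exact comparison fails
print(undo_disguise(disguised) == original)
print(len(mixed_script_words(disguised)), "suspicious words")
```

The point of the sketch is that a naive character-by-character comparison treats the disguised text as new, while either mapping the look-alikes back or flagging words that mix scripts would expose it. Whether any given search engine or detector does either is exactly the kind of thing one can only guess at from the outside.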