Yes, you're right about the 'noise': in a log file the main source of noise is dates and times, because every distinct value, such as 4/5/2015 or 03:20:02, is added to the lexicon as a separate word.
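To illustrate how quickly timestamps inflate the word list, here is a rough sketch that counts distinct tokens in a few made-up log lines, with and without dates and times. It assumes a simple whitespace tokenizer as a stand-in for the index lexicon; the engine's real tokenizer may split text differently, and the log lines and `lexicon` helper are hypothetical.

```python
import re

# Hypothetical log lines for illustration only.
LOG_LINES = [
    "4/5/2015 03:20:02 INFO Backup completed",
    "4/5/2015 03:20:07 INFO Backup completed",
    "4/5/2015 03:21:15 ERROR Disk full",
    "4/6/2015 09:02:44 INFO Backup completed",
]

# Matches dates like 4/5/2015 and times like 03:20:02.
DATE_TIME = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b|\b\d{1,2}:\d{2}:\d{2}\b")


def lexicon(lines, strip_dates=False):
    """Return the set of distinct whitespace-separated tokens.

    A stand-in for the index lexicon; the real engine tokenizes differently.
    """
    words = set()
    for line in lines:
        if strip_dates:
            line = DATE_TIME.sub(" ", line)  # drop dates/times before tokenizing
        words.update(line.split())
    return words


print(len(lexicon(LOG_LINES)))                    # 12 distinct tokens
print(len(lexicon(LOG_LINES, strip_dates=True)))  # 6 distinct tokens
```

Half the "words" in this tiny sample are timestamps, and in a real log nearly every line adds a new time value, so that share keeps growing with the file.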
From the Help:
For example, if you have a very large repository of observational data (e.g. text files filled with flight records, weather data, etc. that contain many different 'word' strings), then indexing it may cause slowdowns for certain searches (wildcards) and for any index change operations (adding, removing, etc.). Although Search can handle this type of data, any unnecessary indexing should be avoided if possible.
Hint: the Configuration.IndexNumbers property can also be set to false so that numbers are not indexed or searched.
If you want numbers to be indexed in some places and not others, I can show you how to write a plug-in that removes the text you don't need indexed (and any other noise); this would help minimize the index size.
To answer your main point: as you can see, it's hard to give a specific maximum size, because it depends on the content being indexed (the number of unique words, the ratio of file length to number of files, etc.) and on the types of searches being performed (wildcard searches are the most work, single-keyword searches the least). There really isn't a substitute for trying it yourself, but I would definitely recommend culling as much noise as you can, regardless of whether the engine can handle the data with the noise left in.
As I said, if you think there is content you can remove from the log files programmatically, let me know and I'll help you write the plug-in. Another option might be to remove all success-type messages.
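As a rough idea of the kind of filtering such a plug-in could do, here is a standalone pre-processing sketch that drops success-type lines and strips dates/times from the rest. It does not use Search's plug-in interface (that wiring isn't shown), and the `clean_log_text` function, the timestamp pattern and the "SUCCESS" / "completed successfully" markers are all assumptions you would adjust to your own log format.

```python
import re

# Assumed timestamp formats: dates like 4/5/2015 and times like 03:20:02.
DATE_TIME = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b|\b\d{1,2}:\d{2}:\d{2}\b")

# Assumed wording of success messages; word boundaries keep "UNSUCCESSFUL" from matching.
SUCCESS_LINE = re.compile(r"\bSUCCESS\b|\bcompleted successfully\b", re.IGNORECASE)


def clean_log_text(text: str) -> str:
    """Drop success-type lines and strip dates/times from the remaining lines.

    Hypothetical pre-processing step, not the product's plug-in API.
    """
    kept = []
    for line in text.splitlines():
        if SUCCESS_LINE.search(line):
            continue  # skip success messages entirely
        # Replace timestamps with spaces, then collapse leftover whitespace.
        cleaned = " ".join(DATE_TIME.sub(" ", line).split())
        if cleaned:
            kept.append(cleaned)
    return "\n".join(kept)


sample = """4/5/2015 03:20:02 SUCCESS Backup completed successfully
4/5/2015 03:21:15 ERROR Disk full
4/6/2015 09:02:44 WARN Retrying upload"""

print(clean_log_text(sample))
# ERROR Disk full
# WARN Retrying upload
```

Only the error and warning text reaches the index, which is usually what people search log files for anyway.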
Best
Jim
- Your feedback is helpful to other users. Thank you!