What are your search engine choices?
If you are evaluating search engines for your project, you will probably be considering our SearchUnit product against others such as Solr, ElasticSearch and Lucene. Below is an attempt to provide a frank look at the fundamental differences and similarities between these search engines (and yes, we have included weaknesses of our own product).
Library versus Server
Perhaps the most fundamental difference to consider first is whether the engine is a library or a server. Your project may already be constrained to a library by its requirements, and it’s important to realize this as early as possible.
.NET Library - contained in a DLL that is bundled with your application. It runs as part of your application, hosted by the same .NET process.
Server - a self-contained piece of software that runs on the host operating system. It typically requires its own installer and additional resources to run its own Java application server.
Feature | SearchUnit (library) | Lucene.NET (library) | ElasticSearch (server) | Solr (server) |
---|---|---|---|---|
Management GUI | Windows & Web | None | 3rd Party/Limited | 3rd Party/Limited |
Shared hosting OK | ✓ | ✓ | X | X |
Dedicated hosting OK | ✓ | ✓ | ✓ | ✓ |
Small footprint (basic/initial install requirements) | ✓ | ✓ | X | X |
OEM product inclusion (i.e. can be bundled with a web application) | Easy | Easy | Harder (separate installer, or application server setup) | Harder (separate installer, or application server setup) |
API calls | Direct per standard .NET | Direct per standard .NET | Via HTTP request to server. RPC | Via HTTP request to server |
PDF, DOCX etc support | ✓ | X | ✓ | ✓ |
Azure support | Full | Full | Requires Azure VMs | Requires Linux HDInsight cluster |
Requires ASP.NET (WebForms or MVC) | ✓ | ✓ | X | X |
To summarise, if your application is deployed to Azure, to shared hosting, or to customer machines via an MSI installer, you will find a library much easier to work with. If you have dedicated machines you can still use a library; if those machines also meet the hardware requirements of a server, you have the option of installing Solr or ElasticSearch instead.
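To make the "API calls" row more concrete: with a library, a search is an ordinary in-process method call, whereas with a server your application has to build and send an HTTP request. The following is only a rough sketch of what building such a request can look like, written in Python purely for illustration. The host names, the `docs` core/index name and the helper function names are hypothetical; the `/solr/<core>/select` and `/<index>/_search` paths and the default ports 8983 and 9200 are the conventional ones for Solr and ElasticSearch.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select handler (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


def elasticsearch_search_url(base_url: str, index: str, query: str, size: int = 10) -> str:
    """Build a URL for an ElasticSearch URI search (hypothetical helper)."""
    params = urlencode({"q": query, "size": size})
    return f"{base_url}/{index}/_search?{params}"


# Assumed local installs on the default ports; "docs" is a made-up core/index name.
solr_url = solr_select_url("http://localhost:8983", "docs", "azure hosting")
es_url = elasticsearch_search_url("http://localhost:9200", "docs", "azure hosting")
```

The application would then issue a GET to these URLs and parse the JSON response, which means handling network failures, timeouts and server availability that do not arise with an in-process library call.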
Weighing up SearchUnit against the alternatives
Features
Search engines have a great many features, and customers have their own customization requirements, all of which make comparisons long and dry. SearchUnit, ElasticSearch and Solr have a great deal of overlap. Probably the biggest differentiators are:
Of the four engines above, only Lucene lacks built-in indexers for rich content such as PDF and MS Office documents.
SearchUnit, Lucene, ElasticSearch and Solr all support field-based searching to various degrees, but the latter three have deeper support for field-only searching.
Scaling
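Much of the scaling comparison in this section comes down to sharding, that is, splitting one index into pieces that live on separate machines. As a rough illustration of the idea only, the sketch below routes each document to a shard by hashing its ID. The function names and the choice of MD5 are assumptions made for this example; it is not the actual routing code of ElasticSearch or Solr.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document ID (illustrative routing rule)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(doc_ids, num_shards: int):
    """Group document IDs into num_shards lists, one list per (hypothetical) machine."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


# Made-up document IDs spread across three shards.
doc_ids = [f"doc-{n}" for n in range(1000)]
shards = distribute(doc_ids, 3)
```

Each machine then only needs to hold its own slice of the index, and a query is sent to every shard with the results merged afterwards. An engine that cannot shard must fit the whole index on every machine that serves searches.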
ElasticSearch, Solr and their commercial product derivatives are enterprise-grade search engine servers. They are capable of scaling beyond what SearchUnit currently can (the limit is not easy to quantify precisely, but it is in the ballpark of many millions of documents). SearchUnit can be hosted in server farm environments; however, its indexes cannot be divided into sections (shards) that are hosted on separate machines. That means that if your project scales to the point where a single machine is unable to hold an entire copy of the search index (many millions of documents), SearchUnit will be unable to handle it.
Implementation time/cost
SearchUnit has a simple intuitive GUI that the others do not. After running the installer you can have a searchable index running within 5 minutes, and have added search to your own pages in 10-20 more minutes.
Lucene is really an information retrieval system and, although it is suggested as a search engine solution, it lacks important features that a search engine requires. This is one reason that Solr and ElasticSearch (which are based on Lucene) exist.
Solr and ElasticSearch themselves have commercial wrappers, because they are not as straightforward to use as they could be. Implementing these open source engines fully will conservatively require 2 to 3 weeks, and necessitate ongoing maintenance, which puts their true cost in the high thousands.