Elasticsearch is the way

Don’t get me wrong, I love Apache Solr. I think it’s a wonderful project, and the 4.x versions are definitely something you should check out when building a proper search engine.

But Elasticsearch, at least for me, is now the way to the future. If you need a few reasons why, read on:

Out of the box scalability

SolrCloud is doing a good job of bringing Solr into the cloud era: Solr already supported distributed queries before, but sharding had to be done manually…

Elasticsearch scalability is so easy it’s a bit frightening. Every time I set up a new “single” Elasticsearch server, I deactivate the cluster-discovery capability as soon as possible, just in case it starts replicating the internet on my machine! Sharding and replication are automatic and almost a necessity: by default, your server will remind you that you’re a dangerous person for keeping all your data on a single machine, and it will stay in a yellow state until you start adding nodes!
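To make the yellow state a bit more concrete, here is a minimal Python sketch of the rule as I understand it: a replica is never placed on the same node as its primary, so a lone node can never host the replicas it is asked for. The function names and the simplified model (every node holds data, no custom allocation rules) are my own assumptions, not an Elasticsearch API; only `number_of_shards` and `number_of_replicas` are actual index settings.

```python
import json


def expected_health(data_nodes: int, replicas: int) -> str:
    """Simplified model: each shard needs replicas + 1 distinct nodes."""
    if data_nodes < 1:
        return "red"      # nothing can host the primary shards
    if data_nodes >= replicas + 1:
        return "green"    # every primary and every replica can be placed
    return "yellow"       # primaries are fine, some replicas stay unassigned


def index_settings(shards: int, replicas: int) -> str:
    """JSON body for creating an index with explicit shard/replica counts."""
    return json.dumps(
        {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    )


print(expected_health(data_nodes=1, replicas=1))
print(expected_health(data_nodes=2, replicas=1))
print(index_settings(shards=1, replicas=0))
```

On a single development machine, creating the index with `number_of_replicas` set to 0 is the usual way to keep the cluster from complaining.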

Comprehensive JSON-based HTTP search API

In all honesty, the JSON-based search queries can sometimes become quite complicated and tedious to read, but they are much more powerful than a simple ?q=… query or the long and complicated list of URL GET parameters you end up using with Solr… So even if there is no proper Chrome extension to create a GET HTTP request with a JSON body (add a comment if you find one!), I still think it’s a blessing to have that kind of query capacity, and it made me rethink Elasticsearch’s suitability for complex queries (cf. the “As complex as Solr” part).
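For a feel of what such a query looks like, here is a small Python sketch that builds a JSON search body with the standard library only. The field names (`title`, `tags`), the helper name and the search text are hypothetical; `bool`, `must`, `must_not`, `match`, `term`, `from` and `size` are part of the query DSL.

```python
import json


def build_search_body(text: str, excluded_tag: str, size: int = 10) -> str:
    """Build a JSON search body: full-text match on one field, minus a tag."""
    body = {
        "query": {
            "bool": {
                # Documents must match the text in the (hypothetical) title field...
                "must": [{"match": {"title": text}}],
                # ...and must not carry the excluded tag.
                "must_not": [{"term": {"tags": excluded_tag}}],
            }
        },
        "from": 0,      # offset of the first hit to return
        "size": size,   # number of hits to return
    }
    return json.dumps(body, indent=2)


print(build_search_body("search engine", "draft"))
```

The resulting string is what you would send as the body of a request to an index’s `_search` endpoint. It is more verbose than ?q=…, but every clause is explicit.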

Rivers…

Probably one of the best features of Elasticsearch: it’s designed around the fantastic (and true) idea that an Elasticsearch index needs to be fed!

This concept alone changes everything, because it makes the “real-time index” the default type of index. What matters most nowadays is having an up-to-date search index, and near-real-time search is one of the many advantages that make Solr and Elasticsearch the best choices out there.

Vibrant community and plugins

This is probably the most important part, in my opinion. I do think that the Solr ecosystem lacks good tools and plugins to leverage more of its power. Luke is a pretty useful tool, but it’s very Lucene-centric, and beyond it there are only the Solr-provided tools (which are, I must say, sufficient for a lot of troubleshooting and debugging). I’ve been on Solr 3.x for a long time, and even if all the tools were there, the UI certainly lacked in terms of “sexy”. Nowadays Solr 4.x’s UI is certainly sexier and a pleasure to work with, but it’s still only the work of Lucidworks.

Elasticsearch is brand new, the documentation is sexy, the project is sexy, and they built a wonderful plugin system that uses GitHub directly! You don’t have to be a fully accredited “Elasticsearch-compliant plugin creator” to publish your project.

So a lot of people have created wonderful plugins that already go beyond what you can use in the Solr/Lucene world. Just a quick review:

  • Paramedic: a “simple and sexy tool to monitor and inspect elasticsearch clusters”;
  • Head: “A web front end for an ElasticSearch cluster” with a real-time dashboard;
  • BigDesk: live charts and statistics for an Elasticsearch cluster;
  • For analysis, you have Inquisitor to help understand and debug your queries in Elasticsearch, and SegmentSpy to watch segments merging and changing in real time.

This is just the state of the art right now, but I can’t imagine it going anywhere but forward.

As complex as Solr

Finally, I was prejudiced: I thought that Elasticsearch’s scalability goals were clearly ambitious (and deeply needed!), but that this kind of scalability obviously came at a cost, and that it would therefore offer fewer features than Solr (e.g. Dismax queries).

But I was wrong: I recently discovered that Dismax queries, fuzzy matching and other goodies, allowing everything from query-time field boosting to boosted sub-queries, are available and easily accessible through the Elasticsearch API. So the proper section name should not be “As complex as Solr” but “As versatile as Solr”.
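As an illustration, here is a rough Python sketch of a `dis_max` query that combines a boosted sub-query with a fuzzy one. The field names (`title`, `body`), the helper name and the numeric values are hypothetical, and the values accepted by `fuzziness` depend on the Elasticsearch version, so treat this as a starting point rather than a reference.

```python
import json


def build_dis_max_body(text: str) -> str:
    """Build a dis_max search body: the best-matching sub-query wins the score."""
    body = {
        "query": {
            "dis_max": {
                # Lets the weaker sub-query contribute a little to the final score.
                "tie_breaker": 0.3,
                "queries": [
                    # Matches in the (hypothetical) title field count three times more.
                    {"match": {"title": {"query": text, "boost": 3.0}}},
                    # Tolerate small typos when matching the (hypothetical) body field.
                    {"match": {"body": {"query": text, "fuzziness": 1}}},
                ],
            }
        }
    }
    return json.dumps(body, indent=2)


print(build_dis_max_body("elasticsearch"))
```

Both the boost and the fuzziness are set per sub-query at query time, so nothing has to change in the index to try different weightings.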

I hope I made my point. If you’re considering building a Big Data-ready search engine right now, make sure to check out Elasticsearch, or you’ll be missing out on a great product.

Vale


3 Comments

  1. Fabrice

    There is a Chrome extension to create HTTP requests with a json body and more (syntax highlighting, autocomplete, formatting and code folding): https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo

    1. Wonderful! Thanks, I’ll go check it out right now!

  2. Same opinion here.

    I think the Elasticsearch River is elegant compared to the Solr DataImportHandler, and is friendlier for indexing data from the NoSQL world.

    The ES client is a LOT better than the SolrJ client:
    – Much more typesafe and structured
    – Future results
    – Well documented

    Solr is powerful, but I think it’s quite boring to use. I felt its technical debt, and didn’t find the source very readable compared to ES.
    On the client side, you end up manually creating all the queries by concatenating strings, while ES offers a very clean and powerful DSL.
