Happily, Solr also plays nicely with Drupal. My colleagues want to connect their Drupal application to the Solr server I installed (see my last blog entry). Drupal provides two modules for Solr integration. Both link an external Solr server with the Drupal application, passing data to Solr for indexing and then letting Drupal serve up the search results. As of this writing, each module has one advantage and one disadvantage. The first is Search API Solr search: you can use the Solr index directly in Views, but you cannot index files remotely on the Solr server (https://drupal.org/node/1289222). The second is Apache Solr Search Integration: you can index file attachments remotely on the Solr server, but you cannot use the search index in Views.

Whichever module we choose, installing the Solr configuration files works the same way.

wget http://ftp.drupal.org/files/projects/apachesolr-7.x-1.3.tar.gz
tar -zxvf apachesolr-7.x-1.3.tar.gz

Next I copy the module's configuration files from /solr-conf/solr-4.x/* to /opt/apache-solr-4.3.1/my-app/solr/collection1/conf/. Then I restart the Solr server and check the log for any messages. In my case the paths in solrconfig.xml do not account for the “collection1” directory, so the relative path “../../contrib” is wrong and must be moved one level up to “../../../contrib”.
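The path fix can also be scripted instead of edited by hand. The following is only a minimal sketch and not part of the module: the function name fix_lib_paths is made up for illustration, and it assumes the shipped solrconfig.xml references the directories as dir="../../contrib..." and dir="../../dist...". It keeps a .bak copy of the file and leaves already corrected paths alone.

```shell
#!/bin/sh
# Hypothetical helper: move the relative contrib/dist <lib> paths in a
# solrconfig.xml one directory level up (../../ becomes ../../../).
fix_lib_paths() {
  conf_file="$1"
  # -i.bak edits in place and keeps the original as <file>.bak.
  # The pattern only matches paths that are exactly two levels up, so a
  # second run does not change an already corrected file.
  sed -i.bak -E 's#dir="\.\./\.\./(contrib|dist)#dir="../../../\1#g' "$conf_file"
}
```

For the setup described here I would call it as fix_lib_paths /opt/apache-solr-4.3.1/my-app/solr/collection1/conf/solrconfig.xml and then restart Solr.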

Next my colleagues install the module in Drupal and set the Solr server URL to http://X.X.X.X:8983/solr/collection1
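Before entering the URL in Drupal, it can help to check from the Drupal web server that the core is reachable. This is a rough sketch under assumptions: the helper names are invented for illustration, curl must be installed, and the core's solrconfig.xml must define the /admin/ping handler (Solr's stock example configuration does; I have not verified this for every module version).

```shell
#!/bin/sh
# Hypothetical helper: build the ping URL for a Solr core URL.
solr_ping_url() {
  # ${1%/} drops one trailing slash so the result has no double slash.
  printf '%s/admin/ping?wt=json\n' "${1%/}"
}

# Hypothetical helper: exit status 0 if the core answers the ping request.
solr_ping() {
  # -f makes curl fail on HTTP errors, -sS hides progress but shows errors.
  curl -fsS "$(solr_ping_url "$1")"
}
```

A call would look like solr_ping http://X.X.X.X:8983/solr/collection1, with the real address of the Solr server in place of X.X.X.X.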

Apache Solr Attachments

The Apache Solr Attachments module lets you extract text from documents using the Solr server. For that I need to configure Solr accordingly and edit solrconfig.xml so that it contains these lines:

<lib dir="../../../contrib/extraction/lib" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
 
<!-- An extract-only path for accessing the tika utility -->
<requestHandler name="/extract/tika" class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy">
  <lst name="defaults">
  </lst>
  <!-- This path only extracts - never updates -->
  <lst name="invariants">
    <bool name="extractOnly">true</bool>
  </lst>
</requestHandler>

https://drupal.org/node/1205190
Note: To extract documents locally instead, I need the Apache Tika app on the Drupal web server.
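For the remote case, the /extract/tika handler from the configuration above can be tried by hand before Drupal uses it. The following is a sketch under assumptions: the helper names and the form field name "myfile" are arbitrary choices of mine, and curl must be available. Because extractOnly is fixed to true in the handler's invariants, the request returns the extracted content and does not add anything to the index.

```shell
#!/bin/sh
# Hypothetical helper: build the URL of the extract-only handler.
tika_extract_url() {
  # extractFormat=text asks for plain text instead of the default XML.
  printf '%s/extract/tika?extractFormat=text&wt=json\n' "${1%/}"
}

# Hypothetical helper: upload a file ($2) to the core ($1) and print the response.
tika_extract() {
  # -F sends the file as multipart form data; "myfile" is an arbitrary field name.
  curl -fsS -F "myfile=@$2" "$(tika_extract_url "$1")"
}
```

A call would look like tika_extract http://X.X.X.X:8983/solr/collection1 /path/to/document.pdf. If Solr complains about a missing class, the lib paths for contrib/extraction and solr-cell are the first thing I would check.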