Download the NomenMatch code and put it in a web-accessible folder, for example:
/var/www/html/nomenmatch/
Download and run Solr 4 (http://archive.apache.org/dist/lucene/solr/4.9.1/) with a core that uses the schema.xml and solrconfig.xml in conf/solr-config.
(Other versions of Solr should also work with appropriate adjustments to schema.xml and solrconfig.xml.)
cd to [extracted solr]/example/ and run
java -jar start.jar
The endpoint will be:
http://localhost:8983/solr
To stop Solr, find its process ID and kill it:
ps aux | grep java
kill [pid]
That's it!
Edit conf/solr_endpoint and enter your Solr endpoint without a trailing newline (\n) or trailing slash (/), for example:
http://localhost:8983/solr/taxa
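Many editors append a newline when saving, so it can be safer to write the file from the shell. The following is a minimal sketch, not part of NomenMatch: write_endpoint is a hypothetical helper name, and it assumes a POSIX shell. It strips any trailing slashes and writes the value with printf, which adds no newline.

```shell
# Hypothetical helper (not shipped with NomenMatch): write a Solr endpoint
# to a file with no trailing slash and no trailing newline.
write_endpoint() {
    endpoint=$1
    target=$2
    # Remove every trailing "/" from the endpoint.
    while [ "${endpoint%/}" != "$endpoint" ]; do
        endpoint=${endpoint%/}
    done
    # printf '%s' writes the string exactly, without appending a newline.
    printf '%s' "$endpoint" > "$target"
}
```

From the NomenMatch folder it could be called as: write_endpoint http://localhost:8983/solr/taxa conf/solr_endpoint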
Under the workspace directory, run:
php importChecklistToSolr.php {/path/to/source_data.csv} [source_id]
If [source_id] is omitted, "source_data" will be used as the source_id.
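The default above looks like the file name without its directory and extension. As an illustration only, and assuming that is the rule (the importer's actual logic is not shown here, and default_source_id is a hypothetical name), the same value can be previewed in the shell before importing:

```shell
# Hypothetical preview of the default source_id, assuming it is the
# file's base name with the .csv extension removed.
default_source_id() {
    # basename drops the directory part and the given suffix.
    basename "$1" .csv
}
```

Passing an explicit [source_id] to the import script avoids relying on this assumption.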
The source data file is tab-separated; see workspace/data/example.csv.
Column definition:
- namecode
- accepted_namecode
- scientific_name (full name or canonical form is ok)
- name_url_id (the id which can be used to create a valid url to a taxon name page)
- accepted_name_url_id (the id which can be used to create a valid url to an accepted taxon name page, if the name is a synonym)
- family
- order
- class
- phylum
- kingdom
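Since the file must have exactly these ten tab-separated columns, a quick check before importing can catch rows that were split or truncated. This is a sketch under assumptions, not part of NomenMatch: check_columns is a hypothetical helper, it relies on a standard awk, and it flags every line (including blank lines) whose field count differs from the expected number.

```shell
# Hypothetical pre-import check (not shipped with NomenMatch): list the
# lines of a tab-separated file whose field count is not the expected one.
# Usage: check_columns FILE [EXPECTED_COLUMNS]   (default: 10)
check_columns() {
    awk -F '\t' -v want="${2:-10}" '
        # Report each line with the wrong number of fields.
        NF != want { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        # Exit non-zero if any line was reported.
        END { exit bad }
    ' "$1"
}
```

For example, running check_columns data/example.csv from the workspace directory prints nothing and returns success when every line has ten fields.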
Edit conf/sources.example.csv and rename it to sources.csv.
Column definition:
- source_id
- source_name
- url_base (when combined with name_url_id or accepted_name_url_id, it forms a valid URL for the taxon)
- citation format
- source data page
- date (when the source data was fetched, downloaded, or created)
Under the workspace directory, run
php clean_source.php {source_id}
to remove a specific source from Solr, or run
php clean_source.php all
to remove all sources at once.
If this script fails, the usual cause is that Solr has run out of Java heap space. Restart Solr and run the script again.