solr

딜레이라마 2017. 2. 16. 23:42

1. A Quick Overview

Having had some fun with Solr, you will now learn about all the cool things it can do. Here is an example of how Solr might be integrated into an application:

In this scenario, Solr runs alongside other server applications. For example, an online store application would provide a user interface, a shopping cart, and a way for end users to make purchases, while an inventory management application would allow store employees to edit product information. The product metadata would be kept in some kind of database, as well as in Solr.

Solr makes it easy to add search to the online store through the following steps:

- Define a schema. The schema tells Solr about the contents of the documents it will be indexing. In the online store example, the schema would define fields for the product name, description, price, manufacturer, and so on. Solr's schema is powerful and flexible and allows you to tailor Solr's behavior to your application. See Documents, Fields, and Schema Design for all the details.
- Deploy Solr to your application server.
- Feed Solr the documents for which your users will search.
- Expose search functionality in your application.

Because Solr is based on open standards, it is highly extensible. Solr queries are RESTful, which means, in essence, that a query is a simple HTTP request URL and the response is a structured document: mainly XML, but it could also be JSON, CSV, or some other format. This means that a wide variety of clients will be able to use Solr, from other web applications to browser clients, rich client applications, and mobile devices. Any platform capable of HTTP can talk to Solr. See Client APIs for details.

Solr is based on the Apache Lucene project, a high-performance, full-featured search engine. Solr offers support for everything from the simplest keyword searching through to complex queries on multiple fields and faceted search results. Searching has more information about searching and queries.
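To make the "a query is a simple HTTP request URL" point concrete, here is a minimal Python sketch that builds a query URL and reads a JSON response. The base URL, the `products` collection name, and the sample response are assumptions made for illustration; only the `/select` handler and the `q`, `rows`, and `wt` parameters are standard Solr names. The sketch does not contact a server.

```python
import json
from urllib.parse import urlencode

# Assumption: a local Solr listening on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(collection, query, rows=10, wt="json"):
    """Build the URL for a query against a collection's /select handler."""
    params = {"q": query, "rows": rows, "wt": wt}
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


def doc_names(response_text):
    """Extract the 'name' field of every document in a JSON response."""
    body = json.loads(response_text)
    return [doc.get("name") for doc in body["response"]["docs"]]


# Hand-written sample in the shape of a Solr JSON response (not real output).
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [{"id": "sku-1", "name": "ipod", "price": 199.0}]
  }
}
"""

if __name__ == "__main__":
    # "products" is a hypothetical collection for the online store example.
    print(select_url("products", "name:ipod"))
    print(doc_names(SAMPLE_RESPONSE))
```

Any HTTP client could fetch the generated URL; changing `wt` to `xml` or `csv` would request one of the other response formats mentioned above.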

If Solr's capabilities are not impressive enough, its ability to handle very high-volume applications should do the trick. A relatively common scenario is that you have so much data, or so many queries, that a single Solr server is unable to handle your entire workload. In this case, you can scale up the capabilities of your application using SolrCloud to better distribute the data, and the processing of requests, across many servers. Multiple options can be mixed and matched depending on the type of scalability you need. For example:

- "Sharding" is a scaling technique in which a collection is split into multiple logical pieces called "shards" in order to scale up the number of documents in a collection beyond what could physically fit on a single server. Incoming queries are distributed to every shard in the collection, and the shards' responses are merged into a single result.
- Another technique is to increase the "Replication Factor" of your collection, which allows you to add servers with additional copies of your collection to handle higher concurrent query load by spreading the requests across multiple machines.

Sharding and replication are not mutually exclusive, and together they make Solr an extremely powerful and scalable platform. Best of all, this talk about high-volume applications is not just hypothetical: some of the famous Internet sites that use Solr today are Macy's, eBay, and Zappos. For more information, take a look at https://wiki.apache.org/solr/PublicServers.
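The sharding idea above (send the query to every shard, then merge the partial results) can be illustrated with a toy in-memory sketch. This is not how SolrCloud routes or scores documents: the CRC32-based routing, the term-count "score", and all function names are assumptions chosen only to keep the example small and deterministic.

```python
import heapq
import zlib


def shard_for(doc_id, num_shards):
    """Toy routing: pick a shard from a stable hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs, num_shards):
    """Split a list of documents into num_shards in-memory shards."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc["id"], num_shards)].append(doc)
    return shards


def search_shard(shard, term, rows):
    """Return one shard's top `rows` hits as (score, id) pairs.

    The score is simply how often the term occurs in the text.
    """
    hits = []
    for doc in shard:
        score = doc["text"].lower().split().count(term)
        if score > 0:
            hits.append((score, doc["id"]))
    return heapq.nlargest(rows, hits)


def distributed_search(shards, term, rows):
    """Query every shard, then merge the partial results into a global top `rows`."""
    partial = []
    for shard in shards:
        partial.extend(search_shard(shard, term, rows))
    return heapq.nlargest(rows, partial)


if __name__ == "__main__":
    docs = [{"id": f"p{i}", "text": "solr " * (i % 4) + "search"} for i in range(20)]
    print(distributed_search(build_shards(docs, 3), "solr", 5))
```

Because each shard returns its own top `rows` hits, the merged list matches what a single server holding all documents would return, which is why the collection can grow past one machine without changing query results. Replication would correspond to keeping several copies of each shard list and sending each query to any one copy.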

A Step Closer

You already have some idea of Solr's schema. This section describes Solr's home directory and other configuration options. When Solr runs in an application server, it needs access to a home directory. The home directory contains important configuration information and is the place where Solr will store its index. The layout of the home directory will look a little different when you are running Solr in standalone mode vs when you are running in SolrCloud mode. The crucial parts of the Solr home directory are shown in these examples:

You may see other files, but the main ones you need to know are:

- solr.xml specifies configuration options for your Solr server instance. For more information on solr.xml, see Solr Cores and solr.xml.
- Per Solr Core:
  - core.properties defines specific properties for each core, such as its name, the collection the core belongs to, the location of the schema, and other parameters. For more details on core.properties, see the section Defining core.properties.
  - solrconfig.xml controls high-level behavior. You can, for example, specify an alternate location for the data directory. For more information on solrconfig.xml, see Configuring solrconfig.xml.
  - managed-schema (or schema.xml instead) describes the documents you will ask Solr to index. The schema defines a document as a collection of fields. You get to define both the field types and the fields themselves. Field type definitions are powerful and include information about how Solr processes incoming field values and query values. For more information on Solr schemas, see Documents, Fields, and Schema Design and the Schema API.
  - data/ is the directory containing the low-level index files.

Note that the SolrCloud layout does not include a conf directory for each Solr Core (so there is no solrconfig.xml or schema file). This is because the configuration files usually found in the conf directory are stored in ZooKeeper so they can be propagated across the cluster.

If you are using SolrCloud with the embedded ZooKeeper instance, you may also see zoo.cfg and zoo.data, which are ZooKeeper configuration and data files. However, if you are running your own ZooKeeper ensemble, you would supply your own ZooKeeper configuration file when you start it, and the copies in Solr would be unused. For more information about ZooKeeper and SolrCloud, see the section SolrCloud.
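The standalone layout described above can be sketched in Python by creating an empty skeleton in a temporary directory and then listing the cores found in it. The core name `core_name1`, the helper names, and the simplified `key=value` parsing are assumptions for illustration; the files are created empty and this is not a working Solr configuration.

```python
import os
import tempfile


def read_properties(path):
    """Parse simple key=value lines, skipping blank lines and comments.

    Simplified: no support for escapes or line continuations.
    """
    props = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def make_standalone_home(root, core_name):
    """Create the standalone-mode skeleton: solr.xml plus one core directory."""
    core_dir = os.path.join(root, core_name)
    os.makedirs(os.path.join(core_dir, "conf"))
    os.makedirs(os.path.join(core_dir, "data"))  # low-level index files live here
    placeholders = [
        os.path.join(root, "solr.xml"),
        os.path.join(core_dir, "conf", "solrconfig.xml"),
        os.path.join(core_dir, "conf", "managed-schema"),
    ]
    for path in placeholders:
        open(path, "w", encoding="utf-8").close()  # empty placeholder file
    with open(os.path.join(core_dir, "core.properties"), "w", encoding="utf-8") as fh:
        fh.write(f"name={core_name}\n")


def list_cores(root):
    """Map each subdirectory that holds a core.properties file to its properties."""
    cores = {}
    for entry in sorted(os.listdir(root)):
        props_path = os.path.join(root, entry, "core.properties")
        if os.path.isfile(props_path):
            cores[entry] = read_properties(props_path)
    return cores


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        make_standalone_home(home, "core_name1")
        print(list_cores(home))
```

For the SolrCloud layout, the same sketch would simply omit the conf directory, since those files are kept in ZooKeeper.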

Solr Start Script Reference

Solr includes a script known as "bin/solr" that allows you to start and stop Solr, create and delete collections or cores, and check the status of Solr and configured shards. You can find the script in the bin/ directory of your Solr installation. The bin/solr script makes Solr easier to work with by providing simple commands and options to quickly accomplish common goals. In this section, the headings below correspond to available commands. For each command, the available options are described with examples. More examples of bin/solr in use are available throughout the Solr Reference Guide, but particularly in the sections Running Solr and Getting Started with SolrCloud.
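As a rough illustration of the commands the script offers, the Python sketch below assembles a few bin/solr command lines as argument lists. The helper names are hypothetical, and the options shown (`-p`, `-cloud`, `-all`, `-c`) are assumed to match the 6.x script; check `bin/solr -help` on your installation, since options can differ between versions. Nothing is executed unless `run` is called.

```python
import subprocess

# Path of the script relative to the Solr installation directory.
SOLR_SCRIPT = "bin/solr"


def start_cmd(port=8983, cloud=False):
    """Command line to start Solr on a port, optionally in SolrCloud mode."""
    cmd = [SOLR_SCRIPT, "start", "-p", str(port)]
    if cloud:
        cmd.append("-cloud")
    return cmd


def stop_cmd(port=None):
    """Command line to stop one instance by port, or all instances if no port is given."""
    if port is None:
        return [SOLR_SCRIPT, "stop", "-all"]
    return [SOLR_SCRIPT, "stop", "-p", str(port)]


def create_cmd(name):
    """Command line to create a core or collection with the given name."""
    return [SOLR_SCRIPT, "create", "-c", name]


def delete_cmd(name):
    """Command line to delete a core or collection with the given name."""
    return [SOLR_SCRIPT, "delete", "-c", name]


def status_cmd():
    """Command line to report the status of running Solr instances."""
    return [SOLR_SCRIPT, "status"]


def run(cmd, solr_install_dir):
    """Execute a command from the installation directory (requires a real Solr install)."""
    return subprocess.run(cmd, cwd=solr_install_dir, check=True)


if __name__ == "__main__":
    # "products" is a hypothetical collection name.
    for cmd in (start_cmd(cloud=True), create_cmd("products"), status_cmd(), stop_cmd()):
        print(" ".join(cmd))
```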

Reference: Apache Solr Reference Guide 6.0