Thursday, 24 April 2014

Indexing [Search Using Lucene]

Indexing is a fairly straightforward process using ZendSearch\Lucene. All we need to do is create documents with fields and values, and keep adding each document to the index. You can also remove documents, update documents, and clear an index. The following classes are used in index generation:

       Field – The ZendSearch\Lucene\Document\Field class allows users to define a new document field; each field is one of the following types:
        Field::keyword($name, $value, $encoding = 'UTF-8'): The keyword field type is used for string fields that do not have to be tokenized, yet need to be indexed and stored, for example, dates and URLs.
        Field::unIndexed($name, $value, $encoding = 'UTF-8'): The unIndexed field type is used to store fields in the index without indexing or tokenizing them, for example, ID fields.
        Field::binary($name, $value): The binary field type is used for storing binary values in the index; like unIndexed fields, they are stored but not indexed.
        Field::text($name, $value, $encoding = 'UTF-8'): The text field type is the most common field type; it describes short strings that are tokenized, indexed, and stored in the index.
        Field::unStored($name, $value, $encoding = 'UTF-8'): The unStored field type is used to identify fields that will be tokenized and indexed, but not stored in the index.

       Document – The ZendSearch\Lucene\Document class allows definition of a new index document. Some of the most commonly used methods in this class are described as follows:
        addField(Document\Field $field): Adds a new field to the document
        getFieldNames(): Used to retrieve all field names from the document
        getField($fieldName): Used to retrieve a specific field from the document
        getFieldValue($fieldName): Used to retrieve a specific field value from the document

       Index – An index can be created or opened using the create() and open() methods of the ZendSearch\Lucene\Lucene class. Both methods take the index path as the parameter and return an index of type ZendSearch\Lucene\SearchIndexInterface. SearchIndexInterface provides the following methods for manipulating the documents inside the index:
        addDocument(Document $document): Adds a new document to the index
        delete($id): Deletes the indexed document based on the internal document ID
        optimize(): Optimizes the index by merging all segments into a single segment, which improves search performance
        commit(): Used to commit pending changes to the search index
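The five Field factories differ only in the storage and indexing flags they set. The sketch below restates those flags as plain PHP data for quick reference. It does not call ZendSearch itself. The constant and helper names are hypothetical, and the flag values reflect the library's factory methods as far as we know. The two helpers answer the practical questions: can a query match this field, and can its value be read back from a search hit?

```php
<?php
declare(strict_types=1);

// Flags set by each ZendSearch\Lucene\Document\Field factory, restated as
// plain data for comparison. FIELD_TYPE_FLAGS is a hypothetical name; this
// is a reference table, not the library's own code.
const FIELD_TYPE_FLAGS = [
    'keyword'   => ['stored' => true,  'indexed' => true,  'tokenized' => false, 'binary' => false],
    'unIndexed' => ['stored' => true,  'indexed' => false, 'tokenized' => false, 'binary' => false],
    'binary'    => ['stored' => true,  'indexed' => false, 'tokenized' => false, 'binary' => true],
    'text'      => ['stored' => true,  'indexed' => true,  'tokenized' => true,  'binary' => false],
    'unStored'  => ['stored' => false, 'indexed' => true,  'tokenized' => true,  'binary' => false],
];

// A query can only match a field whose terms were indexed.
function isSearchable(string $fieldType): bool
{
    return FIELD_TYPE_FLAGS[$fieldType]['indexed'];
}

// A field's original value can only be read back from a hit if it was stored.
function isRetrievable(string $fieldType): bool
{
    return FIELD_TYPE_FLAGS[$fieldType]['stored'];
}

// Print a one-line summary per field type.
foreach (FIELD_TYPE_FLAGS as $fieldType => $flags) {
    printf(
        "%-10s searchable=%-3s retrievable=%-3s tokenized=%s\n",
        $fieldType,
        isSearchable($fieldType) ? 'yes' : 'no',
        isRetrievable($fieldType) ? 'yes' : 'no',
        $flags['tokenized'] ? 'yes' : 'no'
    );
}
```

The table also explains the field choices in the indexing action. The upload_id field only has to be read back from a hit, so unIndexed is enough. The label and owner fields must be both searchable and displayable, so they use text.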

Now that we have learned about the methods that are used for index generation, let's get started and generate the index for the uploads table that is available in our communication application.
Perform the following steps to generate a Lucene index:

1.       Create a new search controller, CommunicationApp/module/Users/src/Users/Controller/SearchController.php, which will be used for searching and generating indexes.
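2.       Add references to ZendSearch\Lucene, plus Traversable and Zend\Stdlib\ArrayUtils, which the getIndexLocation() method in the next step relies on:
use Traversable;
use Zend\Stdlib\ArrayUtils;
use ZendSearch\Lucene;
use ZendSearch\Lucene\Document;
use ZendSearch\Lucene\Index;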
3.       Add a method to fetch the index location from the module configuration:
public function getIndexLocation()
{
    // Fetch configuration from the module config
    $config = $this->getServiceLocator()->get('config');
    if ($config instanceof Traversable) {
        $config = ArrayUtils::iteratorToArray($config);
    }
    if (!empty($config['module_config']['search_index'])) {
        return $config['module_config']['search_index'];
    }

    // No search_index entry is configured
    return false;
}

4.       The index document needs to be generated in the following format:
Index field    Description
upload_id      A non-indexed field used to retrieve the uploaded file that is returned in the search results
label          Indexes the label field of the uploads table
owner          Indexes the name field of the user who uploaded the document
5.       Create a new action to generate the index:
public function generateIndexAction()
{
    $searchIndexLocation = $this->getIndexLocation();
    if ($searchIndexLocation === false) {
        throw new \RuntimeException('module_config.search_index is not configured');
    }

    // create() starts a new index at this location
    $index = Lucene\Lucene::create($searchIndexLocation);

    $userTable   = $this->getServiceLocator()->get('UserTable');
    $uploadTable = $this->getServiceLocator()->get('UploadTable');

    $allUploads = $uploadTable->fetchAll();
    foreach ($allUploads as $fileUpload) {
        // Look up the user who uploaded the file
        $uploadOwner = $userTable->getUser($fileUpload->user_id);

        // Create the Lucene fields: upload_id is only stored,
        // label and owner are tokenized, indexed, and stored
        $fileUploadId = Document\Field::unIndexed('upload_id', $fileUpload->id);
        $label        = Document\Field::text('label', $fileUpload->label);
        $owner        = Document\Field::text('owner', $uploadOwner->name);

        // Create a new document, add all fields, and add it to the index
        $indexDoc = new Lucene\Document();
        $indexDoc->addField($label);
        $indexDoc->addField($owner);
        $indexDoc->addField($fileUploadId);
        $index->addDocument($indexDoc);
    }

    // Write the pending changes to the index
    $index->commit();
}

6.       Now open the action URL (http://comm-app.local/users/search/generateIndex) in your web browser. If everything works as expected, you will see that the index files have been created in the search_index folder.
The following screenshot shows the browser response upon a successful index update:


You can see in the following screenshot that the index files are generated and stored in the search_index folder:
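The getIndexLocation() method from step 3 expects the module configuration to contain a module_config entry with a search_index key. Below is a minimal sketch of that configuration. It assumes a hypothetical data/search_index directory next to the module's config folder. In the real module.config.php, the array would be returned rather than assigned to a variable. The lookup is repeated as a plain function so it can be checked without the service manager; findSearchIndexLocation() is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical excerpt of module/Users/config/module.config.php.
// In the real file this array is returned; the directory is an assumption,
// and the web server needs write access to it.
$moduleConfig = [
    'module_config' => [
        'search_index' => __DIR__ . '/../data/search_index',
    ],
];

// Same lookup as getIndexLocation(), without the service-locator call:
// returns the configured path, or false when the key is missing or empty.
function findSearchIndexLocation(array $config)
{
    if (!empty($config['module_config']['search_index'])) {
        return $config['module_config']['search_index'];
    }
    return false;
}

$location = findSearchIndexLocation($moduleConfig);
echo $location === false ? "search_index is not configured\n" : "Index location: {$location}\n";
```

Returning false instead of throwing keeps the lookup side-effect free, so the caller decides how to react to a missing setting.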


What just happened?
We have now created a method that indexes the data stored in the MySQL table into the Lucene data store. Our next step will be to execute some queries against the Lucene index and to fetch and show the results.
