Indexing is a fairly straightforward process using ZendSearch\Lucene. All we need to do is create documents with fields and values, and add each document to the index. You can also remove documents, update documents, and clear an index. The following classes are used in index generation:
◆ Field – The ZendSearch\Lucene\Document\Field class allows users to define a new document field; this field can be classified into one of the following types:
‰ Field::keyword($name, $value, $encoding = 'UTF-8'): The keyword field type is used to identify string fields that don't have to be tokenized, yet need to be indexed and stored. For example, dates and URLs.
‰ Field::unIndexed($name, $value, $encoding = 'UTF-8'): The unIndexed field type is used to store fields in the index without having to index/tokenize them. For example, ID fields.
‰ Field::binary($name, $value): The binary field type is used for storing binary values in the index.
‰ Field::text($name, $value, $encoding = 'UTF-8'): The text field type is the most common field type, used for describing short strings which are tokenized, indexed, and stored in the index.
‰ Field::unStored($name, $value, $encoding = 'UTF-8'): The unStored field type is used to identify fields that will be tokenized and indexed, but not stored in the index.
◆ Document – The ZendSearch\Lucene\Document class allows definition of a new index document. Some of the most commonly-used methods in this class are described as follows:
‰ addField(Document\Field $field): Adds a new field to the document
‰ getFieldNames(): Used to retrieve all field names from the document
‰ getField($fieldName): Used to retrieve a specific field from the document
‰ getFieldValue($fieldName): Used to retrieve a specific field value from the document
◆ Index – An index can be obtained using the create() and open() methods in the ZendSearch\Lucene\Lucene class. Both methods take the index path as the parameter and return an index of type ZendSearch\Lucene\SearchIndexInterface. The SearchIndexInterface provides the following methods for manipulating the documents inside the index:
‰ addDocument(Document $document): Adds a new document to the index
‰ delete($id): Deletes the indexed document based on the internal document ID
‰ optimize(): Optimizes the index by merging all segments into a single segment, which improves search performance
‰ commit(): Used to commit transactions to the search index
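The classes listed above can be exercised together in a short script. The following is a minimal sketch, assuming ZendSearch has been installed through Composer and that vendor/autoload.php is reachable from the working directory; the field names, values, and the temporary index path are hypothetical and chosen only for illustration.

```php
<?php
// Assumption: ZendSearch is installed via Composer and autoloadable from here.
require 'vendor/autoload.php';

use ZendSearch\Lucene\Lucene;
use ZendSearch\Lucene\Document;
use ZendSearch\Lucene\Document\Field;

// Hypothetical throwaway index location, used only for this sketch.
$indexPath = sys_get_temp_dir() . '/lucene_sketch_' . uniqid();

// Build a document using four of the field types described above.
$doc = new Document();
$doc->addField(Field::unIndexed('upload_id', '42'));                  // stored, not searchable
$doc->addField(Field::keyword('url', 'http://example.com/a.pdf'));    // indexed untokenized, stored
$doc->addField(Field::text('label', 'Quarterly report'));             // tokenized, indexed, stored
$doc->addField(Field::unStored('body', 'Searchable but not stored')); // tokenized, indexed, not stored

// Inspect the document before it goes into the index.
$fieldNames = $doc->getFieldNames();
$labelValue = $doc->getFieldValue('label');
$labelField = $doc->getField('label');

// create() starts a new index at the path; open() reads an existing one.
$index = Lucene::create($indexPath);
$index->addDocument($doc);
$index->commit();

$reopened   = Lucene::open($indexPath);
$storedDocs = $reopened->numDocs();

echo implode(', ', $fieldNames), PHP_EOL;
echo $labelValue, PHP_EOL;
echo 'Documents in index: ', $storedDocs, PHP_EOL;
```

Note that the field type decides what a later search can do with a value: only indexed fields can be matched by a query, and only stored fields can be read back from a search hit.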
Now that we have learned about the methods that are used for index generation, let's get started and generate the index for the uploads table that is available in our communication application.
Perform the following steps for generating a Lucene index:
1. Create a new search controller, CommunicationApp/module/Users/src/Users/Controller/SearchController.php, which will be used for searching and generating indexes.
2. Add references to ZendSearch\Lucene:
use ZendSearch\Lucene;
use ZendSearch\Lucene\Document;
use ZendSearch\Lucene\Index;
3. Add a method to fetch the index location from the module configuration:
public function getIndexLocation()
{
    // Fetch the merged configuration from the service locator
    $config = $this->getServiceLocator()->get('config');
    if ($config instanceof \Traversable) {
        $config = \Zend\Stdlib\ArrayUtils::iteratorToArray($config);
    }
    // Return the configured index path, or false when none is set
    if (!empty($config['module_config']['search_index'])) {
        return $config['module_config']['search_index'];
    } else {
        return false;
    }
}
4. The index document needs to be generated in the following format:
Index field    Description
upload_id      This is a non-indexed field which will be used for retrieving the uploaded file that gets returned in the search results
label          This field is used to index the label field of the uploads table
owner          This field is used to index the name field of the user who uploaded the document
5. Create a new action to generate the index:
public function generateIndexAction()
{
    $searchIndexLocation = $this->getIndexLocation();
    // create() starts a fresh index at the configured location
    $index = Lucene\Lucene::create($searchIndexLocation);

    $userTable = $this->getServiceLocator()->get('UserTable');
    $uploadTable = $this->getServiceLocator()->get('UploadTable');
    $allUploads = $uploadTable->fetchAll();

    foreach ($allUploads as $fileUpload) {
        // look up the user who owns this upload
        $uploadOwner = $userTable->getUser($fileUpload->user_id);

        // create lucene fields
        $fileUploadId = Document\Field::unIndexed('upload_id', $fileUpload->id);
        $label = Document\Field::text('label', $fileUpload->label);
        $owner = Document\Field::text('owner', $uploadOwner->name);

        // create a new document and add all fields
        $indexDoc = new Lucene\Document();
        $indexDoc->addField($label);
        $indexDoc->addField($owner);
        $indexDoc->addField($fileUploadId);
        $index->addDocument($indexDoc);
    }

    // write the pending documents to the index
    $index->commit();
}
6. Now open the action URL (http://comm-app.local/users/search/generateIndex) in your web browser, and if everything works as expected, you will see that the index files are created in the search_index folder.
The following screenshot shows the browser response upon a successful index update:
You can see in the following screenshot that the index files are generated and stored in the search_index folder:
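The getIndexLocation() method from step 3 expects a search_index entry under the module_config key of the merged application configuration. That entry is not shown in the steps above, so the following is only a sketch of what it might look like in the Users module's module.config.php; the file location and the directory path are assumptions, and the path should point to a folder that the web server can write to.

```php
<?php
// module/Users/config/module.config.php (fragment; file location assumed)
return array(
    // ... other module settings ...
    'module_config' => array(
        // Hypothetical path: any directory writable by the web server will do.
        'search_index' => __DIR__ . '/../data/search_index',
    ),
);
```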
What just happened?
We have now created a method that indexes the data stored in the MySQL table into the Lucene data store; our next step will be to run some queries against the Lucene index and to fetch and show the results.
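To confirm programmatically that the documents reached the index, one option is to reopen it with open() and count its documents. This is a minimal sketch, assuming Composer autoloading; countIndexedDocuments() is a hypothetical helper name and is not part of ZendSearch.

```php
<?php
// Assumption: ZendSearch is installed via Composer and autoloadable from here.
require 'vendor/autoload.php';

use ZendSearch\Lucene\Lucene;

/**
 * Hypothetical helper: returns the number of non-deleted documents in the
 * index at $indexPath, or null when no index can be opened there.
 */
function countIndexedDocuments($indexPath)
{
    // Avoid touching the filesystem when the folder does not exist at all.
    if (!is_string($indexPath) || !is_dir($indexPath)) {
        return null;
    }

    try {
        $index = Lucene::open($indexPath);
    } catch (\Exception $e) {
        // open() throws when the folder holds no valid index.
        return null;
    }

    return $index->numDocs();
}
```

Inside the controller, this could be called as countIndexedDocuments($this->getIndexLocation()); a result equal to the number of rows in the uploads table would suggest that every upload was indexed.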