Repost | Tutorial | Editor: 龚雪 | 2014-12-15 15:03:15.000 | 597 views
Overview: This tutorial shows how to highlight the matched keywords in search results with dtSearch.
What do you do with your search results after you have obtained them? We explore hit highlighting with dtSearch and C#.
In the first part of my exploration of the search and indexing system dtSearch, I covered the basic principles of operation. Now we consider what to do next once you have some search results.
What do you do with your search results after you have obtained them?
It is a good question. In many cases it is enough to simply list the files that contain the hits. But what if your users want to look inside the files and see where the hits have occurred? This is a nightmare of a job if you have to start from scratch. All those file formats and then there is the bother of finding out how to highlight the hits in each format.
No - it probably isn't worth the effort.
The good news is that if you are using dtSearch, which you can try for yourself by downloading the 30-day evaluation from the dtSearch download page, you can use a range of file and container parsers. dtSearch has its own file parsers supporting popular file types like MS Word, Excel, Access, PowerPoint, CSV, HTML, PDF, XML/XSL, emails and attachments, ZIP files, etc. The file parsers are used by the indexing engine to look inside each document, and they are also used by the FileConverter object to let you convert documents into a standard format so that the results of searches can be presented to users.
It is assumed that you already know how to create an index and search it using, say, C#. If not, read Getting started with dtSearch. To give us something to work with, we will perform a simple search for the single word "Jeep" in an index that has already been constructed.
First we need a SearchJob object:
SearchJob SJob1 = new SearchJob();
which we make ready to perform the search on the index:
SJob1.BooleanConditions = "Jeep";
SJob1.IndexesToSearch.Add(@"location of index");
SJob1.MaxFilesToRetrieve = 100;
You can, of course, customize to search for any target in your own index. Finally we execute the search:
SJob1.Execute();
This populates a SearchResults object with all of the information needed about each file found. To make things easier to follow, we can create a direct reference to the results of the search:
SearchResults Results = SJob1.Results;
The SearchResults object contains a list of the documents returned by a search, together with their associated properties. We have encountered it before, but it is worth spending a few moments considering it in more detail.
You might expect SearchResults to be a simple collection object, but if you consider the problems involved in creating a collection object consisting of a range of documents you can see that it is better to only retrieve a document when it is needed. To retrieve a document and its associated properties you simply use the GetNthDoc method, which loads the Nth document in the results list into the SearchResults object's CurrentItem property.
For example, to get the first document you would use something like:
Results.GetNthDoc(0);
SearchResultsItem Item = Results.CurrentItem;
Once you have the Item you can make use of its properties to work with the properties of the document and the search. For example, you can check the document's type using its TypeId property. This is an integer but its ToString method has been overridden to provide a string identifier.
So, for example:
MessageBox.Show(Item.TypeId.ToString());
displays an identifier which indicates that the first document in the list is a PDF. You can discover the full range of document types by examining the TypeId enumeration.
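The steps so far can be put together into a small console program that lists every document in the results. This is only a sketch: it assumes the dtSearch .NET assembly is referenced and exposes the dtSearch.Engine namespace, and the Count, Filename and HitCount members are not mentioned in the text above, so check them against the dtSearch documentation before relying on them.

```csharp
using System;
using dtSearch.Engine; // assumption: the dtSearch .NET API assembly is referenced

class ListHits
{
    static void Main()
    {
        // Set up and run the same search as in the text.
        SearchJob job = new SearchJob();
        job.BooleanConditions = "Jeep";
        job.IndexesToSearch.Add(@"location of index"); // placeholder, use your own index path
        job.MaxFilesToRetrieve = 100;
        job.Execute();

        SearchResults results = job.Results;

        // Documents are loaded one at a time into CurrentItem.
        // Assumption: Count, Filename and HitCount exist with these names.
        for (int i = 0; i < results.Count; i++)
        {
            results.GetNthDoc(i);
            SearchResultsItem item = results.CurrentItem;
            Console.WriteLine("{0}: {1} ({2} hits)",
                item.Filename, item.TypeId, item.HitCount);
        }
    }
}
```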
As well as details of the document, the SearchResultsItem object also contains details of the hits, i.e. the details of where the target phrase was found in the document. You can use these details to manipulate the document to show, say, where the hits occurred.
A central issue is locating where a hit occurred and this can be achieved using the Hits array which contains the offsets of the words that have been matched to the search target. If you examine the range of similar properties then you should be able to see how to highlight the hits - but you probably aren't looking forward to such detailed, and let's face it boring, coding. The good news is that it has all been done for you and in a way that does so much more than you would achieve with a basic approach.
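To get a feel for what the do-it-yourself route involves, here is a minimal sketch of highlighting from word offsets in the simplest possible case, plain text. It does not use dtSearch at all. ManualHighlighter is a hypothetical helper, and it assumes that offsets count words starting at 1 and that words are separated by single spaces; real documents need format-aware parsing and proper word-break rules, which is exactly the tedious part.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ManualHighlighter
{
    // Wraps the words at the given word offsets in before/after markers.
    // Assumptions: offsets are 1-based word positions, and words are
    // separated by single spaces.
    public static string Highlight(string text, int[] hits, string before, string after)
    {
        var hitSet = new HashSet<int>(hits);
        string[] words = text.Split(' ');
        var sb = new StringBuilder();

        for (int i = 0; i < words.Length; i++)
        {
            if (i > 0) sb.Append(' ');
            bool isHit = hitSet.Contains(i + 1); // convert 0-based index to 1-based offset
            if (isHit) sb.Append(before);
            sb.Append(words[i]);
            if (isHit) sb.Append(after);
        }
        return sb.ToString();
    }
}
```

Calling ManualHighlighter.Highlight("the red Jeep and the blue Jeep", new[] { 3, 7 }, "<b>", "</b>") wraps the third and seventh words in bold tags. Doing the same inside a PDF or a Word file is a very different matter.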
The central object in making hit highlighting and other similar tasks very easy indeed is the FileConverter. This takes a file in any of the supported formats, and there are a lot of supported formats, and transforms it to HTML, RTF, XML or plain text. Just this feat alone is worth its weight in code but it will also "decorate" the conversion with markers that can be used to highlight the hits.
The first thing to say is that FileConverter is general and will process a document even if you acquired it by some complicated route. All you have to do is set its properties correctly and call its Execute method and the job is done.
The properties that you have to set are also fairly simple: the name of the file to process, a hits array giving the offsets from the start of the file of each of the hits, a specification of what characters constitute a word break, the index that the file was retrieved from and the document id. You also have to specify what format you want the results in and what strings you want to use to mark up the hits.
If you want to process a general file then you need to specify the file name as InputFile, or if the data is in a memory buffer then use InputBytes.
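As a rough sketch of the general case, the following converts a single file to HTML without any search results involved, so no hits are marked. It assumes the dtSearch .NET assembly is referenced, that the property is spelled InputFile, and that the path arrives as a command-line argument, which is purely an illustrative choice.

```csharp
using System;
using dtSearch.Engine; // assumption: the dtSearch .NET API assembly is referenced

class ConvertOneFile
{
    static void Main(string[] args)
    {
        FileConverter fc = new FileConverter();

        // Hypothetical input: the path of the document to convert.
        fc.InputFile = args[0];

        // No hits are supplied, so the output is a plain conversion.
        fc.OutputFormat = OutputFormats.itHTML;
        fc.OutputToString = true;
        fc.Execute();

        Console.WriteLine(fc.OutputString);
    }
}
```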
You can do this job one property at a time and it isn't difficult but if the file has been returned as the result of a search then it is even easier. The SetInputItem method can be used to set all of the necessary properties from a SearchResults object. For example:
FileConverter fc = new FileConverter();
fc.SetInputItem(Results, 0);
creates a FileConverter and initializes it so as to be ready to process the first document in the results. After this the only things we need to set are the output format required, e.g. HTML, RTF, XML or plain text, and the strings to be inserted before and after each hit. For example to create an HTML document and surround every hit with an <h1> tag pair (an odd choice but still possible) you would use:
fc.OutputFormat = OutputFormats.itHTML;
fc.BeforeHit = "<h1>";
fc.AfterHit = "</h1>";
Almost ready to highlight, or headline in this case, but first we need to set where the output is going. The FileConverter can either create a new file or it can return a string. In this case we will opt to display the HTML in a WebBrowser control and so a string is most appropriate. Place a WebBrowser control on the form and store the result of the highlighting in it using:
fc.OutputToString = true;
fc.Execute();
webBrowser1.DocumentText = fc.OutputString;
If you now put all this together you will find that every occurrence of the target search item is set as an <h1> headline.
In most cases it would be better to use a color code to highlight the hits but you really are free to do whatever you want with the text that constitutes the hits.
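For instance, the whole conversion with a color highlight instead of a headline might look like the sketch below. HitHighlighter and ToHighlightedHtml are hypothetical names, the yellow span is just one possible choice of markup, and the code assumes the dtSearch .NET assembly is referenced.

```csharp
using dtSearch.Engine; // assumption: the dtSearch .NET API assembly is referenced

static class HitHighlighter
{
    // Converts the document at the given position in the results to HTML,
    // wrapping every hit in a yellow-background span.
    public static string ToHighlightedHtml(SearchResults results, int index)
    {
        FileConverter fc = new FileConverter();
        fc.SetInputItem(results, index);

        fc.OutputFormat = OutputFormats.itHTML;
        fc.BeforeHit = "<span style=\"background-color: yellow\">";
        fc.AfterHit = "</span>";

        fc.OutputToString = true;
        fc.Execute();
        return fc.OutputString;
    }
}
```

In the form from the example this would be used as webBrowser1.DocumentText = HitHighlighter.ToHighlightedHtml(Results, 0);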
A really nice touch is the provision of the special tags %%ThisHit%%, %%NextHit%% and %%PreviousHit%%, which cause the FileConverter to place numbers into the file corresponding to the index of the hit in the array. With a little work this can be used to create hyperlinks that the user can use to navigate the document.
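One way this might be done is sketched below: each hit gets a named anchor built from %%ThisHit%% and a link to the anchor of the following hit built from %%NextHit%%. HitNavigation and AddNextLinks are hypothetical names, and the anchor naming scheme is an illustrative choice. What number %%NextHit%% produces on the last hit is not covered here, so check the dtSearch documentation for that case.

```csharp
using dtSearch.Engine; // assumption: the dtSearch .NET API assembly is referenced

static class HitNavigation
{
    // Configures the markers so that every hit is bold, carries a named
    // anchor, and is followed by a link to the next hit in the document.
    public static void AddNextLinks(FileConverter fc)
    {
        // The engine replaces %%ThisHit%% with this hit's number.
        fc.BeforeHit = "<a name=\"hit%%ThisHit%%\"></a><b>";

        // The engine replaces %%NextHit%% with the next hit's number.
        fc.AfterHit = "</b> <a href=\"#hit%%NextHit%%\">next</a>";
    }
}
```

Call HitNavigation.AddNextLinks(fc) before fc.Execute(), with the output format set to HTML.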
Also notice that while the output of the conversion is HTML (or RTF, or XML or plain text) the input file can be in any of the supported formats, making FileConverter a one-stop solution to highlighting hits in almost any type of file in the index.
This is almost too easy.
Original article: //www.i-programmer.info/programming/database/3153-highlighting-search.html
Article reposted from: 慧都控件网