The main business of current Internet search engine providers such as Yahoo, Baidu, and Google is horizontal (general-purpose) search, which gives users access to vast amounts of information. As the Internet continues to update and evolve, we have found that ordinary users trying to find the information they need are looking for a needle in a haystack. The sheer volume of information is no longer the main driving force of development; accuracy and timeliness are the real driving forces.

The development of the Internet is no longer about delivering information to users quickly and in large quantities, but about letting users obtain the desired information at the desired time and place, in the desired manner, and at an acceptable cost. The horizontal search offered by comprehensive search engines covers a large amount of information, but it struggles to balance that coverage with the accuracy and relevance of results. The value of comprehensive search engines lies in navigating large volumes of information; they lack finer-grained, classified services for the relatively concentrated information needs of industry users. Solving this problem is seen as an opportunity for the future development of search, and it has become a hotly contested topic among research institutions. The vertical search model emerged in this context.

This research is divided into two main parts. The first part proposes, through theoretical analysis, an information collection algorithm for vertical search engines. The second part analyzes the core technologies of vertical search engines and designs and implements a prototype vertical search engine system. The body of the paper consists of five chapters. The first chapter details the history of search engine development, points out the problems that current comprehensive search engines face and a way to solve them, and thus arrives at the research direction of this paper: vertical search engines.

By comparing the key technologies and information services of vertical and comprehensive search engines, the chapter shows that vertical search engines have significant advantages and considerable room for development. Finally, it reviews the state of vertical search engine development at home and abroad and states the problems this paper sets out to solve.

The chapter on overall architecture and information collection presents the overall architecture and workflow design of a vertical search engine and analyzes the characteristics specific to vertical search engines. It then presents common models of information collection strategy and analyzes the core idea and the shortcomings of the similarity matching algorithm, based on the vector space model, that current general-purpose information collection relies on. Finally, after introducing ontologies, it proposes building an ontology-based knowledge base to implement an intelligent information collection strategy that addresses polysemy (one word with several meanings) and synonymy (several words with one meaning) in the information collection process.
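
As an illustration of similarity matching in the vector space model, the following minimal Python sketch scores a crawled page against a topic description using cosine similarity over raw term frequencies. It is not the algorithm from the paper: the function names, the whitespace tokenizer, and the threshold value are assumptions chosen for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(page_text, topic_text, threshold=0.3):
    """Hypothetical crawl filter: keep a page if it is similar enough to the topic.

    The threshold of 0.3 is an arbitrary illustrative value, not one from the paper.
    """
    return cosine_similarity(term_vector(page_text), term_vector(topic_text)) >= threshold
```

Because the comparison is purely lexical, `cosine_similarity(term_vector("mobile phone"), term_vector("cellular handset"))` is 0.0 even though the two phrases mean the same thing. This is the kind of synonymy problem that motivates the ontology-based knowledge base.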

The chapter on the Lucene framework analyzes in detail Lucene, currently regarded as the best open source full-text retrieval framework. It covers an introduction to full-text search technology, the origin and structure of the Lucene project, and the inverted index technology and scoring mechanism that are central to Lucene's indexing and search capabilities, and it gives the core program code for implementing indexing and search. Finally, it describes Chinese word segmentation and the principle behind Lucene's tokenization.
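
To make the idea of an inverted index with scoring concrete, here is a minimal Python sketch. It is a toy model and does not reproduce Lucene's API or its actual scoring formula: the class `InvertedIndex`, the whitespace tokenizer, and the simplified TF-IDF weighting are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one document. Assumes each doc_id is added only once."""
        self.doc_count += 1
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """Return (doc_id, score) pairs, best match first.

        Score is a simplified TF-IDF sum: sqrt(tf) * log(1 + N / df) per query term.
        """
        scores = defaultdict(float)
        for term in query.lower().split():
            docs = self.postings.get(term, {})
            if not docs:
                continue  # term does not occur in any indexed document
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += math.sqrt(tf) * idf
        # Sort by descending score, breaking ties by doc_id for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A lookup touches only the postings of the query terms rather than scanning every document, which is why the inverted index is the structure of choice for full-text retrieval.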

The chapter on implementing the vertical search engine combines the Heritrix crawler with the open source Lucene framework to design and build a prototype vertical search engine for mobile phone product information. The system is implemented in three parts. The first part implements information collection based on the Heritrix framework and designs a structured information extraction program. The second part designs a word segmentation tool for mobile phone product information and uses the Lucene framework to build an index over the structured text. The third part designs a query interface based on the MVC architecture and implements the search function of the prototype system. The prototype thereby offers a useful reference and guidance for implementing vertical search engine technology.
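
The structured information extraction step can be pictured with the following Python sketch, which turns a crawled product page into a record of named fields ready for field-by-field indexing. The page layout (a specification table of `<th>` label and `<td>` value pairs), the class name `SpecTableParser`, and the sample field names are hypothetical; the paper's own extraction program is not shown in this overview and may work quite differently.

```python
from html.parser import HTMLParser


class SpecTableParser(HTMLParser):
    """Collect <th>label</th><td>value</td> pairs from a (hypothetical) spec table."""

    def __init__(self):
        super().__init__()
        self.fields = {}    # extracted record: lowercased label -> value
        self._tag = None    # "th" or "td" while inside such a cell
        self._label = None  # most recent label still waiting for its value

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "th":
            self._label = text.lower()
        elif self._tag == "td" and self._label is not None:
            self.fields[self._label] = text
            self._label = None


def extract_fields(html):
    """Parse one product page and return its fields as a dict."""
    parser = SpecTableParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Each extracted field can then be indexed separately, which is what lets a vertical search engine answer queries against a specific attribute (for example, brand or screen size) instead of against undifferentiated page text.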

The summary and outlook chapter summarizes the work of this paper and presents trends in vertical search engines along with several directions for further study. There is a well-known saying in the search field: “Users cannot describe what they are looking for until they see it.” A technical expert at Microsoft Research has said: “75% of content cannot be found through general search engines.” Vertical search engines are one branch of search engine technology development. They are an inevitable result of users' search preferences shifting from simply wanting comprehensive results toward higher accuracy and timelier information.

In addition, by collecting or reorganizing information in a structured way according to industry-specific information models and user models, vertical search engines will provide increasingly professional and personalized industry services that are smarter and more user-friendly than traditional comprehensive search. The vertical search engine market is therefore both necessary and has broad prospects for development. However, vertical search is a fledgling technology with many areas awaiting improvement and breakthrough, and the research on vertical search engine technology in this paper offers practical guidance for its development.

