Hyperspace Query Flow
Candidate Generation and Scoring
Database search is the process of retrieving the documents that best match the query conditions. This flow consists of two main steps:
Space Reduction and Candidate Generation: The search space is reduced, and a list of candidate documents that pass the query filters is generated.
Candidate Ranking: Candidates are ranked by a score that reflects how well each one matches the query.
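The two steps can be illustrated with a minimal Python sketch. It assumes documents that carry a numeric `price` field used for filtering and a `rating` field used as the score. The names `Doc`, `PriceIndex`, and `rank` are hypothetical and are not part of the Hyperspace API. The sorted price list stands in for whatever index an engine uses to reduce the search space.

```python
import bisect
import heapq
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    price: float
    rating: float


class PriceIndex:
    """Hypothetical secondary index: document ids sorted by price."""

    def __init__(self, docs):
        ordered = sorted(docs, key=lambda d: d.price)
        self.prices = [d.price for d in ordered]
        self.ids = [d.doc_id for d in ordered]

    def between(self, lo, hi):
        # Step 1, candidate generation: binary search returns only the ids
        # with lo <= price <= hi, without scanning the whole collection.
        start = bisect.bisect_left(self.prices, lo)
        end = bisect.bisect_right(self.prices, hi)
        return self.ids[start:end]


def rank(docs_by_id, candidate_ids, score, k):
    # Step 2, candidate ranking: score only the candidates and keep the top k.
    return heapq.nlargest(k, candidate_ids, key=lambda i: score(docs_by_id[i]))


docs = [Doc(0, 10.0, 4.5), Doc(1, 25.0, 3.9), Doc(2, 18.0, 4.8),
        Doc(3, 40.0, 5.0), Doc(4, 15.0, 2.1)]
index = PriceIndex(docs)
by_id = {d.doc_id: d for d in docs}

candidates = index.between(12.0, 30.0)
top = rank(by_id, candidates, score=lambda d: d.rating, k=2)
print(candidates, top)  # [4, 2, 1] [2, 1]
```

Document 3 has the highest rating but never reaches the ranking step, because the price filter excludes it during candidate generation.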
Given a query, the naïve approach is to consider every document in the collection, evaluate score(i) = user_query(Document_i) for each one, and return the top K matching documents. This approach does not scale, because scoring every document in the collection for every query is impractical.
To overcome this problem, the search space must be reduced dramatically, from the full dataset down to thousands or even hundreds of documents, so that user_query(Document_i) is evaluated on only a small fraction of the dataset. This step is called space reduction or filtering, and the reduced set of documents is called the candidate group. Once the search space is reduced, it is cheap to evaluate the score of each document in the candidate group and return the top K matching documents. The next sections describe how filtering and scoring are specified in the DSL syntaxes.
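The difference between the naïve scan and space reduction can be illustrated with a toy Python sketch. It assumes a simple keyword score (the number of query terms a document contains) and an inverted index as the filtering structure. Both, along with the helper names and the sample data, are illustrative assumptions rather than Hyperspace's actual scoring or indexing. Each function also reports how many times the score was evaluated.

```python
import heapq
from collections import defaultdict

# Toy collection (hypothetical data).
DOCS = {
    0: "red running shoes",
    1: "blue denim jacket",
    2: "red leather jacket",
    3: "trail running shoes",
    4: "wool socks",
}


def user_query(text, terms):
    # Assumed score: number of query terms that appear in the document.
    words = set(text.split())
    return sum(term in words for term in terms)


def naive_top_k(terms, k):
    # Naive approach: evaluate user_query on every document in the collection.
    scores = {i: user_query(text, terms) for i, text in DOCS.items()}
    # Ties are broken in favor of the lower document id.
    top = heapq.nlargest(k, scores, key=lambda i: (scores[i], -i))
    return top, len(scores)


def build_inverted_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


INDEX = build_inverted_index(DOCS)


def reduced_top_k(terms, k):
    # Space reduction: the candidate group is the union of the posting
    # lists of the query terms; no other document is scored.
    candidates = set().union(*(INDEX.get(term, set()) for term in terms))
    scores = {i: user_query(DOCS[i], terms) for i in candidates}
    top = heapq.nlargest(k, scores, key=lambda i: (scores[i], -i))
    return top, len(scores)


print(naive_top_k(["red", "running"], 2))    # ([0, 2], 5)
print(reduced_top_k(["red", "running"], 2))  # ([0, 2], 3)
```

Both functions return the same top K here, but the reduced version scores only the documents that share at least one term with the query. The two can differ when K exceeds the number of matching documents, because the naïve scan may fill the remaining slots with zero-score documents.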
Query DSL Interface
The following code snippet expresses the same query as the Elastic DSL example shown above, this time in the Python syntax. Even without considering its additional functionality, the Python syntax is much simpler and more readable than the DSL syntax.