The Query flow

Hyperspace delivers very low search latency, suitable for true real-time applications. This level of performance comes from the Hyperspace SPU (search processing unit), a virtual chip built on a domain-specific architecture designed for search applications. This is the strength of Hyperspace technology: hardware-level speed and software-level flexibility, delivered as a cloud SaaS.
Hyperspace stores documents, from millions up to billions of them. Search is the process of finding the few documents that meet the user's intent (usually specified in a query), followed by a ranking phase that orders them by a relevance measure (the score).

Space Reduction - filtering

Given a user query, one could go over every document in the collection, evaluate the expression score(i) = user_query(Di), and return the top K documents. This naive approach does not scale, because it touches every document in the dataset for each query. To overcome this, the search space must be reduced dramatically, i.e. from all documents in the dataset down to thousands or even hundreds (filtering), so that user_query(Di) is evaluated on only a small fraction of the dataset. This is called space reduction, and the reduced set of documents is called the candidate group. Once the search space is reduced, score(i) = user_query(Di) can be evaluated cheaply over the candidate group and the top K matching documents returned. The next section shows how filtering and scoring are specified.
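
The difference between the naive scan and the filtered search can be sketched in plain Python. This is a toy illustration only, not the Hyperspace implementation: the inverted index, the `score` function and the sample documents are all assumptions made for the example.

```python
import heapq
from collections import defaultdict

# Toy collection (hypothetical data, for illustration only).
DOCS = [
    {"email domain": "yahoo.com", "first name": "John"},
    {"email domain": "gmail.com", "first name": "John"},
    {"email domain": "yahoo.com", "first name": "Mary"},
    {"email domain": "gmail.com", "first name": "Anna"},
]

def score(query, doc):
    """Hypothetical relevance measure: number of query fields the document matches."""
    return sum(1.0 for field, value in query.items() if doc.get(field) == value)

def naive_top_k(query, docs, k):
    """Score every document in the collection: O(N) work per query."""
    scored = ((score(query, d), i) for i, d in enumerate(docs))
    return heapq.nlargest(k, scored)

def build_index(docs, field):
    """Inverted index: field value -> ids of the documents holding that value."""
    index = defaultdict(list)
    for i, d in enumerate(docs):
        index[d[field]].append(i)
    return index

def filtered_top_k(query, docs, index, field, k):
    """Space reduction: score only the candidate group selected by the filter."""
    candidates = index.get(query[field], [])
    scored = ((score(query, docs[i]), i) for i in candidates)
    return heapq.nlargest(k, scored), len(candidates)

query = {"email domain": "yahoo.com", "first name": "John"}
index = build_index(DOCS, "email domain")  # built once, reused by every query
naive = naive_top_k(query, DOCS, k=1)
top, n_candidates = filtered_top_k(query, DOCS, index, "email domain", k=1)
```

Both calls return the same top document here, but the filtered version scores only the two documents in the candidate group instead of all four.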

Query Python interface

The following code snippet shows a basic query in the Hyperspace Python interface. The query is largely self-explanatory, but it is important to understand how space reduction (filtering) works. In any query, the engine uses the outer if expression as the condition for gathering the candidate group. Here, only documents whose "email domain" field matches "yahoo.com" enter the candidate group, and the system evaluates the inner logic only on those documents. In this case line 3 (the match on "email domain") is the outer if, and lines 4-11 form the inner logic.

    def user_query(Q, D):

        if match("email domain", "yahoo.com"):  # outer if: selects the candidate group

            score0 = 3.4
            if match("first name", "John"):
                score0 += 7.0
            else:
                score0 -= 1.2

            return score0

Query DSL interface

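The DSL interface is currently under development, with the first version expected in Q1 2024. The following code snippet shows the same query in Elasticsearch Query DSL format. The boost values reproduce the Python scores: a document that matches only the email domain scores 2.2 (3.4 - 1.2), and one that also matches the first name scores 2.2 + 8.2 = 10.4 (3.4 + 7.0). Even without considering the additional functionality, the Python syntax is simpler and more readable than the DSL syntax.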

    {
      "query": {
        "bool": {
          "must": [
            {
              "constant_score": {
                "filter": {
                  "term": {
                    "email domain": "yahoo.com"
                  }
                },
                "boost": 2.2
              }
            }
          ],
          "should": [
            {
              "constant_score": {
                "filter": {
                  "term": {
                    "first name": "John"
                  }
                },
                "boost": 8.2
              }
            }
          ]
        }
      }
    }
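
As a sanity check that both snippets express the same query, the sketch below compares the two scoring rules in plain Python. It assumes standard Elasticsearch semantics, where a `constant_score` clause contributes its `boost` when its filter matches and a `bool` query sums the scores of its matching `must` and `should` clauses. The helpers `python_score`, `dsl_score` and `agree` are hypothetical names introduced for this example; neither engine is actually called.

```python
import math

def python_score(doc):
    """Score from the Python-interface query; None means 'not a candidate'."""
    if doc.get("email domain") != "yahoo.com":
        return None
    score0 = 3.4
    if doc.get("first name") == "John":
        score0 += 7.0
    else:
        score0 -= 1.2
    return score0

def dsl_score(doc):
    """Score from the bool query: 'must' is required, 'should' is optional."""
    if doc.get("email domain") != "yahoo.com":
        return None      # the must clause failed, so the document is excluded
    total = 2.2          # boost of the must clause
    if doc.get("first name") == "John":
        total += 8.2     # boost of the should clause
    return total

def agree(p, d):
    """True when both queries exclude the document or give (nearly) the same score."""
    if p is None or d is None:
        return p is None and d is None
    return math.isclose(p, d)  # tolerate floating-point rounding

docs = [
    {"email domain": "yahoo.com", "first name": "John"},
    {"email domain": "yahoo.com", "first name": "Mary"},
    {"email domain": "gmail.com", "first name": "John"},
]
for doc in docs:
    p, d = python_score(doc), dsl_score(doc)
    print(doc, "->", "equivalent" if agree(p, d) else "different")
```

The scores are compared with `math.isclose` rather than `==` because sums such as 2.2 + 8.2 are subject to floating-point rounding.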

The Query logic & Query Document concept

Often, in applications other than free-text search, queries share the same logic and differ only in their parameters. There can be many instances of the above query that differ only in the values used in place of "yahoo.com" and "John". In this case we can distinguish the Query logic from the Query document: only the Query document changes from query to query, while the Query logic remains stable.
This is an important property, because it avoids the query logic compilation step that typically precedes query execution, and therefore yields a significant reduction in latency.
The following code snippets show the Query logic and the Query document derived from the above user query.

Query logic

    def user_query_logic(Q, D):
        if match("email domain"):
            score0 = 3.4
            if match("first name"):
                score0 += 7.0
            else:
                score0 -= 1.2
            return score0

Query document

    user_query_document = {
        "email domain": "yahoo.com",
        "first name": "John"
    }
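
To make the split concrete, here is a minimal, self-contained emulation in plain Python. It is not the Hyperspace client API: `make_match` and `run` are hypothetical helpers, and the sketch assumes that `match(field)` means "the query document and the stored document hold the same value for `field`".

```python
def make_match(Q, D):
    """Return a match(field) helper bound to one query document Q and one stored document D."""
    def match(field):
        return field in Q and Q[field] == D.get(field)
    return match

def user_query_logic(Q, D):
    """Query logic: fixed structure, no literal values."""
    match = make_match(Q, D)
    if match("email domain"):
        score0 = 3.4
        if match("first name"):
            score0 += 7.0
        else:
            score0 -= 1.2
        return score0
    return None  # D is not in the candidate group

def run(logic, Q, docs, k=10):
    """Apply one query logic to one query document; return the top-k (score, doc_id) pairs."""
    scored = []
    for doc_id, D in enumerate(docs):
        s = logic(Q, D)
        if s is not None:
            scored.append((s, doc_id))
    return sorted(scored, reverse=True)[:k]

docs = [
    {"email domain": "yahoo.com", "first name": "John"},
    {"email domain": "yahoo.com", "first name": "Mary"},
    {"email domain": "gmail.com", "first name": "Mary"},
]

# The same logic serves two different query documents; only the parameters change.
results_john = run(user_query_logic, {"email domain": "yahoo.com", "first name": "John"}, docs)
results_mary = run(user_query_logic, {"email domain": "gmail.com", "first name": "Mary"}, docs)
```

Because `user_query_logic` contains no literal values, it would only need to be prepared once, and every later query supplies just a new query document.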

TF-IDF

Hyperspace supports TF-IDF scoring for the keyword field type. This scoring method emphasizes rare terms over common ones. For instance, depending on the user query logic, documents with "gmail.com" in the "email domain" field will rank lower (as less relevant) than documents with "yahoo.com" in that field. The idea is that "gmail.com" is a weak filter: Gmail addresses are so common that matching on one says little about a document. (More on this in the scoring section.)
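
The effect can be illustrated with the inverse document frequency (IDF) part of the textbook TF-IDF formula, idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing the term t. The exact formula Hyperspace uses is not given in this section, so both the formula and the toy counts below are assumptions, chosen only to show the direction of the effect.

```python
import math
from collections import Counter

# Hypothetical collection: 8 of 10 documents use gmail.com, 2 use yahoo.com.
email_domains = ["gmail.com"] * 8 + ["yahoo.com"] * 2

def idf(term, values):
    """Textbook inverse document frequency: log(N / df)."""
    df = Counter(values)[term]
    if df == 0:
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log(len(values) / df)

idf_gmail = idf("gmail.com", email_domains)  # common term -> small weight
idf_yahoo = idf("yahoo.com", email_domains)  # rare term -> large weight
print(f"gmail.com: {idf_gmail:.3f}, yahoo.com: {idf_yahoo:.3f}")
```

Under these assumptions the rarer "yahoo.com" receives the larger weight, so a match on it contributes more to the score than a match on "gmail.com".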