Solving the Dynamic Minimum Should Match Problem with the terms_set Query

Haydar Külekci
3 min read · Sep 9, 2023

The terms_set query is handy when you need a dynamic minimum_should_match for term matching, that is, when the number of terms that must match can differ from document to document. For example, suppose you have an index named job-candidates that stores candidate information for open positions. When you search for candidates, the search comes with requirements. These can be the same for everyone, or you can define a specific requirement per candidate. Say you match candidates to jobs by programming language: for some candidates a single matching language is enough, while for others you want a strong match on several languages. Let’s create the index first and then walk through some examples:

PUT /job-candidates
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}

As you can see, the index has three fields: name, programming_languages, and required_matches. The required_matches field is our search criterion: it stores how many of the searched programming languages must match before we count the candidate as a match. Now let’s index some documents:

PUT /job-candidates/_doc/1?refresh
{
  "name": "Jane Smith",
  "programming_languages": [ "c++", "java" ],
  "required_matches": 2
}

PUT /job-candidates/_doc/2?refresh
{
  "name": "Jason Response",
  "programming_languages": [ "java", "php" ],
  "required_matches": 2
}

PUT /job-candidates/_doc/3?refresh
{
  "name": "Jason Request",
  "programming_languages": [ "rust", "php" ],
  "required_matches": 1
}

PUT /job-candidates/_doc/4?refresh
{
  "name": "Mason Difficult",
  "programming_languages": [ "javascript", "php", "html" ],
  "required_matches": 3
}

After that, we can run some searches to find the right candidates. For example, I am looking for a candidate with skills in PHP, C++, or Java, but each candidate’s required_matches value decides whether that candidate counts as a match. So, let’s build our query:

GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

According to the query, documents 1, 2, and 3 will be in our result set. Document 4 is left out because only PHP matches, while it requires 3 matches. If we change the query as follows:

GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "java", "php" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

Document 1 no longer matches: it requires 2 matches and lists Java and C++, but the query no longer contains C++, so only Java matches.
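To make the counting rule concrete, here is a small plain-Python model of how the two queries above are decided. It is only a sketch of the matching logic as described in this article, not Elasticsearch client code, and the names CANDIDATES, terms_set_match, and search are made up for the illustration.

```python
# Plain-Python model of terms_set with minimum_should_match_field.
# CANDIDATES, terms_set_match and search are illustrative names,
# not part of any Elasticsearch API.

CANDIDATES = {
    1: {"programming_languages": ["c++", "java"], "required_matches": 2},
    2: {"programming_languages": ["java", "php"], "required_matches": 2},
    3: {"programming_languages": ["rust", "php"], "required_matches": 1},
    4: {"programming_languages": ["javascript", "php", "html"], "required_matches": 3},
}


def terms_set_match(doc, terms):
    """Return True if the document matches at least required_matches query terms."""
    required = doc.get("required_matches")
    if required is None:
        # In this model a document without the field never matches.
        return False
    matched = len(set(doc["programming_languages"]) & set(terms))
    return matched >= required


def search(terms):
    """Return the ids of all matching candidates, in id order."""
    return [
        doc_id
        for doc_id, doc in sorted(CANDIDATES.items())
        if terms_set_match(doc, terms)
    ]


print(search(["c++", "java", "php"]))  # [1, 2, 3]
print(search(["java", "php"]))         # [2, 3]
```

Each document carries its own threshold, so the same list of query terms can be "enough" for one candidate and "not enough" for another.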

Now suppose you index the following document without a required_matches field, or you already have an index in which no field defines the requirement.

PUT /job-candidates/_doc/5?refresh
{
  "name": "Mason Difficult",
  "programming_languages": [ "java", "php" ]
}

If you run the previous search again, you will see that this document doesn’t match, even though the candidate could be a good fit. We lose our business logic because of the missing field. Let’s fix this with a Painless script in the minimum_should_match_script parameter of the terms_set query:

GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "java", "php" ],
        "minimum_should_match_script": {
          "source": """
            if (doc['required_matches'].size() != 0) {
              return doc['required_matches'].value;
            }
            return params['default_requires'];
          """,
          "params": {
            "default_requires": 2
          }
        }
      }
    }
  }
}

Let me explain what the script does:

- Check whether the document has a value for required_matches.
- If it does, return that value as the minimum number of terms that must match.
- Otherwise, return the default passed in params (default_requires, which is 2 here).

The script only reads from params and never writes to it, so a value taken from one document cannot leak into the evaluation of another.
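The fallback can also be modelled in a few lines of plain Python. This is a sketch of the same decision, not the Painless script itself; required_for, matches, and DEFAULT_REQUIRES are hypothetical names, with DEFAULT_REQUIRES standing in for the default_requires parameter.

```python
# Plain-Python model of the minimum_should_match_script fallback.
# required_for, matches and DEFAULT_REQUIRES are illustrative names.

DEFAULT_REQUIRES = 2  # plays the role of params.default_requires


def required_for(doc, default=DEFAULT_REQUIRES):
    """Use the document's own required_matches when present, else the default."""
    value = doc.get("required_matches")
    return default if value is None else value


def matches(doc, terms):
    """Return True if enough query terms appear in the document."""
    matched = len(set(doc["programming_languages"]) & set(terms))
    return matched >= required_for(doc)


doc_3 = {"programming_languages": ["rust", "php"], "required_matches": 1}
doc_5 = {"programming_languages": ["java", "php"]}  # no required_matches field

print(required_for(doc_3))              # 1
print(required_for(doc_5))              # 2
print(matches(doc_5, ["java", "php"]))  # True
```

With the default in place, document 5 is judged against a threshold of 2 and matches both query terms, so it is no longer dropped just because the field is missing.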

As a result, the terms_set query lets us adapt the search criteria to the data itself. With a script, we can even express more complex rules for minimum_should_match.
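As one example of a more involved rule, the script can cap the threshold at the number of query terms, so a document never demands more matches than the query could possibly provide. In Elasticsearch the term count is available to the script as params.num_terms, and the cap is typically written with Math.min. The sketch below models only that arithmetic in plain Python, using the sample candidates from this article; the function name search and its cap_at_num_terms flag are made up for the illustration.

```python
# Plain-Python model of capping the threshold at the number of query terms,
# i.e. min(num_terms, required_matches). All names here are illustrative.

CANDIDATES = {
    1: {"programming_languages": ["c++", "java"], "required_matches": 2},
    2: {"programming_languages": ["java", "php"], "required_matches": 2},
    3: {"programming_languages": ["rust", "php"], "required_matches": 1},
    4: {"programming_languages": ["javascript", "php", "html"], "required_matches": 3},
}


def search(terms, cap_at_num_terms):
    """Return matching candidate ids, optionally capping each threshold."""
    hits = []
    for doc_id, doc in sorted(CANDIDATES.items()):
        required = doc["required_matches"]
        if cap_at_num_terms:
            # len(terms) plays the role of params.num_terms.
            required = min(len(terms), required)
        matched = len(set(doc["programming_languages"]) & set(terms))
        if matched >= required:
            hits.append(doc_id)
    return hits


print(search(["php"], cap_at_num_terms=False))  # [3]
print(search(["php"], cap_at_num_terms=True))   # [2, 3, 4]
```

Without the cap, a single-term query can only ever satisfy candidates that require one match; with it, every candidate who knows PHP is returned.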
