Getting started with Azure Search

Mar 03, 2020

If you have any search background, such as experience with Lucene or Elasticsearch, you will most likely be able to relate to how Azure Search works and to the components it provides for building a search service.

If you have not, fear not. It is not a prerequisite; it just gives you a little head start.

Main components of Azure Search

Azure Search is made up of 3 main parts.

  • Data sources. As the name suggests, this is where the data originates: the raw data that you want to make searchable. At the time of writing, Azure supports the following data sources: Azure SQL Server, Cosmos DB (both SQL and document DB), Blob Storage and Table Storage. This means that if you have any data stored in these data sources, Azure offers "Indexers" that can import data into a search index. In this post, I'll show you how to create a basic data source so that Azure Search knows where to import data from.
  • Indexes. Also referred to as a search index, an index defines a schema for how data is organised and indexed. You can also define which fields are searchable, filterable or sortable. Azure uses an inverted index to enable fast search lookups. We'll also cover how to create a search index in this post.
  • Indexers. An indexer is a process that crawls the data source and loads the data into the defined search index. You can define how often this process runs, such as daily or hourly. We'll go through how to create an indexer.
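
To make the inverted index idea a little more concrete, here is a toy Python sketch. It is only an illustration of the general technique, not how Azure implements it, and the documents and ids are made up. Each term maps to the set of documents that contain it, so a lookup is a dictionary access rather than a scan over every document.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical documents, keyed by an illustrative id.
documents = {
    1: "Luke Skywalker",
    2: "Han Solo",
    3: "Anakin Skywalker",
}

index = build_inverted_index(documents)
print(sorted(index["skywalker"]))  # [1, 3]
print(sorted(index["solo"]))       # [2]
```

Note that the index only knows about whole terms, which is why a lookup is fast but also why only exact terms are found.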

Inserting data into an Azure Search Index

If you have data that lives in Azure such as SQL Server or Cosmos, then you can easily use the inbuilt data sources from Azure.

If, however, your data is hosted elsewhere, fear not. There is another way.

Pushing data. That's right. You can push data directly into an index using the REST API or SDK.

In this post, I will share how to use data sources and indexers, and in my next post, I shall share how to push data directly. There is no right or wrong method. It all depends on what you want to achieve and it can always be a combination of the two.

Important note
At the time of writing, Azure Cognitive Search in the Azure Portal has some limited features. Do not be put off by this; it is by no means a limitation of what the service can offer. Many things can be achieved via the REST API or the Search SDK.

We'll start by going through each of the areas above and create a simple data source, an index and an indexer. Then we will have a basic Azure Search index running.

All code can be found in my GitHub repository.

Create a free Azure Search Service

Azure Search provides a free tier, which allows up to 3 data sources, 3 indexes, 3 indexers and 50MB of storage. You can only have one free Azure Search instance per subscription. Oh, and on the free tier you can't run an indexer more than once every 180 seconds. It's a limitation, but the tier is still good enough to experiment with.

In the Azure Portal, go ahead and create a new Search service and be sure to set the relevant pricing tier.

When you have created an Azure Search service, make a note of the URL and the API key.

Here is the search URL.
[Screenshot: azure-search-demo-url]

This is the key you need to make a note of.
[Screenshot: azure-search-demo-keys]
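
Since every call in this post needs the same URL, API version and key, it can help to keep them in one place. Below is a minimal Python sketch using only the standard library. The helper name build_request is my own, the key is a placeholder, and the request is only built, not sent.

```python
import urllib.request

# The demo service URL from this post; swap in your own URL and admin key.
SERVICE_URL = "https://demo-search-3jamlwx.search.windows.net"
API_KEY = "<admin key>"
API_VERSION = "2019-05-06"

def build_request(path, method="GET", body=None):
    """Build (but do not send) a request against the search service."""
    url = f"{SERVICE_URL}/{path}?api-version={API_VERSION}"
    headers = {"Content-Type": "application/json", "api-key": API_KEY}
    return urllib.request.Request(url, data=body, headers=headers, method=method)

request = build_request("indexes")
print(request.get_method(), request.full_url)
# To send it for real: urllib.request.urlopen(request)
```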

In the examples below, I will be using a slightly modified Star Wars dataset, which can be found on GitHub. I will also be using the Azure Search REST API to create data sources, indexes and indexers.

Create a basic Data Source

First, let's load an Azure Cosmos database with some Star Wars characters. Here is a sample of what a Star Wars character document looks like.

{
        "people": {
            "name": "Luke",
            "last_name": "Skywalker",
            "height": "172",
            "mass": "77",
            "hair_color": "blond",
            "skin_color": "fair",
            "eye_color": "blue",
            "birth_year": "19BBY",
            "gender": "male",
            "homeworld": "http://swapi.co/api/planets/1/",
            "films": [
                "http://swapi.co/api/films/6/",
                "http://swapi.co/api/films/3/",
                "http://swapi.co/api/films/2/",
                "http://swapi.co/api/films/1/",
                "http://swapi.co/api/films/7/"
            ],
            "species": [
                "http://swapi.co/api/species/1/"
            ],
            "vehicles": [
                "http://swapi.co/api/vehicles/14/",
                "http://swapi.co/api/vehicles/30/"
            ],
            "starships": [
                "http://swapi.co/api/starships/12/",
                "http://swapi.co/api/starships/22/"
            ],
            "created": "2014-12-09T13:50:51.644000Z",
            "edited": "2014-12-20T21:17:56.891000Z",
            "url": "http://swapi.co/api/people/1/",
            "desc": [
                "Luke Skywalker is a fictional character and the main protagonist of the original film trilogy of the Star Wars franchise created by George Lucas. The character, portrayed by Mark Hamill, is an important figure in the Rebel Alliance's struggle against the Galactic Empire. He is the twin brother of Rebellion leader Princess Leia Organa of Alderaan, a friend and brother-in-law of smuggler Han Solo, an apprentice to Jedi Masters Obi-Wan \"Ben\" Kenobi and Yoda, the son of fallen Jedi Anakin Skywalker (Darth Vader) and Queen of Naboo/Republic Senator Padmé Amidala and maternal uncle of Kylo Ren / Ben Solo. The now non-canon Star Wars expanded universe depicts him as a powerful Jedi Master, husband of Mara Jade, the father of Ben Skywalker and maternal uncle of Jaina, Jacen and Anakin Solo.",
                "In 2015, the character was selected by Empire magazine as the 50th greatest movie character of all time.[2] On their list of the 100 Greatest Fictional Characters, Fandomania.com ranked the character at number 14.[3]"
            ]
        }
    }

You can find the full sample JSON data here.

To bulk import this JSON document, you can refer to my blog post on how to bulk insert JSON documents into Azure.

Now, once we have loaded this into a new Cosmos DB collection, we can create a data source that references this collection.

Example Data Source POST request

To create a new data source, we will need to call the REST API datasources?api-version=[api-version]

This is a POST request to create a new Data Source.

Data source POST Request

POST https://[service name].search.windows.net/datasources?api-version=[api-version]
Content-Type: application/json
api-key: [key]

Example POST request

POST https://demo-search-3jamlwx.search.windows.net/datasources?api-version=2019-05-06
Content-Type: application/json  
api-key: 8585F9E40DBADDDCCB8B03CC74EB2B0C

This is a json body example.

{
    "name": "starwars-characters-cosmos-datasource",
    "type": "cosmosdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://[service name].documents.azure.com:443/;AccountKey=[account key];Database=[database name]"
    },
    "container": {
        "name": "starwars-characters",
        "query": "SELECT c.id, c.people.first_name, c.people.last_name, c.people.gender, c.people.height, c.people.eye_color FROM c"
    }
}

I'm using API version 2019-05-06 in the request. Microsoft has a list of the latest API versions on their website, so be sure to check it out.

The type value must be one of these values: 'azuresql', 'cosmosdb', 'azureblob', or 'azuretable'.
The credentials property holds the connection string to the Cosmos database.
The container is the name of the table, collection or blob container. Since this is a Cosmos database, it is the collection name.

There are additional parameters which can be found in the Azure documentation
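
If you prefer to script the call rather than use a REST client, the request above could be put together along these lines. This is a sketch using Python's standard library, not an official SDK example. The function name data_source_request and the placeholder values are my own, and the request is built but not sent.

```python
import json
import urllib.request

# Projection used to flatten the nested "people" property.
QUERY = ("SELECT c.id, c.people.first_name, c.people.last_name, "
         "c.people.gender, c.people.height, c.people.eye_color FROM c")

def data_source_request(service_url, api_key, connection_string, api_version="2019-05-06"):
    """Build the POST request that creates the Cosmos DB data source."""
    body = {
        "name": "starwars-characters-cosmos-datasource",
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        "container": {"name": "starwars-characters", "query": QUERY},
    }
    return urllib.request.Request(
        f"{service_url}/datasources?api-version={api_version}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )

# Placeholder credentials: substitute your own before sending.
request = data_source_request(
    "https://demo-search-3jamlwx.search.windows.net",
    "<admin key>",
    "<cosmos connection string>",
)
payload = json.loads(request.data)
print(request.get_method(), request.full_url)
print(payload["type"], payload["container"]["name"])
# To send it for real: urllib.request.urlopen(request)
```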

You may notice there is a query section. This is to demonstrate how you can project or select specific data from the data source. The reason I am doing this is that the Star Wars JSON is encapsulated within a people property. To flatten this out, I've used this query, e.g. SELECT... c.people.first_name.
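
To picture what that projection does to each document, here is a rough Python equivalent. It is only a local imitation of the query's effect, not something Azure runs. The function name, the id value and the extra mass field are hypothetical, and the field names follow the query rather than the sample above.

```python
def flatten_character(document, fields):
    """Lift selected fields out of the nested "people" property, keeping the id."""
    people = document.get("people", {})
    flat = {"id": document.get("id")}
    for field in fields:
        if field in people:
            flat[field] = people[field]
    return flat

# Hypothetical stored document; the id and mass values are made up.
document = {
    "id": "example-id",
    "people": {
        "first_name": "Han",
        "last_name": "Solo",
        "gender": "male",
        "height": "180",
        "eye_color": "brown",
        "mass": "80",
    },
}

flat = flatten_character(
    document, ["first_name", "last_name", "gender", "height", "eye_color"]
)
print(flat)
```

Fields that are not selected, such as mass here, simply do not appear in the flattened result.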

Create a Search Index

I like to think of this as a schema of how the data is structured for the index and also what will be returned from the search.

Let's say we want to search by a Star Wars character name, height, and gender as a start. We may want to structure our search data in the following format.

{
    "characterId": "adb81166-87d7-2eb3-d103-4ca6a84327c4",
    "firstName": "Wilhuff",
    "lastName": "Tarkin",
    "eyeColor": "blue",
    "gender": "male",
    "height": 180
}

It's worth noting that the search index schema does not necessarily have to match the same structure as the origin of the data source. Generally, it's structured based on the search criteria and what needs to be brought back from the search.

Example POST Index request

To create an index, we need to call the indexes POST request indexes?api-version=[api-version]

Index POST request

POST https://[servicename].search.windows.net/indexes?api-version=[api-version]  
Content-Type: application/json
api-key: [admin key]

Example POST request

POST https://demo-search-3jamlwx.search.windows.net/indexes?api-version=2019-05-06  
Content-Type: application/json
api-key: 8585F9E40DBADDDCCB8B03CC74EB2B0C

This is the json body.

{
    "name": "star-wars-characters-index",
    "fields": [
        {
            "name": "characterId",
            "type": "Edm.String",
            "key": true,
            "searchable": true
        },
        {
            "name": "firstName",
            "type": "Edm.String",
            "searchable": true,
            "sortable" : true
        },
        {
            "name": "lastName",
            "type": "Edm.String",
            "searchable": true,
            "sortable" : true
        },
        {
            "name": "eyeColor",
            "type": "Edm.String",
            "searchable": true,
            "sortable" : true
        },
        {
            "name": "gender",
            "type": "Edm.String",
            "searchable": false,
            "sortable" : true,
            "filterable": true
        },
        {
            "name": "height",
            "type": "Edm.Int32",
            "searchable": false,
            "sortable" : true,
            "filterable": true
        }
    ]
}

As well as a unique name for the index, star-wars-characters-index, it has 6 fields. The name and eye colour fields are marked as searchable and sortable. Other properties can be added to a field, such as filterable, facetable, retrievable and more. All can be found in the Azure Search - Components of an index. However, for now, let's keep it simple.

I've thrown in one variation. I've made height and gender filterable and sortable, but not searchable.

Note: Only string types are searchable. If you have a datatype which is an integer or date, it can be filtered and sorted on, but not full-text searched.
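
If you generate index definitions from code, a small helper can catch a searchable flag on a non-string field before the request is ever sent. The sketch below is my own illustration in Python. The field helper and its check are assumptions for this post, not part of any Azure SDK, and the check is a simplification of the service's rules.

```python
import json

# Simplification: only these types are treated as full-text searchable.
SEARCHABLE_TYPES = {"Edm.String", "Collection(Edm.String)"}

def field(name, edm_type, **attributes):
    """Build one field definition, rejecting searchable non-string fields."""
    if attributes.get("searchable") and edm_type not in SEARCHABLE_TYPES:
        raise ValueError(f"{name}: only string fields can be searchable")
    return {"name": name, "type": edm_type, **attributes}

index_definition = {
    "name": "star-wars-characters-index",
    "fields": [
        field("characterId", "Edm.String", key=True, searchable=True),
        field("firstName", "Edm.String", searchable=True, sortable=True),
        field("lastName", "Edm.String", searchable=True, sortable=True),
        field("eyeColor", "Edm.String", searchable=True, sortable=True),
        field("gender", "Edm.String", searchable=False, sortable=True, filterable=True),
        field("height", "Edm.Int32", searchable=False, sortable=True, filterable=True),
    ],
}

# The printed JSON can be used as the body of the index POST request.
print(json.dumps(index_definition, indent=4))
```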

Create an Indexer

To create an indexer, we need to specify the name of a data source and the name of the index that will be populated.

You also have the option of providing field mappings from data source fields to index fields. This is useful if you want to rename certain fields from the data source in the index.

Example POST Indexer request

To create an indexer, we need to call the indexers POST request indexers?api-version=[api-version].

Indexer POST request

POST https://[servicename].search.windows.net/indexers?api-version=[api-version]  
Content-Type: application/json
api-key: [admin key]

Example POST request

POST https://demo-search-3jamlwx.search.windows.net/indexers?api-version=2019-05-06
Content-Type: application/json
api-key: 8585F9E40DBADDDCCB8B03CC74EB2B0C

The json body for the request.

{
    "name": "starwars-indexer",
    "dataSourceName": "starwars-characters-cosmos-datasource",
    "targetIndexName": "star-wars-characters-index",
    "fieldMappings": [
        {
            "sourceFieldName": "id",
            "targetFieldName": "characterId"
        },
        {
            "sourceFieldName": "first_name",
            "targetFieldName": "firstName"
        },
        {
            "sourceFieldName": "last_name",
            "targetFieldName": "lastName"
        },
        {
            "sourceFieldName": "eye_color",
            "targetFieldName": "eyeColor"
        },
        {
            "sourceFieldName": "gender",
            "targetFieldName": "gender"
        },
        {
            "sourceFieldName": "height",
            "targetFieldName": "height"
        }
    ]
}

If your data source data and the search index fields are identical, there is no need to include fieldMappings. I've added this to demonstrate that you can map fields from the data source such as first_name into the relevant search index field such as firstName.
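
To see what the mappings amount to, here is a small Python imitation of the renaming step. The indexer does this work for you inside the service; the function below is only my own local sketch, using values taken from the search results later in this post.

```python
def apply_field_mappings(source_document, field_mappings):
    """Return a copy of the document with mapped fields renamed."""
    renames = {m["sourceFieldName"]: m["targetFieldName"] for m in field_mappings}
    return {renames.get(name, name): value for name, value in source_document.items()}

field_mappings = [
    {"sourceFieldName": "id", "targetFieldName": "characterId"},
    {"sourceFieldName": "first_name", "targetFieldName": "firstName"},
    {"sourceFieldName": "last_name", "targetFieldName": "lastName"},
    {"sourceFieldName": "eye_color", "targetFieldName": "eyeColor"},
]

# A flattened row, roughly as the data source query might return it.
source_document = {
    "id": "844cbc68-20d9-cacf-2d7c-762a8a455b1e",
    "first_name": "Han",
    "last_name": "Solo",
    "eye_color": "brown",
    "gender": "male",
    "height": 180,
}

index_document = apply_field_mappings(source_document, field_mappings)
print(index_document)
```

Fields without a mapping, such as gender and height, keep their names, which mirrors the point that identical names need no mapping.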

Perform Basic Searches

Now that all 3 bits are in place, we can perform basic searches, again using the Search API.

Example Search GET request

This is an example of how to perform a basic search.

Search GET request

GET https://[service name].search.windows.net/indexes/[index name]/docs?api-version=2019-05-06&search=[search term]
Content-Type: application/json
api-key: [admin key]

Example GET request

GET https://demo-search-3jamlwx.search.windows.net/indexes/star-wars-characters-index/docs?api-version=2019-05-06&search=*
Content-Type: application/json
api-key: 8585F9E40DBADDDCCB8B03CC74EB2B0C

In the example above, we are searching for everything by using the asterisk * character.

We should get something back like this:

{
    "@odata.context": "https://demo-search-3jamlwx.search.windows.net/indexes('star-wars-characters-index')/$metadata#docs(*)",
    "value": [
        {
            "@search.score": 1.0,
            "characterId": "adb81166-87d7-2eb3-d103-4ca6a84327c4",
            "firstName": "Wilhuff",
            "lastName": "Tarkin",
            "eyeColor": "blue",
            "gender": "male",
            "height": 180
        },
        {
            "@search.score": 1.0,
            "characterId": "b616614c-1adb-0bac-59e8-59af27a7e631",
            "firstName": "Beru",
            "lastName": "Whitesun Lars",
            "eyeColor": "blue",
            "gender": "female",
            "height": 165
        },
        ...
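
If you build the search URL in code, it is worth letting a library encode the query string, because search terms can contain spaces and other special characters. Here is a hedged Python sketch; the search_url helper is my own naming, and nothing is sent over the network.

```python
from urllib.parse import urlencode

def search_url(service_url, index_name, search_term, api_version="2019-05-06"):
    """Build the GET URL for a simple search, percent-encoding the term."""
    query = urlencode({"api-version": api_version, "search": search_term})
    return f"{service_url}/indexes/{index_name}/docs?{query}"

base = "https://demo-search-3jamlwx.search.windows.net"
everything = search_url(base, "star-wars-characters-index", "*")
by_name = search_url(base, "star-wars-characters-index", "han solo")

print(everything)  # the * is sent percent-encoded as %2A
print(by_name)     # the space is sent as +
```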

Search by name

Let's use the above example request to perform a search by a particular name. Let's search for 'solo', search=solo.

GET /indexes/star-wars-characters-index/docs?api-version=2019-05-06&search=solo

We should get back something like this:

{
    "@odata.context": "https://demo-search-3jamlwx.search.windows.net/indexes('star-wars-characters-index')/$metadata#docs(*)",
    "value": [
        {
            "@search.score": 0.91308546,
            "characterId": "844cbc68-20d9-cacf-2d7c-762a8a455b1e",
            "firstName": "Han",
            "lastName": "Solo",
            "eyeColor": "brown",
            "gender": "male",
            "height": 180
        }
    ]
}
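
Once the response comes back, the documents live in the value array, each with its @search.score. A minimal Python sketch for pulling out the interesting parts might look like this. The response text is the one above, trimmed of its @odata.context line, and the summarise function is my own naming.

```python
import json

# The response from the search above, without the @odata.context property.
response_text = """
{
    "value": [
        {
            "@search.score": 0.91308546,
            "characterId": "844cbc68-20d9-cacf-2d7c-762a8a455b1e",
            "firstName": "Han",
            "lastName": "Solo",
            "eyeColor": "brown",
            "gender": "male",
            "height": 180
        }
    ]
}
"""

def summarise(text):
    """Return (score, full name) pairs for each document in a search response."""
    results = json.loads(text)["value"]
    return [(doc["@search.score"], f'{doc["firstName"]} {doc["lastName"]}') for doc in results]

summary = summarise(response_text)
print(summary)  # [(0.91308546, 'Han Solo')]
```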

Now, what if I wanted to search for a partial match, say 'sol', search=sol?

GET /indexes/star-wars-characters-index/docs?api-version=2019-05-06&search=sol

What do we get back?

{
    "@odata.context": "https://demo-search-3jamlwx.search.windows.net/indexes('star-wars-characters-index')/$metadata#docs(*)",
    "value": []
}

Yup, nothing. Nada. That's because Azure Search doesn't support partial searches straight out of the box. We need to make a minor modification, which I will cover in my next blog post.

Search and filter

With filtering, we can continue to use a GET request, but we can also use a POST request, which I find neater when it comes to supplying additional parameters such as filtering.

Filter POST request

POST https://[service name].search.windows.net/indexes/[index name]/docs/search?api-version=2019-05-06
Content-Type: application/json
api-key: [admin key]

Here is the json body where we can specify filter parameters as well as select parameters.

{
    "search": "",
    "filter": "gender eq 'female' and height gt 180",
    "select": "characterId, firstName, lastName, eyeColor, gender, height",
    "count": true
}

Example POST request

POST https://demo-search-3jamlwx.search.windows.net/indexes/star-wars-characters-index/docs/search?api-version=2019-05-06
Content-Type: application/json
api-key: 8585F9E40DBADDDCCB8B03CC74EB2B0C

Let's say we want to search for everything, filtered to female characters with a height greater than 165.

{
    "search": "*",
    "filter": "gender eq 'female' and height gt 165"
}

This is what we get for an output:

{
    "@odata.context": "https://demo-search-3jamlwx.search.windows.net/indexes('star-wars-characters-index')/$metadata#docs(*)",
    "value": [
        {
            "@search.score": 1.0,
            "characterId": "943e3eb2-a7f5-826a-5630-f84c5d5e966f",
            "firstName": "Luminara",
            "lastName": "Unduli",
            "eyeColor": "blue",
            "gender": "female",
            "height": 170
        }
    ]
}

Syntax such as gender eq 'female' is OData syntax. It's pretty easy to read, and you can find out more about it in Microsoft's documentation.
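
If the filter values come from user input, it is safer to build the expression in code than to paste strings together by hand. In OData, a single quote inside a string literal is escaped by doubling it. The Python sketch below is my own illustration; the function names and parameters are assumptions for this post, not part of an SDK.

```python
import json

def odata_string(value):
    """Quote a string literal for OData, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def build_search_body(search, gender=None, min_height=None):
    """Build the JSON body for a filtered search POST request."""
    clauses = []
    if gender is not None:
        clauses.append(f"gender eq {odata_string(gender)}")
    if min_height is not None:
        clauses.append(f"height gt {int(min_height)}")
    body = {"search": search}
    if clauses:
        body["filter"] = " and ".join(clauses)
    return body

body = build_search_body("*", gender="female", min_height=165)
print(json.dumps(body))
# {"search": "*", "filter": "gender eq 'female' and height gt 165"}
```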

Summary

In this post, we covered the basics of Azure Search, the 3 main components it's made up of and how you can get a basic search up and running. There are many more advanced features, such as partial searching and pushing data directly without an indexer, which I hope to cover in my next blog post.

All code samples and data can be found in my GitHub repository.
