Build a Custom Alexa Skill with AWS Lambda

Published in ScaleCapacity · 10 min read · Aug 13, 2021

Alexa is Amazon’s cloud-based voice service that powers voice experiences on millions of devices, including Amazon Echo, Echo Dot, and Fire TV devices. Alexa has a number of built-in skills that enable customers to interact with devices using voice. Skills are like apps for Alexa. Developers can build their own custom skills by using the Alexa Skills Kit (ASK).

This blog briefly describes how to create a custom Alexa Skill along with its backend on AWS Lambda.

Introduction to Alexa Skill

The Alexa Skills Kit (ASK) is a software development framework that enables you to create skills. It is a collection of self-service APIs, tools, documentation, and code samples that make it fast and easy for you to add skills to Alexa. With an interactive voice interface, Alexa gives users a hands-free way to interact with your skill.

An Alexa Skill consists of two main components — a Skill interface and a Skill service.

The skill interface interprets the user’s speech requests and maps them to intents within the interaction model. Intents are actions that fulfill the spoken requests from the user. The interface handles the translation from the user’s audio into events the skill service can handle.

The skill service determines what actions need to be taken in response to the JSON-encoded event received from the skill interface. Events contain information about how the user is interacting with the skill, including the type of request the user has triggered. After processing, the speech response is sent back to the user.

As developers we need to configure the Skill interface via Amazon’s Alexa Skill developer console and code the Skill service. The interaction between the code on the Skill service and the configuration on the Skill Interface results in a complete Skill.

Voice Interaction Model

First, we need to design the skill’s voice interaction model. Think of the way we want users to interact with our skill. How will the user invoke our skill?

To do so, design the set of words and phrases through which customers will interact with your skill. In simpler words, the interaction model is what trains the skill interface so that it knows how to listen to the user’s spoken words.

When you design a voice interaction model, you define:

Invocation

The act of starting an interaction with an Alexa skill. The invocation name is the name Alexa uses to identify a skill.

Intent

Intents are templates of user interaction sentences: requests, questions, and answers to Alexa’s re-prompts. An intent represents an action that fulfills a user’s spoken request. Each intent invokes specific skill functionality.

Utterance

Every intent has at least one utterance — a predefined word, phrase, or sentence. These are the words the user says to invoke the intent, to convey what they want to do, or to provide a response to a question.

Slot

An argument to an intent that gives Alexa more information about that request. For example, ‘Alexa, ask Pizza Hut to order Margherita’. In this statement, Margherita is the value of the pizza slot that refines the request.

Create a Custom Alexa Skill

Let’s build our custom Alexa skill.

Prerequisite — Create accounts on Amazon Developer Console and AWS Management Console.

In this blog, I am going to build a custom Amazon Alexa skill, Order Pizza, that allows users to order their favorite pizzas.

For the skill name, choose any name that represents your skill. I have used OrderPizza. For the skill language, I have chosen English (US), but you can choose any language you like.

For the Skill model, select Custom.

For Host, select Provision on your own.

You can start from one of the available skill templates or import a skill. I will not use a template and will choose Start from Scratch.

Once your skill gets provisioned, start with the implementation.

Step 1: Build the Interaction Model

Let’s start by giving our skill an invocation name, which will be its entry point. I have chosen Order pizza. When the user says, for example, “Alexa, open order pizza”, our skill will be invoked.

To set the Invocation name, select the Invocation option from the sidebar. Enter your invocation name in the Skill Invocation Name field.

The next step is to define the intents, which are actions that fulfill a user’s spoken requests. Here, we will have one custom intent named PlaceOrderIntent, which will ask users for their custom choices.

Note: ASK has a large library of built-in intents so we will have to create only our custom intents. Built-in intents have built-in utterances. It is possible to extend the number of existing utterances by adding new utterances. When a skill is created, four built-in intents are added to the Interaction model by default. These are AMAZON.HelpIntent, AMAZON.CancelIntent, AMAZON.FallbackIntent and AMAZON.StopIntent.

To create a custom intent, in the Intents tab, choose to Create new intent. In the Create intent dialog box, type the name of the intent (PlaceOrderIntent), and then choose Add.

For utterances, under the PlaceOrderIntent intent, add all possible utterance text in the sample utterance field. You can also add utterances with slots, e.g. “take my order for {}”, where the slot name goes inside the braces.

Now let's create some slots for our intent. I have created four slots: crust, cheese, size, and quantity. You can add as many as you want. Each slot requires a slot type. ASK includes a large library of slot types; for the quantity slot, we can use the built-in slot type AMAZON.NUMBER. For the other slots, we will have to create custom slot types.

To create a custom slot type, move to the Slot Type tab, and click “Add Slot Types”. You can create slot types for all the remaining slots.

The snapshot below shows the slot type Crust for our crust slot in PlaceOrderIntent. I have added two values: normal and thin. You can add multiple values as per your requirement. You can also add an ID or synonyms to a value.

Similarly, create slot types for other slots. And then finally assign these custom slot types to slots created in the PlaceOrderIntent.
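When a slot value has an ID or synonyms, Alexa reports what the spoken text matched under the slot's resolutions field in the request. Here is a minimal sketch of reading that field. The helper name resolveSlot, the synonym "skinny", and the ID THIN are all made up for illustration, and the authority string contains a placeholder rather than a real skill ID.

```javascript
// Hypothetical helper: return the canonical value (and ID, if any) that a
// custom slot type resolved to, falling back to the raw spoken value.
function resolveSlot(slot) {
  const authorities =
    (slot.resolutions && slot.resolutions.resolutionsPerAuthority) || [];
  for (const authority of authorities) {
    // ER_SUCCESS_MATCH means the spoken text matched a value or a synonym.
    if (authority.status && authority.status.code === 'ER_SUCCESS_MATCH') {
      const match = authority.values[0].value;
      return { name: match.name, id: match.id };
    }
  }
  return { name: slot.value, id: undefined };
}

// Hand-written sample of a crust slot, assuming "skinny" was added as a
// synonym of "thin" with the ID THIN (both invented for this example).
const sampleCrustSlot = {
  name: 'crust',
  value: 'skinny',
  resolutions: {
    resolutionsPerAuthority: [
      {
        authority: 'amzn1.er-authority.echo-sdk.<skill-id>.Crust',
        status: { code: 'ER_SUCCESS_MATCH' },
        values: [{ value: { name: 'thin', id: 'THIN' } }],
      },
    ],
  },
};

console.log(resolveSlot(sampleCrustSlot));
```

Working with the resolved name rather than the raw value means the backend sees "thin" no matter which synonym the user actually said.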

Using multiple slots provides the possibility of using dialogs in which the Alexa skill prompts the user to fill all slot values in order to fulfill the intent. Here, I have shown the slot filling for the cheese slot.

You can also validate your slots by adding validation rules through the developer console, and you can add confirmation to slots as well as intents. In the snapshot below, I have added a final Intent Confirmation for the order. Once all the slot values are filled, Alexa asks for the intent confirmation. If the user says yes, the intent is completed; otherwise, the order is asked for again.

Next, you can complete the Voice Interaction Model. Once done, save the interaction model by clicking the “Save Model” button, and build it by clicking the “Build Model” button.

That’s it.

Also, every interaction model has its interaction model schema, which can be viewed on the JSON Editor tab. The interaction model schema, written in a JSON file, contains intents, utterances, slots, and everything that is implemented within the interaction model. Thus, the entire interaction model can be built directly by writing JSON code or by uploading a JSON file.
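To give a feel for that schema, here is a rough sketch of what the language model part of the OrderPizza interaction model might look like, written as a JavaScript object and printed as JSON. The slot type names Cheese and Size, their values, and the sample utterances are my assumptions for illustration; the dialog and prompt sections are left out.

```javascript
// Sketch of the OrderPizza language model as it might appear in the JSON Editor.
const interactionModel = {
  interactionModel: {
    languageModel: {
      invocationName: 'order pizza',
      intents: [
        { name: 'AMAZON.HelpIntent', samples: [] },
        { name: 'AMAZON.CancelIntent', samples: [] },
        { name: 'AMAZON.FallbackIntent', samples: [] },
        { name: 'AMAZON.StopIntent', samples: [] },
        {
          name: 'PlaceOrderIntent',
          slots: [
            { name: 'crust', type: 'Crust' },
            { name: 'cheese', type: 'Cheese' }, // type name is an assumption
            { name: 'size', type: 'Size' }, // type name is an assumption
            { name: 'quantity', type: 'AMAZON.NUMBER' },
          ],
          // Sample utterances are illustrative, not the article's actual list.
          samples: ['place an order', 'I want {quantity} {size} pizza'],
        },
      ],
      types: [
        { name: 'Crust', values: [{ name: { value: 'normal' } }, { name: { value: 'thin' } }] },
        // Placeholder values: the article does not list cheese or size options.
        { name: 'Cheese', values: [{ name: { value: 'mozzarella' } }] },
        { name: 'Size', values: [{ name: { value: 'small' } }, { name: { value: 'large' } }] },
      ],
    },
  },
};

// Print the JSON that could be pasted into the JSON Editor tab.
console.log(JSON.stringify(interactionModel, null, 2));
```

Keeping the model as a file like this also makes it easy to put the skill's voice interface under version control.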

Step 2: Create AWS Lambda Function

Now, to complete our skill, we need to create an AWS Lambda function. It will implement event handlers that define how the skill behaves when the user triggers an event by speaking to an Alexa-enabled device.

There are three main types of requests:

LaunchRequest: sent when the user invokes the skill by saying its invocation name.
IntentRequest: sent when the user interacts with the skill, i.e. when the user’s speech request is mapped to an intent.
SessionEndedRequest: sent when the session ends, which can happen because of an error, because the user says “exit”, because the user doesn’t respond while the device is listening, or because the user says something that doesn’t match any defined intent.
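To make the three request types concrete, here is a minimal sketch that routes on the request type using plain JavaScript, without the ASK SDK. The function names buildResponse and routeRequest and the reply texts are made up for illustration; the skill built in this blog uses SDK handlers instead.

```javascript
// Build a minimal Alexa response envelope with plain-text speech.
function buildResponse(text, shouldEndSession) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession,
    },
  };
}

// Hypothetical router: pick a reply based on the type of the incoming request.
function routeRequest(event) {
  const request = event.request;
  switch (request.type) {
    case 'LaunchRequest':
      // Keep the session open so the user can answer.
      return buildResponse('Welcome to Order Pizza. What would you like?', false);
    case 'IntentRequest':
      return buildResponse(`You triggered ${request.intent.name}.`, true);
    case 'SessionEndedRequest':
      // Cleanup only: the session is over, so reply with an empty response.
      return { version: '1.0', response: {} };
    default:
      return buildResponse('Sorry, I did not understand that.', true);
  }
}

console.log(routeRequest({ request: { type: 'LaunchRequest' } }).response.outputSpeech.text);
```

The SDK handlers shown later do the same kind of routing, just split into one object per request or intent.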

Let’s start creating our backend for our custom skill OrderPizza.

First of all, we will create a basic Lambda function from the AWS console. Navigate to AWS Lambda and create a new function: select the Author from scratch option, give the function a name like ‘AlexaOrderPizza’, and select Node.js as the Runtime.

In order to connect our skill interface with the AWS Lambda function, we need to pass the skill ID to the Lambda function. The skill ID is located in the Endpoints section on the Alexa Developer Console.

Now we need to set a trigger for our function, which will be Alexa Skills Kit. While configuring the trigger, scroll down to the Configure triggers section and add your Skill ID. Click the Add button to add the new trigger. Then scroll to the top and save the changes to the function.
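The trigger checks the skill ID before your function runs, but the same check can also be written in code, since every request carries the ID of the skill that sent it. Below is a minimal sketch of such a guard. The function name isFromExpectedSkill is made up, and the skill ID is a placeholder, not a real one.

```javascript
// Placeholder value: replace with the skill ID shown in the developer console.
const EXPECTED_SKILL_ID = 'amzn1.ask.skill.00000000-0000-0000-0000-000000000000';

// Hypothetical guard: true only when the request names the expected skill.
function isFromExpectedSkill(event, expectedSkillId = EXPECTED_SKILL_ID) {
  const system = event && event.context && event.context.System;
  const applicationId =
    system && system.application && system.application.applicationId;
  return applicationId === expectedSkillId;
}

// Hand-written sample event containing only the fields the guard reads.
const sampleEvent = {
  context: { System: { application: { applicationId: EXPECTED_SKILL_ID } } },
};

console.log(isFromExpectedSkill(sampleEvent));
```

A function that fails this check can simply refuse to handle the request.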

Now let's start writing the code. For first-timers, I recommend starting from the hello world example in the official Alexa GitHub repository.

The skill builder object helps in adding components responsible for handling input requests and generating custom responses for our skill.

Each request handler is written as an object that implements two methods: canHandle and handle. The canHandle method returns a Boolean value indicating whether the request handler can create an appropriate response for the request, and the handle method builds that response.

As you view the index.js file, you can see that our Lambda function has many intent handlers, each reacting to different intents triggered by different spoken words. The intent handlers for built-in intents are already created. You need to create intent handlers for the custom intents you have created.
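To see how the two methods work together, here is a simplified stand-in for what the skill builder does with registered handlers: it tries them in order and uses the first one that says it can handle the request. The dispatch function and the two toy handlers are made up for illustration and are not the SDK's actual code.

```javascript
// Two toy handlers with a yes/no check plus a handle method.
const LaunchHandler = {
  canHandle(input) {
    return input.request.type === 'LaunchRequest';
  },
  handle() {
    return 'Welcome to Order Pizza.';
  },
};

const PlaceOrderHandler = {
  canHandle(input) {
    return input.request.type === 'IntentRequest'
      && input.request.intent.name === 'PlaceOrderIntent';
  },
  handle() {
    return 'Your order has been placed.';
  },
};

// Simplified dispatcher: the first handler whose check returns true wins.
function dispatch(handlers, input) {
  const handler = handlers.find((h) => h.canHandle(input));
  if (!handler) {
    throw new Error('No handler can process this request');
  }
  return handler.handle(input);
}

const handlers = [LaunchHandler, PlaceOrderHandler];
console.log(dispatch(handlers, { request: { type: 'LaunchRequest' } }));
```

Because the first match wins, the order in which handlers are registered matters whenever two of them can accept the same request.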

Add the following intent handler for PlaceOrderIntent to your code. Once the dialog is complete, it will retrieve all the slot values and give the final order placed response.

// Assumes `const Alexa = require('ask-sdk-core');` at the top of index.js,
// as in the hello world sample.
const PlaceOrderIntentHandler = {
  // Handle only IntentRequests that were mapped to PlaceOrderIntent.
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'PlaceOrderIntent';
  },
  async handle(handlerInput) {
    // Read the slot values collected during the dialog.
    const slots = handlerInput.requestEnvelope.request.intent.slots;
    const quantity = slots.quantity.value;
    const cheese = slots.cheese.value;
    const crust = slots.crust.value;
    const size = slots.size.value;

    // Plain template string: handlerInput.t is only defined when a
    // localization interceptor is registered, which this skill does not do.
    const speakOutput = `Great, your order has been placed for ${quantity} ${size} pizza on ${crust} crust with ${cheese} cheese.`;

    return handlerInput.responseBuilder
      .speak(speakOutput)
      //.reprompt('add a reprompt if you want to keep the session open for the user to respond')
      .getResponse();
  }
};
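The handler above reads the slot values on the assumption that the dialog has already finished. If your skill receives PlaceOrderIntent while slots are still missing, one option is a second handler that hands the dialog back to Alexa until the dialog state is COMPLETED. Here is a minimal sketch of that idea. The handler name InProgressPlaceOrderHandler is made up, and the mock response builder only stands in for the SDK's real one so that the sketch runs on its own.

```javascript
// Hypothetical handler for PlaceOrderIntent requests whose dialog is not finished.
const InProgressPlaceOrderHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'PlaceOrderIntent'
      && request.dialogState !== 'COMPLETED';
  },
  handle(handlerInput) {
    const intent = handlerInput.requestEnvelope.request.intent;
    // A Dialog.Delegate directive asks Alexa to prompt for the next missing slot.
    return handlerInput.responseBuilder
      .addDelegateDirective(intent)
      .getResponse();
  },
};

// Stand-in for the SDK's response builder, only so this sketch is runnable.
function createMockResponseBuilder() {
  const directives = [];
  return {
    addDelegateDirective(updatedIntent) {
      directives.push({ type: 'Dialog.Delegate', updatedIntent });
      return this;
    },
    getResponse() {
      return { directives };
    },
  };
}

// Hand-written request: the dialog has started, but no slot is filled yet.
const mockInput = {
  requestEnvelope: {
    request: {
      type: 'IntentRequest',
      dialogState: 'STARTED',
      intent: { name: 'PlaceOrderIntent', slots: {} },
    },
  },
  responseBuilder: createMockResponseBuilder(),
};

console.log(JSON.stringify(InProgressPlaceOrderHandler.handle(mockInput)));
```

If you add a handler like this, register it before PlaceOrderIntentHandler, because PlaceOrderIntentHandler accepts the intent in any dialog state and the first matching handler is the one that runs.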

The exports.handler acts as the entry point for your skill, routing all request and response payloads to the handlers. So make sure to add all your defined handlers, like the PlaceOrderIntentHandler, to it as follows:

exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(
    LaunchRequestHandler,
    HelloWorldIntentHandler,
    PlaceOrderIntentHandler,
    HelpIntentHandler,
    CancelAndStopIntentHandler,
    FallbackIntentHandler,
    SessionEndedRequestHandler,
    IntentReflectorHandler)
  .addErrorHandlers(
    ErrorHandler)
  .withCustomUserAgent('sample/hello-world/v1.2')
  .lambda();

Yeah! We are almost done with our skill. Next, we have to bring everything together.

Step 3: Set the Endpoints

Once the backend is finished, we need to connect Amazon Alexa to AWS Lambda. To do this, we need the ARN of our Lambda function, a unique identifier for the function. Copy it from the Lambda console, where it is located above the Save button.

Now switch to the Alexa Skill dashboard. In the Endpoint section, select the AWS Lambda ARN option and paste the copied Lambda function ARN into the Default Region field. Hit the Save Endpoints button. Now your back-end can receive endpoint calls from Alexa.

Wahoo! Finally, we have completed our custom Alexa skill and now we can move to the most important part — Testing the Skill.

Testing the Skill

You can test your skill either on an Alexa device such as the Echo or Echo Dot, or with the service simulator provided in the Alexa Developer Console itself.

In the top navigation, click Test. Make sure that the “Test is enabled for this skill” option is turned on. You can use the Test page to simulate requests in text and voice form. You can type what you want to say in the empty field, or hold the microphone and speak.

Start by saying “open order pizza”, which includes the invocation name, and see how your skill responds.

Besides the speech you will hear and the text displayed on your screen, the service simulator also displays the JSON input (the JSON event sent by the skill interface to the Lambda function) and the JSON output (the JSON response sent by the Lambda function to the skill interface).

You now have a working custom Amazon Alexa Skill. Next, you can turn your ideas into working Alexa Skills and publish them to the Amazon Alexa Skills Store.