Alexa (named after the ancient library of Alexandria) is
Amazon's cloud-based voice-control system, available on
millions of devices from Amazon and from third-party device
manufacturers. With Alexa, you can add your own custom
skills to Amazon Echo devices, and its uses range from
checking the weather, storing contacts, planning trips and
setting reminders to playing music and even quiz games. The
Alexa Voice Service platform handles the speech recognition
and text-to-speech conversion that make these interactions
possible.
Why Alexa?
One of the foremost reasons for using Alexa is
responsiveness. You do not need to press any button to
activate it: you just say "Alexa", "Echo", "Computer" or
"Amazon", which are Alexa's wake words, followed by the task
you want to perform, and Alexa carries it out. You only need
to make sure Alexa is set up correctly and that you use the
correct commands.
At the time of writing, the Echo speaker was in its second
generation, with features ranging from smart home control to
digital-assistant abilities.
What devices does Alexa work on?
Amazon Alexa arguably delivers the best experience on the
Amazon Echo. However, Amazon has also brought the assistant
to other devices, such as the Echo Dot and the Amazon Tap.
Alexa also supports Amazon's Fire TV set-top box and the
Fire HD 8 tablet.
Amazon has also allowed some third parties to build in Alexa
support. For example, the LG SmartThinQ hub and the Pebble
Core wearable come with Alexa support.
Architectural Workflow
Consider a scenario in which Alexa looks up stock prices on
the NSE (National Stock Exchange of India).
The communication flow looks like this:
-
The user speaks to an Echo device, starting with one of the
available wake words so that Echo knows it is being
addressed, and names the skill they want to interact with.
For example, for my skill called StockPrice, I ask "Alexa,
ask Stock Price for the price of TCS". Here, "Alexa" is the
wake word that makes the Echo listen, and "Stock Price" is
the invocation name that identifies the skill the user wants
to direct their enquiry to.
-
Echo sends this request to the Alexa Voice Service
platform, which performs speech recognition, turning the
user's speech into text and identifying the skill. It then
maps the request to one of the skill's intents and sends a
structured representation of it to the custom skill. In our
example, the intent is that the user wants to know a stock
price, and the slot value specifies which company's stock
price they are interested in (TCS).
-
Intents and the possible slot values for a skill are held
by the Alexa Service Platform as configuration items for the
skill (its interaction model). The intent, its slots, slot
types and slot values for the user's request are then sent
as a JSON document to the server-side skill implementation
for processing. The Alexa Service Platform knows where to
send these requests because each custom skill is configured
with an endpoint: an AWS Lambda ARN (optionally one per
region) or an HTTPS URL.
-
The custom skill receives the JSON either through an HTTPS
request, if it is hosted as a web service, or, if it is
implemented as an AWS Lambda function, through invocation
of the function at the configured ARN. The Lambda function
and the skill are connected by adding an "Alexa Skills Kit"
trigger to the function and enabling skill ID verification
with the skill's "Skill ID".
-
The custom skill code parses the JSON, reads the intent and
its slot values, and then performs suitable processing to
retrieve the data they ask for, for example through API
calls or database queries. In our example, the code would
call the Alpha Vantage API to get the stock price of the
requested company.
-
A response in JSON format is then sent back to the Alexa
Voice Service platform, containing the text that Alexa
should speak to the user and, if required, visual content
to display on a screen device such as the Echo Show.
-
The Alexa Service Platform receives the response and uses
text-to-speech conversion to speak it to the user.
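The request/response cycle in these steps can be sketched in Python, one of the languages AWS Lambda supports. This is a minimal sketch under several assumptions, not a production skill: the intent name `GetStockPriceIntent`, the slot name `company` and the API key constant are hypothetical; the slot value is passed straight to Alpha Vantage as a ticker symbol (a real skill would map spoken company names to the exchange's ticker); and the `GLOBAL_QUOTE` function with its `"05. price"` field reflects Alpha Vantage's documented response format at the time of writing, so check the current documentation. The `fetch` parameter lets you substitute a fake lookup for offline testing.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Assumption: replace with your own Alpha Vantage API key.
ALPHA_VANTAGE_KEY = "YOUR_API_KEY"
# Hypothetical names; they must match your skill's interaction model.
STOCK_INTENT = "GetStockPriceIntent"
COMPANY_SLOT = "company"


def fetch_global_quote(symbol: str) -> dict:
    """Call Alpha Vantage's GLOBAL_QUOTE function and return the decoded JSON."""
    query = urllib.parse.urlencode(
        {"function": "GLOBAL_QUOTE", "symbol": symbol, "apikey": ALPHA_VANTAGE_KEY}
    )
    url = "https://www.alphavantage.co/query?" + query
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def build_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the JSON envelope Alexa expects back from a custom skill."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_intent_request(
    event: dict, fetch: Callable[[str], dict] = fetch_global_quote
) -> dict:
    """Read the intent and slot from the request JSON, look up the price, reply."""
    intent = event["request"]["intent"]
    if intent["name"] != STOCK_INTENT:
        return build_response("Sorry, I can only look up stock prices.")

    # Slots arrive keyed by slot name; "value" is absent if the user did not fill it.
    company = intent.get("slots", {}).get(COMPANY_SLOT, {}).get("value")
    if not company:
        # Keep the session open so the user can answer the question.
        return build_response(
            "Which company's stock price would you like?", end_session=False
        )

    quote = fetch(company).get("Global Quote", {})
    price = quote.get("05. price")
    if price is None:
        return build_response(f"Sorry, I could not find a price for {company}.")
    return build_response(f"The latest price for {company} is {float(price):.2f}.")
```

To try it without network access, pass a stub such as `lambda symbol: {"Global Quote": {"05. price": "3500.5"}}` as `fetch`; with a `company` slot value of `TCS`, the spoken text becomes "The latest price for TCS is 3500.50."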
Building a Basic Skill
-
To build a basic skill in Alexa there are two main
prerequisites:
-
An Amazon Developer account, used to build your custom
skill.
-
An AWS account with access to the AWS Management Console,
used to create your Lambda function.
-
To build a customised skill you should know the following
terms:
-
Skill: A skill is the application that you intend to
publish on your Alexa device.
-
Invocation: The invocation name is the name users say to
start interacting with your skill.
-
Intents, slots and utterances: An intent is the action
that fulfils the user's request. Intents can have slots
that represent the variable information within an intent.
Sample utterances are the phrases users say to invoke an
intent.
-
Slot types: Every slot has a type that handles the
user's spoken data. For example, AMAZON.NUMBER converts
the spoken word "five" to "5".
-
JSON Editor: Your whole interaction model will be
represented in JSON format. You can create as well as
edit your JSON data. You can also upload a JSON file of
your own.
-
Interfaces: Interfaces provide additional directives and
request types for specific additional features in your
skill. For example, you can use the Audio Player
interface for streaming music.
-
Endpoints: Specify the endpoint for your skill. Alexa
sends requests to this endpoint when users invoke your
skill. If you are hosting your service as an AWS Lambda
function, select the AWS Lambda ARN option and enter the
ARN for your function in the Default Region endpoint
text box.
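These terms come together in the interaction model that the JSON Editor displays. The sketch below builds a small model for the stock-price example as a Python dictionary and checks one consistency rule: every `{slot}` placeholder in a sample utterance must be declared as a slot on that intent. The invocation name, the intent `GetStockPriceIntent`, the slot `company` and the custom slot type `COMPANY_TICKER` are hypothetical; the `interactionModel` / `languageModel` layout follows the structure the developer console exports, but compare it against your own skill's JSON before relying on it.

```python
import json
import re

# Hypothetical interaction model for the stock-price example.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "stock price",  # what the user says after "Alexa, ask ..."
            "intents": [
                {
                    "name": "GetStockPriceIntent",
                    "slots": [{"name": "company", "type": "COMPANY_TICKER"}],
                    "samples": [
                        "what is the price of {company}",
                        "give the stock price for {company}",
                    ],
                },
                # Built-in intents that custom skills are expected to handle.
                {"name": "AMAZON.HelpIntent", "samples": []},
                {"name": "AMAZON.StopIntent", "samples": []},
                {"name": "AMAZON.CancelIntent", "samples": []},
            ],
            "types": [
                {
                    # Custom slot type: values plus spoken synonyms.
                    "name": "COMPANY_TICKER",
                    "values": [
                        {"name": {"value": "TCS", "synonyms": ["tata consultancy services"]}},
                        {"name": {"value": "INFY", "synonyms": ["infosys"]}},
                    ],
                }
            ],
        }
    }
}


def undeclared_slots(model: dict) -> list[str]:
    """Return 'Intent: slot' for each {slot} used in a sample but not declared."""
    problems = []
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        declared = {slot["name"] for slot in intent.get("slots", [])}
        for sample in intent.get("samples", []):
            for ref in re.findall(r"\{(\w+)\}", sample):
                if ref not in declared:
                    problems.append(f"{intent['name']}: {ref}")
    return problems


if __name__ == "__main__":
    print(json.dumps(interaction_model, indent=2))
```

Running the script prints the model as JSON in the shape the JSON Editor works with; the `undeclared_slots` check is only an illustrative local sanity check, and the console performs its own validation when you build the model.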
Alexa Skill Types
-
Custom Skill: A skill that can handle just about any type of
request (it is selected by default). For example: looking up
information from a web service; integrating with a web
service to order something (ordering a car from Uber or a
pizza from Domino's Pizza); interactive games; and just
about anything else you can think of.
-
Flash Briefing Skill: The Flash Briefing Skill API defines
the words users say to invoke the flash briefing or news
request (utterances) and the format of the content so that
Alexa can provide it to the customer.
-
Smart Home Skill: A skill that lets a user control and query
cloud-enabled smart home devices such as lights, door
locks, cameras, thermostats and smart TVs. For example:
turning off the lights, dimming the lights, changing the
volume, etc.
-
Video Skill: The Video Skill API defines the requests the
skill can handle (device directives) and the words users say
to invoke those requests (utterances). For example: Play a
movie, change the channel, etc.
-
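You can write your Lambda function in any language AWS
Lambda supports, including Node.js, Python, Java, Go and
.NET; the Alexa Skills Kit SDKs are available for Node.js,
Java and Python.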
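Whichever language you choose, the Lambda entry point receives the request JSON and dispatches on its request type. Below is a minimal Python sketch that handles the raw JSON without the Alexa Skills Kit SDK. `EXPECTED_SKILL_ID` is a placeholder for your own skill ID, and the spoken text is illustrative. `lambda_handler(event, context)` is the default handler name the Lambda console uses for Python functions; the in-code skill ID check is a defensive extra on top of the verification the Alexa Skills Kit trigger can perform.

```python
# Placeholder: copy your skill's ID from the Alexa developer console.
EXPECTED_SKILL_ID = "amzn1.ask.skill.your-skill-id"


def speak(text: str, end_session: bool = True) -> dict:
    """Build a plain-text speech response in the envelope Alexa expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: dict, context) -> dict:
    """Lambda entry point: `event` is the request JSON sent by the Alexa service."""
    # Reject requests that were not sent on behalf of this skill.
    app_id = event["context"]["System"]["application"]["applicationId"]
    if app_id != EXPECTED_SKILL_ID:
        raise ValueError("Request did not come from the expected skill")

    request = event["request"]
    kind = request["type"]
    if kind == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return speak(
            "Welcome to Stock Price. Which company are you interested in?",
            end_session=False,
        )
    if kind == "IntentRequest":
        name = request["intent"]["name"]
        if name == "AMAZON.HelpIntent":
            return speak("You can say, what is the price of TCS.", end_session=False)
        if name in ("AMAZON.StopIntent", "AMAZON.CancelIntent"):
            return speak("Goodbye.")
        # Hypothetical: custom intents such as a stock-price intent go here.
        return speak("Sorry, I can't help with that yet.")
    if kind == "SessionEndedRequest":
        # Alexa does not speak a reply to this request type; return an empty envelope.
        return {"version": "1.0", "response": {}}
    return speak("Sorry, I did not understand that request.")
```

With the Alexa Skills Kit SDK for Python, the `if` chain would instead become a set of request handler classes; the raw version is shown only because it needs no dependencies.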
Conclusion
"What's good for developers is ultimately good for consumers,"
said Rob Pulciani, Amazon's general manager of Alexa skills.
Alexa has thus proved to be a game changer in IT in recent
years. At the time of writing, Alexa had a leg up on Google
Assistant and Apple's Siri, and it remains a strong
competitor among voice assistants today.