
Putting Amazon Lambda to work with Kinesis

13 February 2015

Written by Richard Gubby

At AWS re:Invent 2014, Werner Vogels (Amazon CTO) announced Amazon Lambda. He described it as “an event-driven computing service for dynamic applications” and it’s going to change the way you think about computing resources forever!

Instead of having dedicated resources on all the time to process code (via a cron / job queue / whatever), you can now have Lambda functions execute if and only if events happen. Triggering these events is as simple as uploading a file to an S3 bucket, or adding records to a DynamoDB or Kinesis stream.


Kinesis is a service that enables you to process streaming data in real time at massive scale. It’s great for metrics and analytics, but also allows you to put together complex stream processes - which is exactly what we’re going to do here.

Having something online only when you need it is a very attractive prospect - but a magic bullet it is not. It comes with its own set of restrictions, and getting it to work properly in the first place (with Kinesis especially) isn’t a walk in the park.

We’ve had a play around with Lambda and Kinesis recently, so here’s why it may or may not be a good fit for your next project. The sample app I’ll be describing/implementing is a simple function that subscribes to a Kinesis stream, decodes the payload and logs the output to CloudWatch.

Prerequisites

  • I’m going to do (almost) everything from the command line, so you’ll need to install the AWS CLI. If the thought of running a terminal window brings you out in a cold sweat, it’s time to brush up on your bash-fu. This tutorial is not about installing the AWS CLI, so you’re best off following the official AWS CLI installation guide.

  • AWS admin user. Make sure you have one. If not, create one.

  • You’ll need a couple of AWS roles created, one for code execution and one for code invocation. As well as not being about installing the AWS CLI, this tutorial is also not about creating AWS roles. Create an IAM role for AWS Lambda (execution role) and create an IAM role for AWS Lambda (invocation role).

Lambda

The easiest way to create a sample app is to create a test function in the AWS console, copy and paste the Lambda boilerplate code then amend it as necessary. That way you can grab the important bits where it processes the record from Kinesis into the right variables. Here’s a copy I made earlier:

console.log('Loading event');
exports.handler = function(event, context) {
   console.log(JSON.stringify(event, null, '  '));
   // Declare loop variables locally so they don't leak into the global scope
   for (var i = 0; i < event.Records.length; ++i) {
     var encodedPayload = event.Records[i].kinesis.data;
     // Kinesis data is base64 encoded so decode here
     // (utf8 rather than ascii, so non-ASCII characters survive)
     var payload = new Buffer(encodedPayload, 'base64').toString('utf8');
     console.log('Decoded payload: ' + payload);
   }
   context.done(null, 'Hello World');
};

That small example loops over the records from Kinesis and decodes each one, ready for use. Save it in a file locally (ProcessKinesisRecord.js will do for now), then zip that single file up into ProcessKinesisRecord.zip.
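For reference, here is a minimal sketch of the decoding step as a standalone function you can run locally. The sample event only includes the fields the handler reads (real Kinesis events carry more), decodeRecords is a name made up for this illustration, and Buffer.from assumes a current Node.js; on the 2015 Lambda runtime you would stick with new Buffer(data, 'base64') as in the handler.

```javascript
// Hypothetical helper: returns the decoded payload of every record in a Kinesis event.
function decodeRecords(event) {
  return event.Records.map(function (record) {
    // Kinesis delivers record data base64 encoded
    return Buffer.from(record.kinesis.data, 'base64').toString('utf8');
  });
}

// Trimmed-down sample event: only the fields used above.
var sampleEvent = {
  Records: [
    { kinesis: { partitionKey: 'example-key', data: 'SGVsbG8sIHdvcmxkIQ==' } }
  ]
};

console.log(decodeRecords(sampleEvent)); // [ 'Hello, world!' ]
```

Keeping the decoding in a plain function like this makes it easy to check without deploying anything.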

Let’s upload that zip file to Lambda first, so that we can then test it - here’s the command to upload it (using the execution role set up earlier, and with --function-zip pointing at wherever you saved the zip):

$ aws lambda upload-function \
--region eu-west-1 \
--function-name ProcessKinesisRecord \
--function-zip dist/lambda/ProcessKinesisRecord.zip \
--role arn:aws:iam::123456789012:role/executionrole  \
--mode event \
--handler ProcessKinesisRecord.handler \
--runtime nodejs \
--timeout 10 \
--profile admin

That command uploads the zip file to Lambda, naming the function ProcessKinesisRecord, with the right role, specifying the runtime as nodejs, a timeout of 10 seconds and that the handler for this function (exports.handler) can be found inside ProcessKinesisRecord.js. It just so happens that our function name, zip file name and file name are all the same, but they could all be different if you want.
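The --handler value follows a file-name-dot-export-name pattern. The sketch below only illustrates that naming convention; parseHandler is a hypothetical helper, not how Lambda itself loads your code.

```javascript
// Illustration only: splits a handler string into the file (module) name
// and the name of the exported function.
function parseHandler(handler) {
  var dot = handler.lastIndexOf('.');
  if (dot <= 0 || dot === handler.length - 1) {
    throw new Error('Expected "<file>.<export>", got: ' + handler);
  }
  return {
    moduleName: handler.slice(0, dot),  // the file, without .js
    exportName: handler.slice(dot + 1)  // the property on exports
  };
}

console.log(parseHandler('ProcessKinesisRecord.handler'));
```

So if you renamed the file to index.js and exported exports.process, the handler value would become index.process.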

If you head over to the Lambda console, you’ll now see your newly created function - you can even try sending test events to it to see if it does what it is supposed to do. (Don’t forget to change the Sample event to Kinesis).

Kinesis

Now that we’ve got a Lambda function created, it’s time to get started with Kinesis streams.

Create a stream:

$ aws kinesis create-stream \
--stream-name data-stream \
--shard-count 1 \
--region eu-west-1 \
--profile admin

Here we’re creating a stream with a name and a shard count in a given region, using a user profile that is allowed to create streams. A shard in Kinesis gives you capacity for transactions - the more shards you have, the more you can cope with. The actual metrics of a shard are a bit out of scope of this article - but here’s a page that describes the key concepts.
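To give a feel for how records end up on a particular shard: Kinesis takes the MD5 hash of each record’s partition key and sends the record to the shard whose hash key range contains that value. The sketch below assumes the shards split the 128-bit range into equal, contiguous slices; shardIndexFor is a hypothetical helper for illustration, and the real ranges are listed in the describe-stream output.

```javascript
var nodeCrypto = require('crypto');

// Maps a partition key to a shard index, ASSUMING `shardCount` shards that
// split the 128-bit hash key space into equal, contiguous ranges.
function shardIndexFor(partitionKey, shardCount) {
  var md5Hex = nodeCrypto.createHash('md5').update(partitionKey, 'utf8').digest('hex');
  var hashKey = BigInt('0x' + md5Hex);               // 0 .. 2^128 - 1
  var rangeSize = (1n << 128n) / BigInt(shardCount); // width of each shard's slice
  var index = Number(hashKey / rangeSize);
  return Math.min(index, shardCount - 1);            // last shard absorbs any remainder
}

console.log(shardIndexFor('user-42', 1)); // always 0 with a single shard
console.log(shardIndexFor('user-42', 4)); // some index between 0 and 3
```

With the single shard used in this article every key lands in the same place, which is why the choice of partition key doesn’t matter here.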

Once it’s created, you need to wait for a bit and then check the stream is active. You can do that by using this command:

$ aws kinesis describe-stream \
--stream-name data-stream \
--region eu-west-1 \
--profile admin

What you’re looking for in the output is that the StreamStatus field is set to ACTIVE. Also - be sure to make a note of the StreamARN - you’ll need that later.
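If you’d rather not eyeball the JSON, a small sketch like this pulls out the two fields that matter. streamInfo is a made-up helper name, and the sample output is abbreviated (the real response also lists shards) with a placeholder account ID.

```javascript
// Pulls the stream ARN and status out of `aws kinesis describe-stream` JSON output.
function streamInfo(describeStreamJson) {
  var description = JSON.parse(describeStreamJson).StreamDescription;
  return {
    arn: description.StreamARN,
    active: description.StreamStatus === 'ACTIVE'
  };
}

// Abbreviated sample output; the account ID is a placeholder.
var sampleOutput = JSON.stringify({
  StreamDescription: {
    StreamName: 'data-stream',
    StreamARN: 'arn:aws:kinesis:eu-west-1:123456789012:stream/data-stream',
    StreamStatus: 'ACTIVE'
  }
});

console.log(streamInfo(sampleOutput));
```

While the stream is still being set up the status reads CREATING, in which case active comes back false and you wait a little longer.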

With that, you’re done with Kinesis - you can fire some events off at it now if you like - but we’ll do more with that later once we’ve hooked it up to our Lambda function.

Adding Kinesis event source

So now we have a Lambda function and a Kinesis stream, but at the moment they are very much in isolation - we want them to play nicely together. Up to now, even though I’ve given instructions using the command line, you could, if you wanted to, do it all manually in the AWS console. The bad news, if you’ve been doing that, is that the following step is command line only.

$ aws lambda add-event-source \
--region eu-west-1 \
--function-name ProcessKinesisRecord \
--role arn:aws:iam::123456789012:role/invocationrole \
--event-source arn:aws:kinesis:eu-west-1:123456789012:stream/data-stream \
--batch-size 100 \
--profile admin

This command is telling Lambda to add an event source to the ProcessKinesisRecord function, with the right role (you set this up earlier, right?), with an event source that matches the StreamARN from Kinesis. If you can’t remember what this is, or forgot to make a note - run the aws kinesis describe-stream... command again.

It usually takes a few minutes for this to propagate through the AWS system - you’ll want to check it’s ready by using the following command:

$ aws lambda list-event-sources --function-name ProcessKinesisRecord

You’re looking out to see if the Status field is OK in the output.

One thing to note here - sometimes this step just does not work and you get a status of Pending or “PROBLEM: internal Lambda error. Please contact Lambda customer support.” If you do get this, there is a quick workaround:

  • Add a 2nd Kinesis stream
  • Add that as an event source to your Lambda function
  • Delete the 1st Kinesis event source
  • Delete the 1st Kinesis stream

Even then, you’ll still get those event sources listed (it is still in Beta!) - but at least you’ll be able to get events through. You should also probably drop something on the AWS support forum too to let them know there is a problem.

Testing

So now we have a Lambda function, a Kinesis stream and they’re on speaking terms. Let’s put that to the test.

$ aws kinesis put-record \
--stream-name data-stream \
--data "SGVsbG8sIHdvcmxkIQ==" \
--partition-key shardId-000000000000 \
--region eu-west-1 \
--profile admin

This command sends data to your Kinesis stream. (The partition key can be any string - Kinesis hashes it to decide which shard receives the record.) To check it’s gone through end-to-end, head over to the Amazon Lambda Dashboard, find your function and expand it. If you have just that one function it’ll be expanded automatically - then look for the CloudWatch metric called “Request count” and click on the “logs” link in the top right of that box (see the screenshot).

[Screenshot: Lambda metrics]

This link opens up CloudWatch, in a Log Group set up automatically for this Lambda function. Find the Log Stream that matches the date/time you sent a record into Kinesis (it should be the latest one if you’ve just done it), click on the name and you’ll see the log of the Lambda function decoding the data and outputting it.
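The --data value in the put-record command is simply the text “Hello, world!” base64 encoded. A minimal sketch for producing your own test payloads follows; toKinesisData and fromKinesisData are made-up helper names, and Buffer.from assumes a current Node.js. How the CLI treats the --data value can differ between AWS CLI versions, so go by what actually shows up in the log rather than by this sketch.

```javascript
// Encode text the same way the sample --data value was produced.
function toKinesisData(text) {
  return Buffer.from(text, 'utf8').toString('base64');
}

// The reverse step, as performed inside the Lambda function.
function fromKinesisData(data) {
  return Buffer.from(data, 'base64').toString('utf8');
}

console.log(toKinesisData('Hello, world!'));          // SGVsbG8sIHdvcmxkIQ==
console.log(fromKinesisData('SGVsbG8sIHdvcmxkIQ==')); // Hello, world!
```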

Things to Note / Gotchas

Node only

As of Feb 2015, Node is the only supported runtime. No doubt there will be more at some point in the future, but not right now - if you want to use Lambda and don’t know Node, it’s time to break out your Node.js books. One thing to remember though is that even though Lambda is Node only, Kinesis isn’t - you can write code to push into Kinesis in any of your favourite languages.

Hard to debug

A simple script such as the one above isn’t that difficult to debug. You can even use the online editor to make changes live and see what effect they have. That’s great for tiny apps, but once you start adding extra libs / dependencies into your zip file, you’ll have to re-upload every time you make a change. When you have a stable working Lambda, that’s definitely how you’d want to do it (bonus points for an auto-deploy from your CI environment), but it does make it difficult to get started.
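One way to shorten the feedback loop is to call the handler on your own machine with a fake event before uploading anything. This is only a sketch: invokeLocally and the stub context are made up for illustration and mimic just the context.done call used in this article, and the inline handler stands in for require('./ProcessKinesisRecord').handler.

```javascript
// Hypothetical local harness: calls a handler with a fake event and a stub
// context, and reports whatever the handler passes to context.done().
function invokeLocally(handler, event, callback) {
  var context = {
    done: function (err, result) { callback(err, result); }
  };
  handler(event, context);
}

// Stand-in for require('./ProcessKinesisRecord').handler
function handler(event, context) {
  var payloads = event.Records.map(function (record) {
    return Buffer.from(record.kinesis.data, 'base64').toString('utf8');
  });
  context.done(null, payloads);
}

var fakeEvent = { Records: [{ kinesis: { data: 'SGVsbG8sIHdvcmxkIQ==' } }] };
var outcome;
invokeLocally(handler, fakeEvent, function (err, result) {
  outcome = { err: err, result: result };
  console.log(err ? 'Failed: ' + err : 'Done: ' + JSON.stringify(result));
});
```

It won’t catch permission or packaging problems, but it does catch plain coding mistakes without a round trip to AWS.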

Problems with adding event sources

Sometimes the add-event-source from Kinesis works, sometimes it doesn’t. If you get problems, just add a new Kinesis stream and re-add it to your Lambda function.

Speed

Whilst only running this function as and when an event is created is great, bear in mind that it has to initialise the entire app every time. So when you’re doing more complicated tasks and connecting to multiple database types, it’s going to take time to connect. Your always-on hardware is just that - always on and always connected - it will run code faster.

60 seconds to save the world!

Lambda functions must complete in 60 seconds. You can set a shorter timeout than that, but you can’t go above it. If you have a task that takes an hour to complete, Lambda isn’t going to work for you (unless you can split it up into 60 one-minute tasks?).
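If the work is a long list of independent items, splitting it can be as simple as cutting the list into batches and sending each batch as its own Kinesis record, so that each invocation only handles one slice. A minimal sketch, where splitIntoChunks is a hypothetical helper and the right chunk size depends entirely on how long your items take:

```javascript
// Cuts a list of work items into batches of at most `chunkSize` items.
function splitIntoChunks(items, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error('chunkSize must be a positive integer');
  }
  var chunks = [];
  for (var i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

console.log(splitIntoChunks([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```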

Scheduling

Even when you have everything running perfectly, you still need to push an event into Kinesis for it to do anything, right? A simple solution for pushing a regular event could be to use something like the Heroku Scheduler. At regular intervals, you could ask the scheduler to execute a local script which only does a put-record into Kinesis.
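As a rough idea of what such a scheduled script might send, the sketch below builds the parameters for a Kinesis PutRecord call. buildTickRecord, the “tick” event shape and the 'scheduler' partition key are all made up for illustration; actually sending the record needs the AWS SDK for JavaScript (or the CLI), which is left as a comment.

```javascript
// Builds the parameters for a Kinesis PutRecord call carrying a "tick" event.
function buildTickRecord(streamName, now) {
  return {
    StreamName: streamName,
    PartitionKey: 'scheduler', // arbitrary; any string works
    Data: JSON.stringify({ type: 'tick', firedAt: now.toISOString() })
  };
}

var params = buildTickRecord('data-stream', new Date(0));
console.log(params);

// With the AWS SDK for JavaScript (v2) installed, something along the lines of
//   new AWS.Kinesis({ region: 'eu-west-1' }).putRecord(params, callback);
// would send it to the stream.
```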

Price

Lambda functions are charged per request (plus the compute time each request uses) - if you’re on the free tier you get 1M free requests a month, but even if you’re not, it’s very cheap to fire lots of requests at it. Here’s a full breakdown of Lambda pricing.

I hope that helps you get started with Lambda and Kinesis. If you have any other useful tips, suggestions, or have already used Lambda in anger, let us know!
