
Monday, February 8, 2021

Create Node.js and csvtojson Lambda Layer for AWS Lambda Custom Runtime

This post covers how to use Node.js and csvtojson in the AWS Lambda Custom Runtime, which can be useful when CSV-to-JSON conversion is required as part of a Lambda function in a serverless architecture.

Have a look at Using JQ in AWS Lambda Custom Runtime via AWS Lambda Layer for a quick reference on how the AWS Lambda Custom Runtime is bootstrapped and how AWS Lambda layers work.

Creating Node.js Lambda Layer

Setting up Node.js for the AWS Lambda Custom Runtime is quite easy; all that is needed is the aws-lambda-custom-node-runtime package (v1.0.2). Install npm if it is not already available, and run the following command:

aws-lambda-custom-node-runtime 11.3.0

The above command will generate a directory named node-v11.3.0-linux-x64 (at the path where the command is run) with all the necessary files required to run Node.js.
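
If the aws-lambda-custom-node-runtime command is not already available on the PATH, the package itself needs to be installed first. A minimal sketch, assuming a standard global npm install:

# Hypothetical install step: fetch the CLI from npm and install it globally
npm install -g aws-lambda-custom-node-runtime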

Now zip the entire node-v11.3.0-linux-x64 directory using the following command:

zip -r node-v11.3.0-linux-x64.zip node-v11.3.0-linux-x64

The zip archive, i.e. node-v11.3.0-linux-x64.zip, is our Node.js Lambda layer.

Creating CSVTOJSON Lambda Layer

To create a CSVTOJSON bundle, run the following command:

npm install csvtojson --save

This will create a node_modules directory at the path where the command is executed. Now, similar to what was done for building the Node.js layer, run the following command to build the CSVTOJSON Lambda layer:

zip -r node_dependencies.zip node_modules

The node_dependencies.zip is now our CSVTOJSON Lambda layer. 

Separating the Lambda layers into a Node.js layer and a node dependencies layer helps because it allows the layers to be reused across multiple Lambda functions: the Node.js layer can be shared by several functions while each function keeps its own set of node dependencies. Ultimately, though, it comes down to how Lambda functions are used in your serverless architecture, and best practices may differ.

Using Node.js and CSVTOJSON Lambda Layers

Now the built Lambda layers can be uploaded in the AWS Lambda Layers section of the console (or published with the AWS CLI, as sketched below), and a test run like the following can be done to verify that they work.
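
As a sketch of the CLI route for publishing both archives as layers (the layer names here are placeholders; fileb:// tells the CLI to read the zip files as binary):

aws lambda publish-layer-version --layer-name nodejs-custom-runtime --zip-file fileb://node-v11.3.0-linux-x64.zip
aws lambda publish-layer-version --layer-name csvtojson --zip-file fileb://node_dependencies.zip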

(Visit Using JQ in AWS Lambda Custom Runtime via AWS Lambda Layer for more details on how the function handler below is wired into the custom runtime.)

function handler () {
    EVENT_DATA=$1

    # Lambda layers are unpacked under /opt
    cd /opt
    # Print the Node.js version using the binary from the Node.js layer
    ./node-v11.3.0-linux-x64/bin/node --version
    # Print the csvtojson version using the module from the dependencies layer
    ./node-v11.3.0-linux-x64/bin/node node_modules/csvtojson/bin/csvtojson version
}

When the above handler is triggered, the Node.js and csvtojson versions should appear in the success output.

Sunday, January 31, 2021

Using JQ in AWS Lambda Custom Runtime via AWS Lambda Layer

For a situation wherein there's a need to use JQ on the AWS Lambda custom runtime, an AWS Lambda layer can be created and used by your AWS Lambda function. This blog post explains how that can be achieved, and it is not limited to creating a layer for JQ: the same instructions can be used to build an AWS Lambda layer for any other Linux tool.

For quick reference

AWS Lambda: a serverless compute service that lets you run code without provisioning or managing servers. It's a powerful building block for implementing serverless architectures.

JQ: like sed for JSON data; a fast and flexible command-line JSON processor written in portable C.

Steps to follow

The entire process of building an AWS Lambda layer can be broken down as follows:

  • Get the required distribution files
  • Build a zip archive of the required files 
  • Create a layer on AWS Lambda
  • Use created AWS Lambda layer for your Lambda function

Get required distribution files

Since AWS Lambda custom runtime is based on Amazon Linux AMI, let's first get the JQ files specific to Amazon Linux AMI. For this, create a new Amazon Linux EC2 instance (or use an Amazon Linux Docker image). 
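
For the Docker route, a minimal sketch, assuming Docker is available locally (pick an image tag matching the Amazon Linux version your Lambda runtime is based on):

# Start a throwaway Amazon Linux container and follow the same install steps inside it
docker run -it --rm amazonlinux bash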

Install JQ and, once the installation is complete, locate the required files on the Amazon Linux EC2 instance.

Installing JQ

sudo yum install jq

At the time of writing this post, for JQ version 1.5, the required files can be found at the following locations:
# executable
/usr/bin/jq

# dependencies
/usr/lib64/libjq.so
/usr/lib64/libjq.so.1.0.4
/usr/lib64/libonig.so.2
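
The list of shared-library dependencies can also be confirmed with ldd (exact paths and versions may vary with the jq build):

# Show the shared libraries the jq executable links against
ldd /usr/bin/jq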

Build a zip archive with required files

Build a jq.zip archive containing all these required files so that JQ is functional when used inside the AWS Lambda function. The jq executable should be at the root of the zip file and the dependencies should be inside a lib directory in the zip file. The reason is that when AWS unpacks Lambda layers, their contents end up under /opt, so the custom runtime's dependency paths for the Lambda function are /opt and /opt/lib. To simplify: /opt is where the executables should go, and /opt/lib is where the required dependencies should go.
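
Putting that layout together, a minimal sketch of assembling jq.zip on the machine where jq was installed (the jq-layer staging directory name is arbitrary):

# Stage the files in the layout AWS expects under /opt, then zip them
mkdir -p jq-layer/lib
cp /usr/bin/jq jq-layer/
cp /usr/lib64/libjq.so /usr/lib64/libjq.so.1.0.4 /usr/lib64/libonig.so.2 jq-layer/lib/
(cd jq-layer && zip -r ../jq.zip jq lib)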

Create a layer on AWS Lambda

Now let's create a JQ layer on AWS Lambda so that it can be used by the AWS Lambda function. The layer can be created directly via the AWS Console, or with the following AWS CLI command:

aws lambda publish-layer-version --layer-name jq --zip-file fileb:///PATH_TO_FILE/jq.zip

Use created AWS Lambda layer for your Lambda function

Once the jq layer is ready from the above step, create an AWS Lambda function with the custom runtime using the AWS console. When a Lambda function with the custom runtime is created, AWS automatically creates bootstrap and hello.sh files whose content (at the time of writing this post) is as follows:

bootstrap:

#!/bin/sh
set -euo pipefail

echo "##  Environment variables:"
env

# Handler format: <script_name>.<bash_function_name>
#
# The script file <script_name>.sh  must be located at the root of your
# function's deployment package, alongside this bootstrap executable.
source $(dirname "$0")/"$(echo $_HANDLER | cut -d. -f1).sh"

while true
do
    # Request the next event from the Lambda runtime
    HEADERS="$(mktemp)"
    EVENT_DATA=$(curl -v -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
    INVOCATION_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)

    # Execute the handler function from the script
    RESPONSE=$($(echo "$_HANDLER" | cut -d. -f2) "$EVENT_DATA")

    # Send the response to Lambda runtime
    curl -v -sS -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$INVOCATION_ID/response" -d "$RESPONSE"
done

hello.sh:

function handler () {
    EVENT_DATA=$1

    RESPONSE="{\"statusCode\": 200, \"body\": \"Hello from Lambda!\"}"
    echo $RESPONSE
}

Update the hello.sh handler so that it prints the jq version, to test whether the jq layer is working properly:

function handler () {
    EVENT_DATA=$1

    # Lambda layers are unpacked under /opt; the jq executable sits at its root
    cd /opt
    ./jq --version
}
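
Before running the test, the published jq layer needs to be attached to the function. This can be done from the function's configuration page in the console or, as a sketch, with the AWS CLI (the function name and layer version ARN below are placeholders):

# Attach the jq layer (placeholder ARN) to the function (placeholder name)
aws lambda update-function-configuration --function-name my-jq-function --layers arn:aws:lambda:us-east-1:123456789012:layer:jq:1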

Now run a quick test on the hello.sh Lambda function and it should print the jq version of the AWS Lambda layer as follows:

jq-1.5
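
The same check can also be run from the command line; a sketch, assuming the AWS CLI is configured and reusing the placeholder function name from above:

# Invoke the function and print the response returned by the handler
aws lambda invoke --function-name my-jq-function response.json && cat response.json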

Follow a similar process to create an AWS Lambda layer for any other Linux tool.


Friday, May 4, 2018

Stream a file to AWS S3 using Akka Streams (via Alpakka) in Play Framework

In this blog post we'll see how a file can be streamed from a client (e.g. a browser) to Amazon S3 (AWS S3) using Alpakka's AWS S3 connector. Alpakka provides various Akka Streams connectors, integration patterns, and data transformations for integration use cases.
The example in this blog post uses Play Framework to provide a user interface for submitting a file from a web page directly to AWS S3, without creating any temporary files on disk during the process. The file is streamed to AWS S3 using S3's multipart upload API.

(To understand this blog post, basic knowledge of Play Framework and Akka Streams is required. Also, check out What can Reactive Streams offer EE4J by James Roper, particularly its Servlet IO section, to fully understand the extent to which the example in this blog post can be helpful.)
Let's begin by looking at the artifacts used for achieving the task at hand:
  1. Scala 2.11.11
  2. Play Framework 2.6.10
  3. Alpakka S3 0.18
Now moving on to the fun part, let's see what the code base will look like. We'll first create a class for interacting with AWS S3 using the Alpakka S3 connector; let's name the class AwsS3Client.
@Singleton
class AwsS3Client @Inject()(system: ActorSystem, materializer: Materializer) {

  private val awsCredentials = new BasicAWSCredentials("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
  private val awsCredentialsProvider = new AWSStaticCredentialsProvider(awsCredentials)
  private val regionProvider =
    new AwsRegionProvider {
      def getRegion: String = "us-west-2"
    }
  private val settings = new S3Settings(MemoryBufferType, None, awsCredentialsProvider, regionProvider, false, None, ListBucketVersion2)
  private val s3Client = new S3Client(settings)(system, materializer)

  def s3Sink(bucketName: String, bucketKey: String): Sink[ByteString, Future[MultipartUploadResult]] =
    s3Client.multipartUpload(bucketName, bucketKey)
}
From the first line it can be seen that the class is marked as a Singleton, because we do not want multiple instances of it to be created. Next, an ActorSystem and a Materializer are injected, which are required for configuring Alpakka's AWS S3 client. The next few lines configure an instance of Alpakka's AWS S3 client, which will be used for interfacing with your AWS S3 bucket. Finally, the last part of the class defines a behavior that returns an Akka Streams Sink of type Sink[ByteString, Future[MultipartUploadResult]]; this Sink does the job of sending the file stream to the AWS S3 bucket using AWS's multipart upload API.
To make this class work, replace AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with your AWS access key and secret key respectively, and replace us-west-2 with your AWS region.
Next, let's look at how the s3Sink behavior of this class can be used to connect our Play Framework controller with the AWS S3 multipart upload API. But before doing that, and slightly digressing from the example [bear with me, it's going to build up the example further :)], if you followed my previous blog post, Streaming data from PostgreSQL using Akka Streams and Slick in Play Framework [containing the Customer Management example], you might have seen how a CustomerController was used to build functionality wherein a Play Framework route was available to stream customer data directly from PostgreSQL into a downloadable CSV file (without buffering the data as a file on disk). This blog post builds on top of that Customer Management example, so we're going to use the same CustomerController but modify it a bit by adding a new Play Framework Action for accepting the file from the web page.
For simplicity, let's name the controller Action upload; this Action accepts a file from a web page via one of the reverse routes. Let's first look at the controller code and then discuss the reverse route.
@Singleton
class CustomerController @Inject()(cc: ControllerComponents, awsS3Client: AwsS3Client)
                                  (implicit ec: ExecutionContext) extends AbstractController(cc) {

  def upload: Action[MultipartFormData[MultipartUploadResult]] =
    Action(parse.multipartFormData(handleFilePartAwsUploadResult)) { request =>
      val maybeUploadResult =
        request.body.file("customers").map {
          case FilePart(key, filename, contentType, multipartUploadResult) =>
            multipartUploadResult
          }
 
      maybeUploadResult.fold(
        InternalServerError("Something went wrong!")
      )(uploadResult =>
        Ok(s"File ${uploadResult.key} uploaded to bucket ${uploadResult.bucket}")
      )
    }

   private def handleFilePartAwsUploadResult: Multipart.FilePartHandler[MultipartUploadResult] = {
     case FileInfo(partName, filename, contentType) =>
       val accumulator = Accumulator(awsS3Client.s3Sink("test-ocr", filename))

       accumulator map { multipartUploadResult =>
         FilePart(partName, filename, contentType, multipartUploadResult)
       }
   }
}

Dissecting the controller code base, it can be seen that the controller is a singleton and the AwsS3Client class that was created earlier is injected in the controller along with the Play ControllerComponents and ExecutionContext.
Let's look at the private behavior of the CustomerController first, i.e. handleFilePartAwsUploadResult. It can be seen that the return type of this behavior is
Multipart.FilePartHandler[MultipartUploadResult]
which is nothing but a Scala type defined inside Play’s Multipart object:
type FilePartHandler[A] = FileInfo => Accumulator[ByteString, FilePart[A]]
It should be noted here that the example uses multipart/form-data encoding for the file upload, so the default multipartFormData body parser is used by providing a FilePartHandler of type FilePartHandler[MultipartUploadResult]. The type parameter is MultipartUploadResult because Alpakka's AWS S3 Sink, to which the file is finally sent, is of type Sink[ByteString, Future[MultipartUploadResult]].
Looking at what this private behavior does: it accepts a FileInfo case class, creates an Accumulator from s3Sink, and finally maps the Accumulator's result to a FilePart.
NOTE: Accumulator is essentially a lightweight wrapper around Akka Sink that gets materialized to a Future. It provides convenient methods for working directly with Future as well as transforming the inputs.
Moving on to the upload Action, it looks like any other Play Framework Action, with the only difference being that the request body is parsed as MultipartFormData and then handled via our custom FilePartHandler, i.e. handleFilePartAwsUploadResult, which was discussed earlier.
To connect everything together, we need an endpoint to facilitate the file upload and a view from which a file can be submitted. Let's add a new route to Play's routes file:
POST /upload controllers.CustomerController.upload
and a view to enable file upload from the user interface
@import helper._

@()(implicit request: RequestHeader)

@main("Customer Management Portal") {
  <h1><b>Upload Customers to AWS S3</b></h1>
  @helper.form(CSRF(routes.CustomerController.upload()), 'enctype -> "multipart/form-data") {
    <input type="file" name="customers">
    <br>
    <input type="submit">
  }
}
Note the CSRF token in the form, which is required because CSRF protection is enabled by default in Play Framework.
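
For a quick end-to-end check from the command line, here's a rough sketch, assuming the application runs locally on port 9000, customers.csv is a local file, and the default CSRF filter configuration (which only checks requests that carry a Cookie or Authorization header):

# Post a local CSV file as the "customers" multipart field
curl -F "customers=@customers.csv" http://localhost:9000/upload
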
The entire code base is available in the playakkastreams repository.
Hope this helps, shout out your queries in the comment section :)
This article was first published on the Knoldus blog.