Get storage for your Serverless Functions with Minio & Docker

In this post I'll show you how to make use of storage for your Serverless functions with OpenFaaS using Minio - a lightweight S3 alternative. That means you can run your functions with storage on any cloud on your own terms.

Introduction

Let's talk a little about storage for Docker. Typically you will find two types of storage available:

  • Block storage

Block storage is effectively like a disk mounted on your PC or within a container. You can read/write exactly as if it were an SSD or HDD plugged into the physical host.

Pros:

  • No changes needed to code, just pick the "path" or mount.
  • Accessible over the network
  • Full directory structure available
  • Fast

Cons:

  • Hard to access securely over the internet
  • Hard to manage
  • Requires drivers / plugins for Docker Swarm / Kubernetes

Examples: Ceph, Infinit

  • Object storage

Object storage as popularised by AWS S3 provides an API where a binary object or file can be stored or retrieved.

Pros:

  • Self-contained and easy to set up
  • Easier to secure over a network or the Internet
  • Client API libraries available for common languages
  • Commonly used with cloud/web-applications when targeting AWS S3 buckets
  • The backing server can be switched easily between AWS, Ceph, Minio and other providers whether self-hosted or in the cloud

Cons:

  • Only a flat, bucket-based file structure is available - i.e. you cannot store files in the root of the server or use a full directory tree
  • When running on your own infrastructure several hosts are required for high availability/replication
  • Additional dependency on a client library or command-line utility

Examples: minio.io, Ceph

Block storage is usually attached by mounting a volume from the host directly into the running container. This often causes unexpected behaviour with file permissions. When you start to do this across a cluster, mounting a volume from an ephemeral host doesn't make much sense. Network-distributed block storage like Ceph aims to solve this problem. A good example of where you would want this kind of block storage is with an existing application such as a Postgres DB or a legacy system.

Given that Serverless functions are:

  • short-lived (seconds)
  • stateless (no in-memory state/direct storage)
  • single-purpose

It should be considered an anti-pattern to attach permanent storage volumes to a function. That means we should evaluate the other options such as using APIs and Object storage.

In this tutorial we will setup Minio and write two Serverless Functions for OpenFaaS that work together to process images in a pipeline. You can then take that template and make it do whatever you need.

Pre-requisites

These are the pre-requisites for the tutorial: you'll need to configure OpenFaaS before you start.

We will build up the code as we walk through the tutorial, but you can grab the full example from GitHub: alexellis/function-storage-example.

Setup Minio

You can setup Minio in a few minutes by following the official guides for Docker Swarm or Kubernetes.

https://minio.io
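
If you just want a quick single-node instance for testing on a laptop, one option is to run the official image directly. This is only a sketch - the keys below are the sample values used throughout this post and you should substitute your own:

$ docker run -p 9000:9000 \
    -e "MINIO_ACCESS_KEY=67L5FZ4DGF58HG7SR5FY" \
    -e "MINIO_SECRET_KEY=W9HYUC03qIRJb06qENoZZ6ZXUGP9Z3DrGkK49dRY" \
    minio/minio server /export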

Important: you must take note of your AccessKey and SecretKey, which are used to access storage.

When working with Minio it's useful to install the mc (minio-client) CLI. You can use it to copy files in/out of buckets and access remote servers.

Please download the Minio Client (mc) here before continuing.

When you run the Minio Server for the first time it will print out a command for configuring your mc client - i.e.

mc config host add myminio http://192.168.0.35:9000 67L5FZ4DGF58HG7SR5FY W9HYUC03qIRJb06qENoZZ6ZXUGP9Z3DrGkK49dRY

Note: if the command prints a private IP from inside your Docker container, you may need to swap it for an address reachable from your network, such as the host's public or private network IP - i.e. http://192.168.0.35:9000

Make the two initial buckets using the mc client. You could also do this through code, as shown in the sketch after these commands:

$ mc mb docker/incoming
$ mc mb docker/processed
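
Here's how that might look in code, using the same Python minio client we'll install for the functions later on - a minimal sketch assuming the host and credentials shown earlier:

from minio import Minio

# Connect with the same host and keys used for the mc client above
mc = Minio("192.168.0.35:9000",
           access_key="67L5FZ4DGF58HG7SR5FY",
           secret_key="W9HYUC03qIRJb06qENoZZ6ZXUGP9Z3DrGkK49dRY",
           secure=False)

# Create each bucket only if it doesn't already exist
for bucket in ["incoming", "processed"]:
    if not mc.bucket_exists(bucket):
        mc.make_bucket(bucket)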

Write the functions

Let's write two short Python functions:

LoadImages

  • Is passed a JSON map of URLs
  • Downloads the images and puts them into an incoming bucket
  • Returns the filenames of the images so they can be passed on to ProcessImages

ProcessImages

  • Downloads the images from the bucket
  • Converts to black and white
  • Uploads to a new processed bucket

Scaffold the functions

You can scaffold the functions with faas-cli new:

faas-cli new --lang python loadimages
faas-cli new --lang python processimages

Each Python function you create gets its own folder for the handler.py and requirements.txt.
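
After running the two commands above you should end up with a layout roughly like this:

loadimages.yml
loadimages/
├── handler.py
└── requirements.txt
processimages.yml
processimages/
├── handler.py
└── requirements.txt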

Update each function's requirements.txt, which is used by pip when building the functions:

loadimages/requirements.txt

requests
minio

processimages/requirements.txt

requests
minio

This installs an HTTP client and the Minio client library into each function.

LoadImages function

Let's work on the loadimages function. The input will be a JSON payload with the URLs we need to download and process in the function.

  • Input:
{
 "urls": [
   "https://images.pexels.com/photos/72161/pexels-photo-72161.jpeg?dl&fit=crop&w=640&h=318",
   "https://images.pexels.com/photos/382167/pexels-photo-382167.jpeg?dl&fit=crop&w=640&h=337"]
}
  • Output:

The output is an array of filenames in the incoming bucket:

["filename1", "filename2"]

These URLs point to two colour images from Pexels.com - a source of free stock images.

Let's now look at the handler which:

  • Downloads each file
  • Creates a UUID as a filename
  • Copies the file to the Minio bucket
  • Finally returns a list of filenames for the next step

loadimages/handler.py

from minio import Minio
import requests
import json
import uuid
import os

def handle(st):
    req = json.loads(st)

    mc = Minio(os.environ['minio_hostname'],
               access_key=os.environ['minio_access_key'],
               secret_key=os.environ['minio_secret_key'],
               secure=False)

    names = []
    for url in req["urls"]:
        names.append(download_push(url, mc))
    print(json.dumps(names))

def download_push(url, mc):
    # download file
    r = requests.get(url)

    # write to temporary file
    file_name = get_temp_file()
    f = open("/tmp/" + file_name, "wb")
    f.write(r.content)
    f.close()

    # sync to Minio
    mc.fput_object("incoming", file_name, "/tmp/"+file_name)
    return file_name

def get_temp_file():
    uuid_value = str(uuid.uuid4())
    return uuid_value
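
As an aside, you can sanity-check the handler without deploying it by calling handle() directly from within the loadimages folder - a quick sketch, assuming the three minio_* environment variables used below are already exported in your shell:

import json
from handler import handle

# This will really download the image and push it into the "incoming" bucket
payload = {"urls": ["https://images.pexels.com/photos/72161/pexels-photo-72161.jpeg?dl&fit=crop&w=640&h=318"]}
handle(json.dumps(payload))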

Now update the YML file for your function with the environment variables needed for accessing Minio. These could later be provided through Kubernetes or Docker secrets for better security.

loadimages.yml

provider:
  name: faas
  gateway: http://localhost:8080

functions:
  loadimages:
    lang: python
    handler: ./loadimages
    image: loadimages
    environment:
      minio_hostname: "192.168.0.35:9000"
      minio_access_key: "67L5FZ4DGF58HG7SR5FY"
      minio_secret_key: "W9HYUC03qIRJb06qENoZZ6ZXUGP9Z3DrGkK49dRY"
      write_debug: true

Now we need to build, deploy and invoke the function. If you're using a single-node cluster or your laptop, you shouldn't need to push the images to the Docker Hub.

$ faas-cli build -f loadimages.yml

# Remote clusters or multi-node
$ faas-cli push -f loadimages.yml

$ faas-cli deploy -f loadimages.yml

Invoke the function:

$ echo '{
 "urls": [
   "https://images.pexels.com/photos/72161/pexels-photo-72161.jpeg?dl&fit=crop&w=640&h=318",
   "https://images.pexels.com/photos/382167/pexels-photo-382167.jpeg?dl&fit=crop&w=640&h=337"]
}' | faas-cli invoke loadimages

["761ac85c-ec76-4114-9790-ce9fa2a772f9", "64d080ec-5685-44cc-9c70-5d5886823707"]

You'll see the two image filenames which were uploaded are returned to you on the console in a JSON array. These files will now be available in the "incoming" bucket in Minio.
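
You can verify the upload by listing the bucket's contents with the mc client we configured earlier:

$ mc ls myminio/incoming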

I copied one of the files back to my computer from Minio to check it. You can also open the Minio UI at http://localhost:9000, or whatever IP address you used, to browse the buckets.

ProcessImages

Let's work on the processimages function.

  • Fetch each image from the incoming bucket in Minio
  • Convert to black and white using a third-party ImageMagick function
  • Push back to the processed bucket

Now we could add a dependency on ImageMagick to do the B&W conversion, but OpenFaaS already has a function image that we can deploy for this and then call from our own function.

  • Input:

The input is an array of filenames to be fetched from the incoming bucket:

["filename1", "filename2"]
  • Output:

The output is an array of filenames in the processed bucket:

["filename3", "filename4"]

Edit processimages.yml

  • Add Minio config
  • Add a section for the "convertbw" function

processimages.yml

provider:
  name: faas
  gateway: http://localhost:8080

functions:
  processimages:
    lang: python
    handler: ./processimages
    image: processimages
    environment:
      minio_hostname: "192.168.0.35:9000"
      minio_access_key: "67L5FZ4DGF58HG7SR5FY"
      minio_secret_key: "W9HYUC03qIRJb06qENoZZ6ZXUGP9Z3DrGkK49dRY"
      write_debug: true

  convertbw:
    skip_build: true
    image: functions/resizer:latest
    fprocess: "convert - -colorspace Gray fd:1"

You'll see that the convertbw function uses a pre-built image from the Docker Hub, so we add a skip_build entry. It also doesn't need access to Minio, so we don't provide the environment variables for it.
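
If you'd like to try convertbw in isolation before wiring it into the pipeline, you can POST any image to it through the gateway once it's deployed. Here's a quick sketch using the requests library - the file names are placeholders:

import requests

# Send a local image to the convertbw function and save the B&W result
with open("input.jpg", "rb") as f:
    r = requests.post("http://localhost:8080/function/convertbw", data=f.read())

with open("output.jpg", "wb") as out:
    out.write(r.content)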

processimages/requirements.txt

minio
requests

Our handler for processimages is very similar to the one we made for loadimages. It does the following:

  • Parses the input as JSON
  • Downloads each file from a Minio bucket - one-by-one
  • Sends each file as a binary HTTP POST to the convertbw function and saves the result to disk
  • The result is then pushed into the Minio bucket called processed
  • The resulting filenames are printed back to the caller

processimages/handler.py

from minio import Minio
import requests
import json
import uuid
import os

def handle(st):
    req = json.loads(st)

    mc = Minio(os.environ['minio_hostname'],
               access_key=os.environ['minio_access_key'],
               secret_key=os.environ['minio_secret_key'],
               secure=False)

    names = []
    source_bucket = "incoming"
    dest_bucket = "processed"

    for file_name in req:
        names.append(convert_push(source_bucket, dest_bucket, file_name, mc))

    print(json.dumps(names))

def convert_push(source_bucket, dest_bucket, file_name, mc):
    # fetch the source image from Minio
    mc.fget_object(source_bucket, file_name, "/tmp/" + file_name)

    f = open("/tmp/" + file_name, "rb")
    input_image = f.read()
    f.close()

    # send the image to the convertbw function for B&W conversion
    r = requests.post("http://gateway:8080/function/convertbw", input_image)

    # write to temporary file
    dest_file_name = get_temp_file()
    f = open("/tmp/" + dest_file_name, "wb")
    f.write(r.content)
    f.close()

    # sync to Minio
    mc.fput_object(dest_bucket, dest_file_name, "/tmp/"+dest_file_name)

    return dest_file_name

def get_temp_file():
    uuid_value = str(uuid.uuid4())
    return uuid_value

Build and deploy the new function using the new .yml file:

$ faas-cli build -f processimages.yml

# Remote clusters or multi-node
$ faas-cli push -f processimages.yml

$ faas-cli deploy -f processimages.yml

Now let's invoke the processimages function using the output of the previous function.

$ echo '["761ac85c-ec76-4114-9790-ce9fa2a772f9", "64d080ec-5685-44cc-9c70-5d5886823707"]' | faas invoke processimages

["6b70561c-35a4-410c-a216-5582c6da6afe", "2f3a9741-4e63-4d03-966c-c40813d084b9"]

The output from the ProcessImages function gives the new filenames of the black and white photos in the processed bucket.

You can copy them back to your computer using mc cp or view them with the Minio UI at localhost:9000 (or wherever you have set this up). For instance:

$ mc cp myminio/processed/6b70561c-35a4-410c-a216-5582c6da6afe .

If you add a .jpg extension to the file you should be able to open it in Finder, Windows Explorer, etc.

Summing up

We now have two functions which form part of a pipeline, so that we can separate the downloading of files from the processing of them. Object storage is provided by Minio, which means we can store the results of the functions easily and pass them along from one function to another by filename.

Grab the code from GitHub: alexellis/function-storage-example

This example is not a finished product - that's where you come in. Here are some ideas for enhancements you could make.

Make it yours:

  • Perform your own transformation or analysis on the files - perhaps use audio files or videos instead of plain JPEG images.
  • You could have the resulting processed images sent over to your Slack channel via an HTTP POST - see the sketch below.
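
For the Slack idea, an incoming webhook only needs a JSON POST - a minimal sketch, where the webhook URL is a placeholder you'd create for your own workspace:

import requests

def notify_slack(file_name):
    # Placeholder URL - create a real incoming webhook at api.slack.com
    webhook_url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    requests.post(webhook_url, json={"text": "New processed image: " + file_name})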

Make it secure:

  • Move the Minio access key and secret key out of your .yml files and into Kubernetes or Docker secrets, as mentioned earlier.

Go "pro":

  • Combine your functions into a single .yml file instead of using one .yml file per function
  • Rename your .yml file to stack.yml so that you don't need to specify the -f argument
  • Open the Minio UI on http://localhost:9000 to browse the results of your pipeline
  • Create a "director function" which creates a workflow between the LoadImages and ProcessImages functions.

Get involved with OpenFaaS

Head over to GitHub or the project website to keep learning.

If you'd like to contribute please check out our contributing guide for all the ways you can help.