Using PassThrough Streams for Uploading to S3

Learn how to efficiently upload files to Amazon S3 using Node.js passthrough streams. This guide explores how passthrough streams can help manage memory usage and improve performance when handling file uploads, with practical examples and code snippets.

There are times when you want to download a file and upload it to S3 immediately, without saving it to your local disk first. Done sequentially, this is a time-consuming operation: first downloading, then uploading. Why not do both at the same time?

In this particular use case, we needed to download a file from a partner and then synchronize it into a database. Uploading to S3 allowed us to establish a record of when that file came in for auditing purposes. It also allowed us to set up a lifecycle rule to age these files to Glacier. The files in question are 1–2GB, which would have run us out of disk space under concurrent operations. That made it clear we did not want to download the entire file to disk before uploading.
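As a side note, a lifecycle rule for this can be created with the aws-sdk as well. Here is a rough sketch; the bucket name, prefix, rule ID, and the 30-day window are illustrative placeholders, not values from our setup:

const { S3 } = require('aws-sdk');

const s3 = new S3();

// Transition objects under the given prefix to Glacier after 30 days.
// Bucket, prefix, and the day count below are examples only.
s3.putBucketLifecycleConfiguration(
  {
    Bucket: 'mybucket',
    LifecycleConfiguration: {
      Rules: [
        {
          ID: 'age-partner-files-to-glacier',
          Filter: { Prefix: 'partner-files/' },
          Status: 'Enabled',
          Transitions: [{ Days: 30, StorageClass: 'GLACIER' }]
        }
      ]
    }
  },
  (err) => {
    if (err) {
      return console.error('Failed to set lifecycle rule:', err);
    }
    console.log('Lifecycle rule applied');
  }
);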

Using a PassThrough stream allows us to write to and read from the same stream at the same time, so during this phase we are only using our network and memory. Here is a minimal sketch of the idea, followed by the full code.
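As a quick standalone illustration (separate from the upload script below), anything written into a PassThrough stream is immediately available on its readable side:

const { PassThrough } = require('stream');

const pass = new PassThrough();

// The readable side emits whatever the writable side receives.
pass.on('data', (chunk) => console.log('read:', chunk.toString()));

pass.write('first chunk');
pass.write('second chunk');
pass.end();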

'use strict';

const { Curl, CurlFeature } = require('node-libcurl');
const { PassThrough } = require('stream');
const { S3 } = require('aws-sdk');

const url = 'https://host/path/to/file.ext';
const bucket = 'mybucket';
const s3Key = 'file.ext';

const s3 = new S3();

new Promise((resolve, reject) => {
  let uploadStarted = false;
  let uploadSize = 0;
  const stream = new PassThrough();
  const download = new Curl();

  download.setOpt(Curl.option.URL, url);

  // WRITEFUNCTION receives each chunk of the response body. We push the
  // chunk into the PassThrough stream and start the S3 upload exactly once.
  download.setOpt(Curl.option.WRITEFUNCTION, (buffer, size, nmemb) => {
    stream.write(buffer);
    if (!uploadStarted) {
      console.log('Streaming Upload for: ' + url);
      uploadStarted = true;
      s3.putObject(
        {
          Bucket: bucket,
          Key: s3Key,
          Body: stream,
          ContentLength: +uploadSize
        },
        (err, data) => {
          if (err) {
            return reject(err);
          }
          resolve(data);
        }
      );
    }
    // Tell libcurl how many bytes we consumed.
    return size * nmemb;
  });

  // Raw hands us the response bytes as-is; NoStorage tells node-libcurl not
  // to buffer the body internally, since we are handling it ourselves.
  download.enable(CurlFeature.Raw | CurlFeature.NoStorage);

  // Headers always arrive before the body, so Content-Length is known
  // before the first WRITEFUNCTION call fires.
  download.on('header', (chunk, curlInstance) => {
    chunk = chunk.toString();
    if (/^Content-Length/i.test(chunk)) {
      uploadSize = chunk.split(': ')[1].trim();
    }
  });

  // When curl finishes, end the PassThrough so S3 sees the end of the body.
  download.on('end', (statusCode, body, headers) => {
    stream.end();
    download.close();
  });

  download.on('error', (err) => {
    stream.destroy(err);
    download.close();
    reject(err);
  });

  download.perform();
})
  .then(() => console.log('Upload complete: ' + s3Key))
  .catch((err) => console.error('Upload failed:', err));

You have likely noticed some tricky bits in here, so let's take a quick moment to explain them.

First off, we are leveraging libcurl through the node-libcurl library. This provides features such as CurlFeature.Raw and CurlFeature.NoStorage. Raw gives us the raw binary response data, and NoStorage tells the curl client that we do not want it to store anything internally.

Next, S3 requires us to state the content size when sending a stream of content to putObject. Since HTTP headers always arrive before the body, we can determine the upload size from the Content-Length header before the first body chunk is written.
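As an aside, if the server does not send a Content-Length header, the SDK's managed upload (s3.upload) is a possible alternative: it performs a multipart upload under the hood and does not require the size up front. A minimal sketch, reusing the stream, bucket, and s3Key variables from the script above:

// Sketch: s3.upload() manages a multipart upload internally,
// so it can accept a stream body without knowing ContentLength.
s3.upload(
  {
    Bucket: bucket,
    Key: s3Key,
    Body: stream
  },
  (err, data) => {
    if (err) {
      return console.error('Upload failed:', err);
    }
    console.log('Uploaded to ' + data.Location);
  }
);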

Utilizing the PassThrough stream, we write as we receive data from curl, and S3 reads from the same stream to write the file. You now have a simple system to download and upload to S3 without storing the file on disk. It still goes through memory and uses the network, but no disk storage is necessary.
