Merge video, audio, images into one video

The /video/merge Robot composes a new video by adding an audio track to existing still image(s) or video.

This Robot is able to generate a video from:

  • An image and an audio file
  • A video and an audio file
  • Several images

Merging an audio file and an image

To create a video from an audio file and an image, pass both files to an Assembly Step via the use parameter and label each one with the as key:

"merged": {
  "robot": "/video/merge",
  "preset": "ipad-high",
  "use": {
    "steps": [
      { "name": ":original", "as": "audio" },
      { "name": ":original", "as": "image" }
    ],
    "bundle_steps": true
  }
}

Imagine you have uploaded both an image and an audio file in the same upload form. In the example above, the system automatically recognizes which file is which when you use the same Step name twice (":original" in this case). Instead of :original, you could also use any other valid Assembly Step name.

If you are using multiple file input fields, then you can tell Transloadit which field supplies the audio file and which supplies the image. Suppose you have two file input fields named the_image and the_audio. These Assembly Instructions will make it work:

"merged": {
  "robot": "/video/merge",
  "preset": "ipad-high",
  "use": {
    "steps": [
      { "name": ":original", "fields": "the_audio", "as": "audio" },
      { "name": ":original", "fields": "the_image", "as": "image" }
    ],
    "bundle_steps": true
  }
}

Merging an audio file and a video

If you have a video file (one without sound, for example) and an audio track that you want the video to use, you can merge them with Transloadit. Label the video as video and the audio file as audio using the as key in the JSON.

Imagine you have two file input fields in the same upload form - one to upload a video file and one for an audio file. You can tell Transloadit which field supplies the video and which the audio file using the file input field's name attribute. Just use the value for the name attribute as the value for the fields key in the JSON:

"merged": {
  "robot": "/video/merge",
  "preset": "ipad-high",
  "use": {
    "steps": [
      { "name": ":original", "fields": "the_video", "as": "video" },
      { "name": ":original", "fields": "the_audio", "as": "audio" }
    ],
    "bundle_steps": true
  }
}

You can also supply the video and audio file using other Assembly Steps of course and leave out the fields attribute.

Warning: When merging audio and video files, it's recommended to set a target format and codecs via a preset or via ffmpeg.codec:v, ffmpeg.codec:a and ffmpeg.f. Otherwise, merging defaults to backwards-compatible but undesirable legacy codecs.
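
To illustrate the warning above, here is a minimal Python sketch that builds a merge Step with an explicit container and codecs. The helper build_merge_step is hypothetical (it is not part of any SDK), and the values libx264, aac and mp4 are assumptions chosen for illustration; check which encoders your chosen ffmpeg_stack supports.

```python
import json


def build_merge_step(video_field, audio_field, ffmpeg_options):
    """Return Assembly Instructions for one /video/merge Step.

    Hypothetical helper for illustration; not part of any Transloadit SDK.
    """
    return {
        "merged": {
            "robot": "/video/merge",
            "use": {
                "steps": [
                    {"name": ":original", "fields": video_field, "as": "video"},
                    {"name": ":original", "fields": audio_field, "as": "audio"},
                ],
                "bundle_steps": True,
            },
            # Explicit codecs and container instead of the legacy defaults.
            "ffmpeg": ffmpeg_options,
        }
    }


# Assumed values: H.264 video, AAC audio, MP4 container.
steps = build_merge_step(
    "the_video",
    "the_audio",
    {"codec:v": "libx264", "codec:a": "aac", "f": "mp4"},
)
print(json.dumps(steps, indent=2))
```

The printed JSON can be pasted into the steps section of a Template. Setting a preset instead of the ffmpeg object would be the other way to follow the recommendation.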

Merging several images to generate a video

It is possible to create a video from images with Transloadit. Just label all images as image using the as key in the JSON:

"merged": {
  "robot": "/video/merge",
  "preset": "ipad-high",
  "use": {
    "steps": [
      { "name": ":original", "as": "image" }
    ],
    "bundle_steps": true
  },
  "framerate": "1/10",
  "duration": 8.5
}

This will work fine in a multi-file upload context. Files are sorted by their basename. So if you name them 01.jpeg and 02.jpeg, they will be merged in the correct order.
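
The ordering rule can be illustrated with a short Python sketch. It assumes a plain lexicographic comparison of basenames, which the text above does not spell out, so treat it as an approximation. The file names are made up.

```python
import os


def merge_order(paths):
    """Sort paths by basename, assuming plain lexicographic comparison."""
    return sorted(paths, key=os.path.basename)


# Zero-padded names keep their numeric order.
padded = merge_order(["uploads/02.jpeg", "uploads/10.jpeg", "uploads/01.jpeg"])
print(padded)

# Without padding, "10.jpeg" sorts before "2.jpeg" under this assumption.
unpadded = merge_order(["2.jpeg", "10.jpeg", "1.jpeg"])
print(unpadded)
```

Zero-padding the names (01, 02, ... 10) avoids the surprise shown in the second call.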

You can also supply your images using other Assembly Steps of course, results from /image/resize Steps for example.
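
A sketch of that setup, written as a Python dict that serializes to Assembly Instructions. The Step name images_resized and the 1280x720 dimensions are illustrative assumptions.

```python
import json

# Hypothetical Template: resize every upload, then merge the results.
steps = {
    "images_resized": {
        "robot": "/image/resize",
        "use": ":original",
        "width": 1280,
        "height": 720,
    },
    "merged": {
        "robot": "/video/merge",
        "preset": "ipad-high",
        "use": {
            # Feed the resized images, not the raw uploads, into the merge.
            "steps": [{"name": "images_resized", "as": "image"}],
            "bundle_steps": True,
        },
        "framerate": "1/5",
    },
}

print(json.dumps(steps, indent=2))
```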

Parameters

  • use

    String / Array of Strings / Object required

    Specifies which Step(s) to use as input.

    • You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit)

    • You can provide several Steps as input with arrays:

      "use": [
        ":original",
        "encoded",
        "resized"
      ]
      

    That’s likely all you need to know about use. Advanced use cases:
    • Step bundling. Some Robots can gather several Step results for a single invocation. For example, the /file/compress Robot would normally create one archive for each file passed to it. If you'd set bundle_steps to true, however, it will create one archive containing all the result files from all Steps you give it. To enable bundling, provide an object like the one below to the use parameter:

      "use": {
        "steps": [
          ":original",
          "encoded",
          "resized"
        ],
        "bundle_steps": true
      }
      

      This is also a crucial parameter for the /video/adaptive Robot, otherwise you'll generate 1 playlist for each viewing quality.
      Keep in mind that all input Steps must be present in your Template. If one of them is missing (for instance it is rejected by a filter), no result is generated because the Robot waits indefinitely for all input Steps to be finished.

      Here’s a demo that showcases Step bundling.

    • Group by original. Sticking with the /file/compress Robot example, you can set group_by_original to true, in order to create a separate archive for each of your uploaded or imported files, instead of creating one archive containing all originals (or one per resulting file). This is important for the /media/playlist Robot where you'd typically set:

      "use": {
        "steps": [
          "segmented"
        ],
        "bundle_steps": true,
        "group_by_original": true
      }
      
    • Fields. You can be more discriminatory by only using files that match a field name by setting the fields property. When this array is specified, the corresponding Step will only be executed for files submitted through one of the given field names, which correspond with the strings in the name attribute of the HTML file input field tag for instance. When using a back-end SDK, it corresponds with myFieldName1 in e.g.: $transloadit->addFile('myFieldName1', './chameleon.jpg').

      This parameter is set to true by default, meaning all fields are accepted.

      Example:

      "use": {
        "steps": [ ":original" ],
        "fields": [ "myFieldName1" ]
      }
      
    • Use as. Sometimes Robots take several inputs. For instance, the /video/merge Robot can create a slideshow from audio and images. You can map different Steps to the appropriate inputs.

      Example:

      "use": {
        "steps": [
          { "name": "audio_encoded", "as": "audio" },
          { "name": "images_resized", "as": "image" }
        ]
      }
      

      Sometimes the ordering is important, for instance, with our concat Robots. In these cases, you can add an index that starts at 1. You can also optionally filter by the multipart field name. Like in this example, where all files are coming from the same source (end-user uploads), but with different <input> names:

      Example:

      "use": {
        "steps": [
          { "name": ":original", "fields": "myFirstVideo", "as": "video_1" },
          { "name": ":original", "fields": "mySecondVideo", "as": "video_2" },
          { "name": ":original", "fields": "myThirdVideo", "as": "video_3" }
        ]
      }
      

      For times when it is not apparent where we should put the file, you can use Assembly Variables to be specific. For instance, you may want to pass a text file to the /image/resize Robot to burn the text in an image, but you are burning multiple texts, so where do we put the text file? We specify it via ${use.text_1}, to indicate the first text file that was passed.

      Example:

      "watermarked": {
        "robot": "/image/resize",
        "use"  : {
          "steps": [
            { "name": "resized", "as": "base" },
            { "name": "transcribed", "as": "text" }
          ]
        },
        "text": [
          {
            "text"  : "Hi there",
            "valign": "top",
            "align" : "left"
          },
          {
            "text"    : "From the 'transcribed' Step: ${use.text_1}",
            "valign"  : "bottom",
            "align"   : "right",
            "x_offset": 16,
            "y_offset": -10
          }
        ]
      }
      
  • preset

    String ⋅ default: "flash"

    Generates the video according to pre-configured video presets.

    If you specify your own FFmpeg parameters using the Robot's ffmpeg parameter and you have not specified a preset, then the default "flash" preset is not applied. This is to prevent you from having to override each of the flash preset's values manually.

  • width

    Integer(1-1920) ⋅ default: Width of the input video

    Width of the new video, in pixels.

    If no value is specified and a preset is set, the preset's width is used.

  • height

    Integer(1-1080) ⋅ default: Height of the input video

    Height of the new video, in pixels.

    If no value is specified and a preset is set, the preset's height is used.

  • resize_strategy

    String ⋅ default: "pad"

    If the given width/height parameters are bigger than the input image's dimensions, then the resize_strategy determines how the image will be resized to match the provided width/height. See the available resize strategies.

  • background

    String ⋅ default: "00000000"

    The background color of the resulting video, in the "rrggbbaa" format (red, green, blue, alpha), when used with the "pad" resize strategy. The default color is black.

  • framerate

    String ⋅ default: "1/5"

    When merging images to generate a video this is the input framerate. A value of "1/5" means each image is given 5 seconds before the next frame appears (the inverse of a framerate of "5"). Likewise for "1/10", "1/20", etc. A value of "5" means there are 5 frames per second.

  • image_durations

    Array of Floats ⋅ default: []

    When merging images to generate a video this allows you to define how long (in seconds) each image will be shown inside of the video. So if you pass 3 images and define [2.4, 5.6, 9] the first image will be shown for 2.4s, the second image for 5.6s and the last one for 9s. The duration parameter will automatically be set to the sum of the image_durations, so 17 in our example. It can still be overwritten, though, in which case the last image will be shown until the defined duration is reached.

  • duration

    Float ⋅ default: 5.0

    When merging images to generate a video or when merging audio and video this is the desired target duration in seconds. The float value can take one decimal digit. If you want all images to be displayed exactly once, then you can set the duration according to this formula: duration = numberOfImages / framerate. This also works for the inverse framerate values like 1/5.

    If you set this value to null (default), then the duration of the input audio file will be used when merging images with an audio file.

    When merging audio files and video files, the duration of the longest video or audio file is used by default.

  • audio_delay

    Float ⋅ default: 0.0

    When merging a video and an audio file, and when merging images and an audio file to generate a video, this is the desired delay in seconds for the audio file to start playing. Imagine you merge a video file without sound and an audio file, but you wish the audio to start playing after 5 seconds and not immediately, then this is the parameter to use.
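
The arithmetic behind framerate, image_durations and duration can be sketched in Python. The helper names are hypothetical, and the rounding to one decimal digit follows the note on duration above; how the service itself rounds is an assumption.

```python
from fractions import Fraction


def seconds_per_image(framerate):
    """Seconds each image stays on screen for a framerate such as "1/5" or "5"."""
    return 1 / Fraction(framerate)


def duration_for_all_images(number_of_images, framerate):
    """duration = numberOfImages / framerate, rounded to one decimal digit."""
    return round(float(number_of_images / Fraction(framerate)), 1)


def default_duration(image_durations):
    """Sum of image_durations, which the text gives as the default duration."""
    return round(sum(image_durations), 1)


print(seconds_per_image("1/5"))           # 5
print(duration_for_all_images(3, "1/5"))  # 15.0
print(duration_for_all_images(10, "5"))   # 2.0
print(default_duration([2.4, 5.6, 9]))    # 17.0
```

With three images at a framerate of "1/5", each image gets 5 seconds, so a duration of 15 shows every image exactly once.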

FFmpeg parameters

  • ffmpeg_stack

    String ⋅ default: "v3.3.3"

    Selects the FFmpeg stack version to use for encoding. These versions reflect real FFmpeg versions.

    The current recommendation is to use "v4.3.1". Other valid values can be found here.

  • ffmpeg

    Object ⋅ default: {}

    A parameter object to be passed to FFmpeg. For available options, see the FFmpeg documentation. If a preset is used, the options specified are merged on top of the ones from the preset.
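
The way ffmpeg options combine with a preset can be pictured as a dictionary update, sketched here in Python. The preset contents below are invented placeholders, not the real values of any Transloadit preset, and the shallow key-by-key merge is an assumption about what "merged on top" means.

```python
def effective_ffmpeg_options(preset_options, ffmpeg_options):
    """Overlay user-supplied ffmpeg options on a preset's options.

    Assumes a shallow, key-by-key merge where the user's value wins.
    """
    merged = dict(preset_options)  # copy, so the preset itself is untouched
    merged.update(ffmpeg_options)
    return merged


# Placeholder preset values for illustration only.
hypothetical_preset = {"codec:v": "libx264", "b:v": "1200k", "f": "mp4"}
overrides = {"b:v": "600k", "ss": "00:00:05"}

result = effective_ffmpeg_options(hypothetical_preset, overrides)
print(result)
```

Under this assumption, keys present only in the preset survive, keys present in both take the user's value, and new keys are added.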
