Extract thumbnail images from documents

🤖/document/thumbs generates an image for each page of a PDF file, or a single animated GIF file that loops through all pages.

Things to keep in mind

If you convert a multi-page PDF file into several images, the result images are sorted by page: the first image is the thumbnail of the first page, the second image that of the second page, and so on.
You can also check the meta.thumb_index key of each result image to find out which page it corresponds to. Keep in mind that these indices start at 0, not at 1, so the first page has a thumb_index of 0.

Usage example

Convert all pages of a PDF document into separate 200px-wide images:

```json
{
  "steps": {
    "thumbnailed": {
      "use": ":original",
      "robot": "/document/thumbs",
      "width": 200,
      "resize_strategy": "fit",
      "trim_whitespace": false,
      "imagemagick_stack": "v3.0.0"
    }
  }
}
```

Parameters

use

String / Array of Strings / Object ⋅ required
Specifies which Step(s) to use as input.
- You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit)
- You can provide several Steps as input with arrays:
```
"use": [
  ":original",
  "encoded",
  "resized"
]
```
💡 That’s likely all you need to know about use, but you can also view Advanced use cases.

page

Integer / Null ⋅ default: null

The PDF page to convert to an image. The default, null, converts all pages into images.
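For example, the following Step converts only the first page into a 400px-wide JPEG. This is a sketch: the Step name first_page and the width are arbitrary, and it assumes that page numbers count from 1. This section does not state that explicitly, and note that thumb_index, by contrast, starts at 0.

```json
{
  "steps": {
    "first_page": {
      "use": ":original",
      "robot": "/document/thumbs",
      "page": 1,
      "width": 400,
      "format": "jpg"
    }
  }
}
```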
format

String ⋅ default: "png"

The format of the extracted image(s). Supported values are "jpeg", "jpg", "gif" and "png".

If you specify "gif", a single animated GIF that cycles through all pages is created instead. See this demo to learn more.

delay

Integer / Null ⋅ default: null

If the output format is "gif", this parameter sets how long each frame of the animation is shown before the next one, in hundredths of a second. For example, set it to 100 to let 1 second pass between frames.

For any other output format, this parameter has no effect.
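As an illustration, this Step combines format and delay to turn a whole PDF into one 300px-wide animated GIF that shows each page for 1 second (100 hundredths of a second). The Step name animated and the width are arbitrary choices.

```json
{
  "steps": {
    "animated": {
      "use": ":original",
      "robot": "/document/thumbs",
      "format": "gif",
      "delay": 100,
      "width": 300
    }
  }
}
```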
width

Integer(1-5000) ⋅ default: auto

Width of the new image, in pixels. If not specified, it defaults to the width of the input image.

height

Integer(1-5000) ⋅ default: auto

Height of the new image, in pixels. If not specified, it defaults to the height of the input image.

resize_strategy

String ⋅ default: "pad"

One of the available resize strategies.
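When both width and height are set, the resize strategy decides how each page is fitted into that box. The sketch below uses the "fit" strategy from the usage example to keep every thumbnail within 300×300 pixels; the box size is arbitrary, and "pad" (the default) or another strategy may suit your layout better.

```json
{
  "steps": {
    "boxed": {
      "use": ":original",
      "robot": "/document/thumbs",
      "width": 300,
      "height": 300,
      "resize_strategy": "fit"
    }
  }
}
```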
background

String ⋅ default: "#FFFFFF"

The hexadecimal code or name of the color used to fill the background. It is only used with the "pad" resize strategy.

By default, the background of transparent images is changed to white. For details about how to preserve transparency across all image types, see this demo.
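Because background only applies to the "pad" strategy, a Step like the following sketch pads each page to a 300×300 box filled with black instead of the default white. The color and dimensions are arbitrary examples.

```json
{
  "steps": {
    "padded": {
      "use": ":original",
      "robot": "/document/thumbs",
      "width": 300,
      "height": 300,
      "resize_strategy": "pad",
      "background": "#000000"
    }
  }
}
```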
alpha

String ⋅ default: ""

Controls how the alpha channel of the resulting image works. For example, use "Set" to enable transparency or "Remove" to remove it.

For a list of all valid values, see the ImageMagick documentation.
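A minimal sketch that removes the alpha channel from PNG thumbnails; swap in "Set" to enable transparency instead. Only "Set" and "Remove" are named above, so check the ImageMagick documentation before relying on any other value.

```json
{
  "steps": {
    "opaque": {
      "use": ":original",
      "robot": "/document/thumbs",
      "format": "png",
      "alpha": "Remove"
    }
  }
}
```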
density

String / Null ⋅ default: null

While in-memory quality and file format depth specify the color resolution, the density of an image is its spatial resolution: the number of pixels per inch, which defines how far apart (or how large) the individual pixels are. It determines the size of the image in real-world terms when displayed on a device or printed.

You can set this to a single value or to separate horizontal and vertical values in the format widthxheight.

If your converted images have a low resolution, try increasing the density.

antialiasing

Boolean ⋅ default: false

Controls whether or not antialiasing is used to remove jagged edges from text or images in a document.
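If thumbnails look blurry or jagged, a higher density combined with antialiasing may help. In this sketch, the density "300" (passed as a string, matching the parameter type) and the 1200px width are illustrative values, not recommended settings; higher densities typically take longer to render.

```json
{
  "steps": {
    "sharp": {
      "use": ":original",
      "robot": "/document/thumbs",
      "density": "300",
      "antialiasing": true,
      "width": 1200
    }
  }
}
```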
colorspace

String ⋅ default: ""

Sets the image colorspace. For details about the available values, see the ImageMagick documentation.

If you were using "RGB", we recommend using "sRGB" instead. Note that ImageMagick may pick the most efficient colorspace based on an image's colors and fall back to, for example, "Gray". To force color output, you may need to set this parameter explicitly.
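For instance, to keep ImageMagick from switching mostly gray pages to a "Gray" colorspace, you could force sRGB explicitly, as in this sketch (the Step name is arbitrary):

```json
{
  "steps": {
    "color_thumbs": {
      "use": ":original",
      "robot": "/document/thumbs",
      "colorspace": "sRGB"
    }
  }
}
```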
trim_whitespace

Boolean ⋅ default: true

Determines whether the whitespace around the PDF page contents is trimmed away before the page is converted to an image. If set to true, the image shows only the actual page contents.

If your images need to reflect the PDF's page dimensions, it is generally a good idea to set this to false.

pdf_use_cropbox

Boolean ⋅ default: true

Some PDF documents report misleading dimensions. For instance, a document may declare itself landscape, yet display in portrait mode when opened in a desktop PDF reader. This can happen when the document defines a cropbox. When this option is enabled (the default), the cropbox takes precedence in determining the dimensions of the resulting thumbnails.
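If the thumbnails should reflect the page dimensions rather than just the trimmed contents, a reasonable starting point is to disable whitespace trimming and keep the cropbox enabled. pdf_use_cropbox is already true by default and is spelled out in this sketch only for clarity; the Step name and width are arbitrary.

```json
{
  "steps": {
    "full_page": {
      "use": ":original",
      "robot": "/document/thumbs",
      "trim_whitespace": false,
      "pdf_use_cropbox": true,
      "width": 600
    }
  }
}
```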
output_meta

Object / Boolean ⋅ default: {}

Across Robots, this parameter generally lets you request metadata that is CPU-intensive to calculate, which is why it is usually disabled by default to keep your Assemblies processing fast.

This Robot, however, only supports the default value of {} (meaning all metadata is extracted) and false. With false, only width, height, size and thumb_index are extracted for the result images, which can noticeably speed up processing of documents with many pages.
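For long documents where you only need page order and image dimensions, a sketch like the following turns off the extra metadata. Each result image should then still carry width, height, size and thumb_index. The Step name and width are arbitrary.

```json
{
  "steps": {
    "fast_thumbs": {
      "use": ":original",
      "robot": "/document/thumbs",
      "width": 200,
      "output_meta": false
    }
  }
}
```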

ImageMagick parameters

imagemagick_stack

String ⋅ default: "v2.0.9"

Selects the ImageMagick stack version to use for encoding. These versions do not correspond to real ImageMagick versions; they reflect our own internal (non-semantic) versioning for our custom ImageMagick builds. We currently recommend "v3.0.0".

Supported values: "v2.0.9", "v3.0.0".

A full comparison of supported formats, per stack, can be found here.
