Chapter 26. Datamatrix (2D-Barcode)

Table of Contents

26.1. Principle of Datamatrix Barcodes
26.1.1. Summary of features offered in the library
26.1.2. Limitation of the JpGraph Datamatrix implementation
26.1.3. Datamatrix standard
26.1.4. Structure of Data Matrix codes
26.1.5. Encodation efficiency
26.1.6. More on ECC Datamatrix subsets
26.1.7. Symbology Data capacity
26.2. Creating barcodes
26.2.1. Getting started
26.2.2. Error handling
26.2.3. Encodation options
26.2.4. Processing special input characters
26.2.5. Creating different backends
26.2.6. Generic backend methods
26.2.7. Image backend methods
26.2.8. Postscript backend format options
26.2.9. A template to create barcodes
26.2.10. Sample application
26.3. Example script
26.3.1. Example 1 - Setting the shape
26.3.2. Example 2 - Writing to a file
26.3.3. Example 3 - Creating postscript output
26.3.4. Example 4 - Changing background color

Principle of Datamatrix Barcodes

Datamatrix (or Data Matrix) is a high-density two-dimensional barcode that can encode up to 3116 numeric digits, or a smaller number of characters from the entire 256-value (8-bit) character set. Compared with the PDF417 barcode symbology, Datamatrix belongs to a newer family of two-dimensional barcodes that makes better use of both dimensions and can thus achieve a higher data capacity than PDF417 (~3kB vs ~2kB). The symbol is built on a square grid with a finder pattern around the edges of the symbol that allows a scanner to identify the barcode. The finder pattern makes it possible to read the barcode regardless of its physical orientation.

In the same way as other two-dimensional barcodes, the Datamatrix code includes error correction capability in order to be resilient to physical damage. Originally Data Matrix used an older convolutional error correction schema (ECC), but this was later changed to a Reed-Solomon type of error correction, which is much more efficient. The older ECC versions are known as ECC 000 to ECC 140; they should be considered obsolete and should not be used in new applications.

The newer error correction schema (with Reed-Solomon codes) is known as ECC 200 schema and is the current and recommended schema. By default the library will use the newer schema but support also exists for legacy applications to use the older ECC schema. (See ??)

Figure 26.1 shows the principle of a Datamatrix barcode.

Figure 26.1. Datamatrix structure.

The image shows an annotated Datamatrix where the finder and synchronization patterns have been highlighted.

Even though it is primarily designed to handle the Western alphabet (ISO-8859/x code tables), it supports user-prepared Unicode characters through the "Extended Channel Interpretation" (ECI) mechanism. However, a description of the ECI standard is out of scope for this manual, and the interested reader is referred to the official ECI standard document.

The Datamatrix standard has been adopted as a standard symbology by, among others, the American National Standards Institute (ANSI), and it has been recommended for use by a number of industry standards associations (e.g. EIA, SEMI, AIAG, ATA).

Summary of features offered in the library

The following list summarizes the features that the library offers for Datamatrix barcodes. Some of the terms used here assume familiarity with Datamatrix barcodes. All terms are described in the remainder of this chapter.

  • Supports both the new ECC 200 variant and the older ECC 140

  • Output formats

    1. Image

    2. Postscript

    3. ASCII

  • Supports all recommended encodation formats

    1. ASCII

    2. C40

    3. BASE256

    4. Text

    5. X12

  • Supports all specified symbol sizes

  • Supports both auto and user selectable encodation

  • Supports both auto and user selectable symbol size

  • Supports user specified module size

  • Supports custom color specification (foreground, background)

  • Supports user specified quiet zone

  • Supports easy handling of non-printable characters through the use of special escape sequences ("Tilde" - processing)

  • Supports concatenated symbols

  • Symbols can be written directly to a file or sent back as an image to the browser

Limitation of the JpGraph Datamatrix implementation

This version of the library does not support the EDIFACT compaction standard due to the very specialized and limited use of this encodation schema.

Datamatrix standard

Datamatrix is fully described in the ISO/IEC 16022E International Standard, which is available for purchase from the International Organization for Standardization (ISO).

Additional information about Data Matrix code is available in the following United States patents: 4,939,354; 5,053,609; 5,124,536. See the US Patent Office for full disclosures of these patents.

Structure of Data Matrix codes

Datamatrix is a two-dimensional symbology in the shape of a rectangle. The size and shape of the symbol are either chosen automatically or specified by the user. Usually the smallest size that has enough data capacity to encode the given data is chosen. The symbol rectangle is built up of square dots, called modules, whose size is also user specified.

The Data Matrix symbol rectangle comes in two basic shapes.

  1. It is either a square, with sizes from 10x10 up to 144x144 modules, always with an even number of modules per side

  2. Or it is a rectangle, with sizes from 8x18 up to 16x48 modules

Examples of the two basic shapes are shown in Figure 26.2 and Figure 26.3.

Figure 26.2. Datamatrix - Square symbol shape

Figure 26.3. Datamatrix - Rectangle symbol shape

The maximum capacity for Data Matrix codes is up to 3116 numeric characters or up to 2335 alphanumeric characters or up to 1555 bytes of binary information.

The exact number of characters that can fit in a Data Matrix symbol depends on the actual encoding (or compaction) schema used. In short, a compaction schema encodes the input characters more efficiently so that more data fits into a fixed number of bytes. For example, if only numeric data is to be encoded then, instead of using one byte to hold each digit, two digits are stored in a single byte, doubling the amount of data that can be stored in a given number of bytes.
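The digit-pair compaction described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the library's code. The function name `ascii_encode` is hypothetical, and the codeword values used (character value + 1, digit pair + 130, and an upper-shift codeword 235 for values above 127) are assumptions based on our reading of the ASCII encodation rules of the ECC 200 standard; they should be checked against the standard before being relied on.

```python
def ascii_encode(data: bytes) -> list[int]:
    """Simplified ASCII encodation: one codeword (byte) per digit pair.

    Illustrative sketch only; codeword values are assumed from the
    ECC 200 ASCII encodation rules, not taken from the library.
    """
    codewords = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if len(pair) == 2 and pair.isdigit():
            # Two digits "00".."99" share a single codeword.
            codewords.append(int(pair) + 130)
            i += 2
        elif data[i] < 128:
            # Plain ASCII character: one codeword.
            codewords.append(data[i] + 1)
            i += 1
        else:
            # Extended ASCII: upper-shift codeword plus one more codeword.
            codewords.append(235)
            codewords.append(data[i] - 127)
            i += 1
    return codewords


print(ascii_encode(b"123456"))       # [142, 164, 186]
print(len(ascii_encode(b"ABCDEF")))  # 6
```

Six digits occupy three codewords while six letters occupy six, which is the doubling effect mentioned in the text.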

To encode data into a Datamatrix symbol the following (principal) steps are taken.

  1. The input string (which can be any ASCII values between 0-255) is encoded using the selected encoding or encodings (it is possible to switch encoding mid-way through the string). The primary purpose of the encoding is to compress the data into a much shorter form.

  2. If needed the data is padded to fill up to the capacity of the selected symbol size.

  3. Once the string has been encoded (and possibly padded) a number of error correcting code words are added so that the data can be recovered even if part of the printed symbol has been destroyed (perhaps a corner has been torn off).

  4. Finally the encoded data and the error correcting words are placed in the symbol according to an algorithm specified in the standard. This is done by placing each bit of every data byte in a specific position in the data matrix symbol.
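Step 2 of the list above (padding) can be sketched as follows. This is an illustration in Python under stated assumptions, not the library's implementation: the function name `pad_codewords` is hypothetical, and the pad value 129 together with the position-dependent scrambling of subsequent pad codewords is our understanding of the ECC 200 padding rule and should be verified against the standard. Error correction (step 3) and module placement (step 4) are deliberately left out.

```python
PAD = 129  # pad codeword value (assumed from the ECC 200 rules)


def pad_codewords(codewords: list[int], capacity: int) -> list[int]:
    """Fill the encoded data up to the data capacity of the symbol.

    `capacity` is the number of data codewords the chosen symbol holds.
    """
    if len(codewords) > capacity:
        raise ValueError("data does not fit in the selected symbol size")
    padded = list(codewords)
    if len(padded) < capacity:
        # The first pad codeword is stored unmodified.
        padded.append(PAD)
    while len(padded) < capacity:
        # Later pads are scrambled using their 1-based position so that
        # long pad runs do not produce a regular pattern in the symbol.
        position = len(padded) + 1
        pseudo_random = (149 * position) % 253 + 1
        value = PAD + pseudo_random
        padded.append(value if value <= 254 else value - 254)
    return padded


print(pad_codewords([66], 3))  # [66, 129, 70]
```

Data that already fills the symbol exactly is returned unchanged, which is why the padding step is described as "if needed".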

The above explanation is by necessity simplified. Readers interested in the specific details are referred to the official standard. It is also possible to review the code itself to understand the details.

Encodation efficiency

As explained in the previous section, several compaction schemas are used to encode the data so that more data fits in a given symbol. Depending on the actual data, different compaction schemas can be used in order to achieve the greatest possible compression. The standard specifies six different schemas. The compaction efficiencies are given in Table 26.1.

Depending on the application, the user of the library may choose to select a fixed encodation mode, but it is usually best to let the library automatically select the combination of encodation schemas that gives the smallest possible symbol size.

Table 26.1. Datamatrix encodation efficiency

Encodation schema   Characters                           Bits per character
ASCII               Double digit numerics                4
                    ASCII 0-127                          8
                    Extended ASCII 128-255               16
C40                 Primarily upper-case alphanumeric    5.33
Text                Primarily lower-case alphanumeric    5.33
X12                 ANSI X12 EDI data set                5.33
EDIFACT             ASCII values 32-94                   6
Base 256            All byte values 0-255                8
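To make the efficiency figures more concrete, the following minimal Python sketch estimates how many bytes (codewords) a run of characters needs under a single encodation schema. It is only a rough estimate and not how the library computes sizes: the names `BITS_PER_CHAR` and `estimate_codewords` are hypothetical, the value 5.33 is taken to mean 16/3 bits (three characters in two bytes), the 4 bits per digit follows from storing two digits in one byte, and the overhead of switching between schemas is ignored.

```python
from fractions import Fraction
from math import ceil

# Bits per character for each schema (5.33 is assumed to mean 16/3).
BITS_PER_CHAR = {
    "ASCII digits": Fraction(4),
    "ASCII 0-127": Fraction(8),
    "C40": Fraction(16, 3),
    "Text": Fraction(16, 3),
    "X12": Fraction(16, 3),
    "EDIFACT": Fraction(6),
    "Base 256": Fraction(8),
}


def estimate_codewords(char_count: int, schema: str) -> int:
    """Estimate the number of 8-bit codewords for `char_count` characters."""
    bits = char_count * BITS_PER_CHAR[schema]
    return ceil(bits / 8)


print(estimate_codewords(12, "ASCII digits"))  # 6
print(estimate_codewords(12, "C40"))           # 8
print(estimate_codewords(12, "Base 256"))      # 12
```

For the same twelve characters the estimate ranges from 6 to 12 bytes, which is why the choice of schema can change the symbol size that is needed.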

More on ECC Datamatrix subsets

As was mentioned in the introduction, there are two main subsets of Datamatrix symbols. The first subset uses convolutional codes for error correction and was used for most of the initial installations of Datamatrix systems. These earlier versions are referenced as ECC-000 to ECC-140 (the number specifies the level of convolutional error correcting code).

This first subset will be commonly referred to as ECC-140 in the remainder of this manual.

The second subset is referenced as ECC-200 and uses Reed-Solomon error correction techniques. The two subsets have the following characteristics:

  1. ECC-000 to ECC-140 symbols all have an odd number of modules along each square side.

  2. ECC-200 symbols have an even number of modules on each side. ECC-200 can have non-square symbol sizes.

Hence the subset used is auto-discriminative: a reader can tell which subset a symbol belongs to from its number of modules alone. The maximum data capacity of an ECC-200 symbol is 3116 numeric digits, or 2335 alphanumeric characters, in the largest symbol of 144 x 144 modules.
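The odd/even rule above can be expressed as a small check. The Python sketch below is purely illustrative (the function name `ecc_subset` is hypothetical and not part of the library); it only applies the two characteristics listed above and does not validate that the size is one of the sizes actually defined by the standard.

```python
def ecc_subset(rows: int, cols: int) -> str:
    """Classify a symbol by its module count, following the rule in the text."""
    if rows % 2 == 0 and cols % 2 == 0:
        # Even number of modules on each side (square or rectangular).
        return "ECC-200"
    if rows == cols and rows % 2 == 1:
        # Odd number of modules along each square side.
        return "ECC-000 to ECC-140"
    raise ValueError(f"{rows} x {cols} matches neither subset")


print(ecc_subset(10, 10))  # ECC-200
print(ecc_subset(9, 9))    # ECC-000 to ECC-140
```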

Even though the library supports the creation of both types of Datamatrix symbols, it is recommended that all new applications use the more modern ECC-200 subset. This is also the recommendation in the standard. ECC-140 should only be used in legacy systems where old equipment is in use that has not been upgraded to handle the modern ECC-200 subset.

Symbology Data capacity

As was mentioned in the previous section, the actual data capacity depends on the symbol size. By default the library will select the smallest possible symbol size that will encode a given character string with the chosen encoding (possibly automatic). Table 26.2 below gives the maximum capacity for the three most common encoding schemas for each symbol size, as well as the robustness of each symbol, specified as the number of errors (destroyed data) that can be recovered.

Table 26.2. Maximum data capacity for the different symbol sizes in ECC-200 Data Matrix subset.

Size        Numeric    Alphanumeric   Binary     Max correctable
            capacity   capacity       capacity   error/erasure
10 x 10     6          3              1          2
12 x 12     10         6              3          3
14 x 14     16         10             6          5/7
16 x 16     24         16             10         6/9
18 x 18     36         25             16         7/11
20 x 20     44         31             20         9/15
22 x 22     60         43             28         10/17
24 x 24     72         52             34         12/21
26 x 26     88         64             42         14/25
32 x 32     124        91             60         18/33
36 x 36     172        127            84         21/39
40 x 40     228        169            112        24/45
44 x 44     288        214            142        28/53
48 x 48     348        259            172        34/65
52 x 52     408        304            202        42/78
64 x 64     560        418            278        56/106
72 x 72     736        550            366        72/132
80 x 80     912        682            454        96/180
88 x 88     1152       862            574        112/212
96 x 96     1392       1042           694        136/260
104 x 104   1632       1222           814        168/318
120 x 120   2100       1573           1048       204/390
132 x 132   2608       1954           1302       248/472
144 x 144   3116       2335           1556       310/590
8 x 18      10         6              3          3
8 x 32      20         13             8          5
12 x 26     32         22             14         7/11
12 x 36     44         31             20         9/15
16 x 36     64         46             30         12/21
16 x 48     98         72             47         14/25
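The idea of selecting the smallest symbol that can hold the data can be sketched with a simple lookup over the square sizes in Table 26.2. This Python sketch is an assumption about the general approach and not the library's actual selection code: the names `SQUARE_CAPACITY` and `smallest_square` are hypothetical, the capacities are transcribed from the table above, and real data that mixes encodation schemas may need a different size than this single-schema lookup suggests.

```python
# (side, numeric, alphanumeric, binary) for the square ECC-200 symbols,
# transcribed from Table 26.2.
SQUARE_CAPACITY = [
    (10, 6, 3, 1), (12, 10, 6, 3), (14, 16, 10, 6), (16, 24, 16, 10),
    (18, 36, 25, 16), (20, 44, 31, 20), (22, 60, 43, 28), (24, 72, 52, 34),
    (26, 88, 64, 42), (32, 124, 91, 60), (36, 172, 127, 84),
    (40, 228, 169, 112), (44, 288, 214, 142), (48, 348, 259, 172),
    (52, 408, 304, 202), (64, 560, 418, 278), (72, 736, 550, 366),
    (80, 912, 682, 454), (88, 1152, 862, 574), (96, 1392, 1042, 694),
    (104, 1632, 1222, 814), (120, 2100, 1573, 1048),
    (132, 2608, 1954, 1302), (144, 3116, 2335, 1556),
]

COLUMN = {"numeric": 1, "alphanumeric": 2, "binary": 3}


def smallest_square(length: int, kind: str) -> int:
    """Return the side (in modules) of the smallest square symbol that fits."""
    column = COLUMN[kind]
    for row in SQUARE_CAPACITY:  # rows are ordered from small to large
        if row[column] >= length:
            return row[0]
    raise ValueError("data too large for a single Datamatrix symbol")


print(smallest_square(20, "numeric"))      # 16
print(smallest_square(7, "alphanumeric"))  # 14
```

Twenty digits do not fit in a 14 x 14 symbol (capacity 16), so the lookup moves on to 16 x 16 (capacity 24).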