Limitations

Protocol Limitations

The protocols underpinning the SMS ecosystem have been around for some time, which also means that there are some limitations that may seem strange in a modern tech stack.

SMS Length

An SMS message has a maximum size of 140 bytes. Messages larger than 140 bytes are split into several segments that are chained together. Each segment is sent as an individual SMS prefixed with a 6-byte header used to reassemble the segments in the correct order, which leaves 134 bytes for the actual message.

GatewayAPI charges per message segment we process.

How many segments the text needs to be split into depends on the encoding used, as some encodings offer short code units, but limited characters, while others use larger code units but a very large character set. The commonly used SMS encodings are variable width, which makes calculating the segment count difficult without the correct codec.

A message can be split into at most 255 segments. That allows for quite long messages, probably longer than what is reasonable to expect a user to read on a phone.

Where possible, we recommend sticking to the GSM-7 encoding, as it offers the best prices. The character set is quite limited, however, so in some cases GSM-7 is either not desirable or requires careful crafting of the message text.
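To make the choice between encodings concrete, here is a minimal sketch that checks whether a text fits in GSM-7. The helper name `pick_encoding` is hypothetical and not part of any GatewayAPI API; the character lists follow the GSM 7-bit default alphabet and its extension table.

```python
# GSM-7 default alphabet (basic table); each character uses one code unit.
# The escape code (0x1B) is left out on purpose, it is not a printable character.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table; each character uses two code units (escape + character).
GSM7_EXTENDED = "^{}[]~|\\€\f"

GSM7_ALPHABET = set(GSM7_BASIC) | set(GSM7_EXTENDED)


def pick_encoding(text: str) -> str:
    """Return "GSM-7" if every character is in the GSM-7 alphabet, else "UCS-2".

    Hypothetical helper for illustration only.
    """
    if all(character in GSM7_ALPHABET for character in text):
        return "GSM-7"
    return "UCS-2"
```

Note that a single character outside the alphabet, such as a lowercase `ç` or an emoji, is enough to require UCS-2 for the whole message.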

Using GSM-7

GSM-7 uses 7-bit code units and bit-packing, which fits 160 characters into the 140 bytes of a single SMS. Once splitting is required, each segment can only contain 153 characters, because 6 bytes are taken by the concatenation header.
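The numbers follow directly from the bit-packing arithmetic. A small sketch, using only the sizes stated in this section (the constant and function names are made up for the example):

```python
SMS_PAYLOAD_BYTES = 140    # maximum size of one SMS
CONCAT_HEADER_BYTES = 6    # header on each segment of a split message
MAX_SEGMENTS = 255         # maximum number of segments per message


def gsm7_capacity(payload_bytes: int) -> int:
    """Number of whole 7-bit code units that fit in payload_bytes bytes."""
    return payload_bytes * 8 // 7


single = gsm7_capacity(SMS_PAYLOAD_BYTES)                              # 160
per_segment = gsm7_capacity(SMS_PAYLOAD_BYTES - CONCAT_HEADER_BYTES)   # 153
longest = per_segment * MAX_SEGMENTS                                   # 39015
```

The 134 bytes left after the header hold 1072 bits, which is room for 153 code units with one bit to spare.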

Note that a few characters in the encoding are double width, which makes the number of bytes used per character variable, so calculating the number of SMS segments becomes difficult without encoding the text. Check the glossary entry for details.

GSM-7 text segment calculation without codec
import math

# Characters from the GSM-7 extension table; each takes two code units.
GSM7_EXT_CHARS = "^{}[]~|\\€"

def count_gsm7_text_segments(text: str) -> int:
    """Simplified segment counter.

    Expects any character outside of GSM-7 to be replaced
    with `?` which only takes up a single code unit.
    """
    bit_packing_ratio = 7 / 8

    def count_code_units(character: str) -> int:
        if character in GSM7_EXT_CHARS:
            return 2
        return 1

    msg_length = sum(
        count_code_units(character)
        for character in text
    )
    byte_length = math.ceil(msg_length * bit_packing_ratio)

    # Anything that fits in 140 bytes is sent as a single SMS.
    if byte_length <= 140:
        return 1

    return math.ceil(byte_length / 134)
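The simplified counter works on total bytes, so it can undercount when a double-width character lands on a segment boundary. The sketch below counts code units per segment instead. It assumes that the two code units of a double-width character are never split across two segments; treat that as an assumption of this example rather than a statement about how any particular network behaves. The function name is hypothetical.

```python
GSM7_EXT_CHARS = "^{}[]~|\\€"

SINGLE_SMS_UNITS = 160   # code units in an unsplit GSM-7 SMS
SEGMENT_UNITS = 153      # code units per segment once splitting is required


def count_gsm7_segments_strict(text: str) -> int:
    """Count segments, assuming double-width characters are kept whole."""
    widths = [2 if character in GSM7_EXT_CHARS else 1 for character in text]
    if sum(widths) <= SINGLE_SMS_UNITS:
        return 1

    segments, used = 1, 0
    for width in widths:
        # Start a new segment if this character does not fit in the current one.
        if used + width > SEGMENT_UNITS:
            segments += 1
            used = 0
        used += width
    return segments
```

For example, 152 single-width characters followed by `€` and another 152 single-width characters add up to 306 code units, which the byte-based estimate reports as 2 segments, while this version reports 3 because the `€` has to move to the second segment.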

Using UCS-2

Using the UCS-2 encoding lets you use a wide range of characters that are not available in GSM-7. Since UCS-2 is treated as UTF-16BE under the hood, characters outside the Unicode Basic Multilingual Plane can also be used, at the cost of multiple code units per code point. Check the glossary entry for details.

When limited to characters within the Unicode Basic Multilingual Plane, UTF-16BE behaves exactly like UCS-2, and each code point uses a single fixed-size code unit of 2 bytes. This results in 70 characters for a single SMS, or 67 characters per segment for a message that needs splitting.

If a message contains characters outside of the Basic Multilingual Plane, the number of bytes used per character/code point is variable, and calculating the number of SMS segments becomes difficult without encoding the text first.

UCS-2 text segment calculation
import math

def count_ucs2_text_segments(text: str) -> int:
    byte_length = len(text.encode("utf-16be"))
    if byte_length <= 140:
        return 1

    return math.ceil(byte_length / 134)
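To see why characters outside the Basic Multilingual Plane change the count, it can help to look at code units rather than characters. A small sketch (both helper names are made up for the example):

```python
def utf16_code_units(text: str) -> int:
    """Number of 2-byte UTF-16 code units needed to encode the text."""
    return len(text.encode("utf-16-be")) // 2


def has_non_bmp(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(character) > 0xFFFF for character in text)
```

A text of 69 letters plus one emoji is 70 characters long, but it needs 71 code units and therefore no longer fits in a single SMS.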

SMS Sender

The sender is the originator of the SMS message. The SMS standards limit the length of the sender to 15 digits if it is numeric, or 11 characters if it is text. You can use spaces in the sender, but most modern smartphones do not display them.
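The two length limits can be checked before submitting a message. This is a minimal sketch with a hypothetical helper name; it assumes that a sender consisting only of ASCII digits counts as numeric and that everything else counts as text. It does not cover country or network specific rules.

```python
MAX_NUMERIC_SENDER = 15   # digits
MAX_TEXT_SENDER = 11      # characters


def is_valid_sender_length(sender: str) -> bool:
    """Check the sender against the length limits described above."""
    if sender.isascii() and sender.isdigit():
        return 1 <= len(sender) <= MAX_NUMERIC_SENDER
    return 1 <= len(sender) <= MAX_TEXT_SENDER
```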

For some destinations there may be country/network specific restrictions on the senders, and the sender may also be replaced automatically. You may need to use a special sender for the destination/network.

The Missing Encoding

While a text message can be sent using several different encodings, the SMS sender has no mechanism for specifying its encoding, nor do the specifications define one.

This means that it has been up to each designer of the software powering the SMS system to pick an encoding. This has led to the current situation where the character set available to use in the SMS sender is quite limited.

We recommend that you stick to characters in the range a-zA-Z0-9. However, if you do use Latin-1 characters (e.g. æøå), we will support them on connections where they are available. If the mobile network connection does not support these characters, we will automatically replace them with basic Latin characters according to the table below.

If the replacement results in a sender that is too long, we first reduce certain multi-character replacements to just their most significant character (the short version noted in the table), and then reduce the remaining multi-character replacements to just their first character.

Code Char Replacement Note
00A0 (non-breaking space) (space) Non-breaking space to normal space
00A1 ¡ !
00A2 ¢ C/
00A3 £ PS
00A4 ¤ $?
00A5 ¥ Y=
00A6 ¦ |
00A7 § SS
00A8 ¨ "
00A9 © (c) Short version c
00AA ª a
00AB « <<
00AC ¬ !
00AD ­ - Soft hyphen to normal minus-hyphen
00AE ® (r) Short version r
00AF ¯ -
00B0 ° deg Short version d
00B1 ± +-
00B2 ² 2
00B3 ³ 3
00B4 ´ '
00B5 µ u
00B6 ¶ P
00B7 · *
00B8 ¸ ,
00B9 ¹ 1
00BA º o
00BB » >>
00BC ¼ 1/4
00BD ½ 1/2
00BE ¾ 3/4
00BF ¿ ?
00C0 À A
00C1 Á A
00C2 Â A
00C3 Ã A
00C4 Ä A
00C5 Å Aa
00C6 Æ Ae
00C7 Ç C
00C8 È E
00C9 É E
00CA Ê E
00CB Ë E
00CC Ì I
00CD Í I
00CE Î I
00CF Ï I
00D0 Ð D
00D1 Ñ N
00D2 Ò O
00D3 Ó O
00D4 Ô O
00D5 Õ O
00D6 Ö O
00D7 × x
00D8 Ø Oe
00D9 Ù U
00DA Ú U
00DB Û U
00DC Ü U
00DD Ý Y
00DE Þ Th
00DF ß ss
00E0 à a
00E1 á a
00E2 â a
00E3 ã a
00E4 ä a
00E5 å aa
00E6 æ ae
00E7 ç c
00E8 è e
00E9 é e
00EA ê e
00EB ë e
00EC ì i
00ED í i
00EE î i
00EF ï i
00F0 ð d
00F1 ñ n
00F2 ò o
00F3 ó o
00F4 ô o
00F5 õ o
00F6 ö o
00F7 ÷ /
00F8 ø oe
00F9 ù u
00FA ú u
00FB û u
00FC ü u
00FD ý y
00FE þ th
00FF ÿ y
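The replacement and shortening steps can be sketched as follows. This is an illustration only, not GatewayAPI's implementation: it uses a small subset of the table, and it assumes that each shortening step is applied to the whole sender at once. The names `REPLACEMENTS`, `SHORT_VERSIONS` and `replace_sender` are hypothetical.

```python
# Subset of the replacement table above, for illustration.
REPLACEMENTS = {
    "©": "(c)", "®": "(r)", "°": "deg",
    "Å": "Aa", "Æ": "Ae", "Ø": "Oe",
    "å": "aa", "æ": "ae", "ø": "oe",
    "é": "e", "ü": "u",
}
# Replacements that have a "short version" note in the table.
SHORT_VERSIONS = {"©": "c", "®": "r", "°": "d"}
MAX_TEXT_SENDER = 11


def replace_sender(sender: str) -> str:
    """Replace Latin-1 characters, shortening in steps if the result is too long."""

    def render(use_short: bool, first_only: bool) -> str:
        parts = []
        for character in sender:
            replacement = REPLACEMENTS.get(character, character)
            if use_short and character in SHORT_VERSIONS:
                replacement = SHORT_VERSIONS[character]
            elif first_only:
                replacement = replacement[:1]
            parts.append(replacement)
        return "".join(parts)

    # Step 1: full replacements. Step 2: short versions.
    # Step 3: remaining multi-character replacements cut to their first character.
    steps = ((False, False), (True, False), (True, True))
    candidate = sender
    for use_short, first_only in steps:
        candidate = render(use_short, first_only)
        if len(candidate) <= MAX_TEXT_SENDER:
            break
    return candidate
```

With this sketch, `Blåbær` becomes `Blaabaer`, `Søren©Shop` becomes `SoerencShop` after the second step, and `Rødgrød®AB` becomes `RodgrodrAB` after the third.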

We route traffic to the best connection regardless of its support for special characters, so you may find that the sender has been replaced. That said, if you have a special need for these characters in your sender, you can contact support, who will look into it.

Message Priority

When submitting messages, it is possible to set a priority field to indicate how time sensitive the arrival of the message is.

Via REST the field is named priority and via SMPP it is a regular PDU field.

There are 4 different priority values, but their names depend on the standard the cellular device operates on. Here is a short selection of name lists and the matching values.

Value   ANSI-136      IS-95
0       Bulk          Normal
1       Normal        Interactive
2       Urgent        Urgent
3       Very Urgent   Emergency

We have reserved the two highest priorities for customers sending premium messages.

Priority Overwrite

Messages with a priority not allowed by the messageclass will have their priority set to the default defined by the messageclass.

Default Priority

In higher-level APIs the priority is optional; when it is omitted, the system sets the priority to the default defined by the messageclass.
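The overwrite and default rules amount to a simple lookup. The sketch below is an illustration only: the messageclass names, their allowed priorities and their defaults are invented for the example and do not reflect actual account configuration.

```python
from typing import Optional

# Hypothetical rules per messageclass; real values depend on your account.
MESSAGECLASS_RULES = {
    "standard": {"allowed": {0, 1}, "default": 1},
    "premium": {"allowed": {0, 1, 2, 3}, "default": 1},
}


def resolve_priority(messageclass: str, priority: Optional[int] = None) -> int:
    """Return the priority a message ends up with under the rules above."""
    rules = MESSAGECLASS_RULES[messageclass]
    # A missing priority, or one the messageclass does not allow,
    # falls back to the default of the messageclass.
    if priority is None or priority not in rules["allowed"]:
        return rules["default"]
    return priority
```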

Message Filtering

SMS is a widespread and effective form of communication, which sadly also makes it a means of reaching people for illegitimate purposes such as phishing, spamming, and harassment.

It is therefore becoming increasingly normal for providers in the SMS ecosystem to perform some kind of filtering on messages that travel through their systems, to prevent or at least limit malicious use.

GatewayAPI is no exception. We have a range of filters and in certain markets we are also subject to requirements by law and/or contract.

Messages we filter in our systems result in a REJECTED DLR which will, as usual, be sent to your webhook receiving system and entered in our traffic log. The DLR will contain an appropriate error code and, where possible, a description text explaining what caused the rejection.

Some messages contain links that cannot be accepted because of phishing attacks and spam. Each account has its own whitelist of allowed links, approved by our staff. Any links found in a message are checked against the whitelist using the following rules:

  • A bare domain (such as gatewayapi.com) allows all links pointing to that domain.
  • A specific link (such as gatewayapi.com/docs) only allows exactly that link to go through the whitelist check.
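The two rules can be sketched as a small matching function. This is an illustration, not the actual filter: it assumes that the scheme and a trailing slash are ignored, that host names are compared case-insensitively, and that a bare domain only matches that exact host. The function names are hypothetical.

```python
from typing import Iterable


def normalize_link(link: str) -> str:
    """Drop the scheme and trailing slash, and lowercase the host part."""
    link = link.strip()
    if "://" in link:
        link = link.split("://", 1)[1]
    host, _, path = link.partition("/")
    path = path.rstrip("/")
    host = host.lower()
    return f"{host}/{path}" if path else host


def is_link_allowed(link: str, whitelist: Iterable[str]) -> bool:
    """Check a link against whitelist entries using the two rules above."""
    link = normalize_link(link)
    host = link.partition("/")[0]
    for entry in map(normalize_link, whitelist):
        if "/" not in entry:
            # Bare domain: every link on that domain is allowed.
            if host == entry:
                return True
        elif link == entry:
            # Specific link: only exactly that link is allowed.
            return True
    return False
```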

Certain accounts are marked as highly trusted and are exempt from having their messages checked.

You can submit new links, as well as check the current whitelist on the dashboard under Settings.

To learn more about our efforts to stop malicious messages, read the blog post about message filtering.

Tip

Since all links in SMS messages are just plaintext, be careful where you put dots (.) in the message text when using our APIs, as text such as hello.world will be identified as a link.

This happens because the process of finding links in text is carried out on the device that receives the SMS, and therefore our systems have to match that process.
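If you want to catch accidental links before sending, a rough pre-check can help. The pattern below is only an approximation made up for this example: devices differ in how they detect links, and the actual filter may use different rules. It flags words joined by a dot where the last part is at least two letters.

```python
import re

# Rough approximation: dot-joined labels ending in two or more letters.
LINK_LIKE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def find_link_like(text: str) -> list:
    """Return substrings that a device might render as links."""
    return LINK_LIKE.findall(text)
```

With this pattern, `Say hello.world now` yields `hello.world`, while `Done. Thanks` yields nothing because of the space after the dot.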

Blocked Senders

Certain senders are frequently used in phishing attacks. Because of this we maintain a list of senders that we limit the use of.

Country Restrictions

Several countries apply restrictions on what can be sent via SMS, such as not allowing custom senders at all, which in most cases means that the sender is automatically overridden with a random number.

For more information, please check out the section about country restrictions.