Optimization techniques for SMS languages
When you send messages in languages that use longer sentences or larger alphabets, the message length can increase significantly. Such long messages might be split into multiple messages, which can increase delivery costs.
GSM alphabet versus Unicode
The GSM alphabet (GSM-7) is a 7-bit alphabet and is the standard encoding for GSM. If an SMS is encoded in the GSM alphabet, the maximum number of characters allowed in the message is 160.
If an SMS is encoded in Unicode, each character takes 16 bits instead of 7, so the maximum number of characters allowed in the message is 70.
The following image shows a list of allowed characters in the GSM alphabet. The basic character set is displayed on the left and the basic character set extension is displayed on the right.
If your SMS includes even a single character that the GSM alphabet does not support, all characters in the message are encoded in Unicode, and the maximum number of characters drops to 70 per message.
If the message is longer than 70 characters, it is split into multiple messages. Each subsequent message is limited to 70 characters, even if it contains only characters from the GSM alphabet.
If you need to use special characters that are not part of the GSM alphabet, use one of the following options. These options increase the character capacity of your SMS so that it is closer to the standard SMS size.
- National Language Identifier (NLI)
- SMS transliteration
National Language Identifier
National Language Identifier (NLI) is an encoding technology that enables an SMS that contains language-specific characters to be delivered as the original text. Without NLI, these language-specific characters are treated as 16-bit Unicode.
NLI deducts only 5 characters from the maximum SMS length, so the SMS can contain a maximum of 155 characters.
The remaining 5 characters are used in the background to inform the receiver’s device about the selected language and instruct it how to properly display the SMS on the device.
To send language-specific characters, send a fully featured text message and set the languageCode parameter.
Nonstandard characters might cause messages to be encoded in Unicode, which might reduce the number of available characters in the message. Use the SMS preview to explore all options before sending the message.
SMS language list for National Language Identifier
Supported languages and their codes are:
- Turkish (TR)
- Spanish (ES)
- Portuguese (PT)
- AUTODETECT
Some networks might not support the language feature, so this functionality might not work for all destinations. Example: A Turkish message that is sent through a Chinese provider might not be displayed correctly on the recipient’s device.
The following images show the list of supported characters for each of the supported languages:
Turkish
Portuguese
Spanish
Example
The following example shows an SMS message that contains the Turkish alphabet:
POST /sms/1/text/advanced HTTP/1.1
Host: api.infobip.com
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Content-Type: application/json

{
  "messages": [
    {
      "from": "InfoSMS",
      "destinations": [
        {
          "to": "41793026727"
        }
      ],
      "text": "Artık Ulusal Dil Tanımlayıcısı ile Türkçe karakterli smslerinizi rahatlıkla iletebilirsiniz.",
      "language": {
        "languageCode": "TR"
      }
    }
  ]
}
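The same request can be assembled in code. The following Python sketch builds it with the standard library only. The host, path, and JSON fields are taken from the example above. The `https` scheme, the function name `build_nli_request`, and its parameters are assumptions made for illustration, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request

# Host and path come from the example above; the https scheme is an assumption.
API_URL = "https://api.infobip.com/sms/1/text/advanced"


def build_nli_request(sender, recipient, text, language_code, username, password):
    """Build (but do not send) a POST request that sets the languageCode parameter."""
    payload = {
        "messages": [
            {
                "from": sender,
                "destinations": [{"to": recipient}],
                "text": text,
                "language": {"languageCode": language_code},
            }
        ]
    }
    # HTTP Basic authentication: base64 of "username:password".
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to inspect the payload first.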
SMS transliteration
SMS transliteration is the method of replacing special (unsupported) characters with similar or related characters that are part of the GSM alphabet.
This process ensures that a maximum of 160 characters can be used in a message instead of only 70 (because of the different encoding standards). The disadvantage of SMS transliteration is that the delivered message looks slightly different from the original message.
Using SMS transliteration, you can write messages in your preferred alphabet. The text is automatically converted into an appropriately transliterated script, so you can use the full capacity of the message text without sending any Unicode characters.
Transliteration might produce unexpected message text. Use SMS preview to explore all options before sending the message.
SMS language list for transliteration
- Turkish
- Greek
- Cyrillic
- Serbian Cyrillic
- Bulgarian Cyrillic
- Central European
- Portuguese
- Colombian
- Baltic
- NON_UNICODE
Depending on the output alphabet that you specify, some unsupported characters might be converted differently, based on which character is the most appropriate for the selected language.
Any character that is not recognized by the selected language and is not part of the GSM alphabet is replaced by a dot (.).
If you use NON_UNICODE transliteration, the message text is converted from Unicode to the GSM alphabet using all available alphabet conversions. Unmatched characters are replaced with dots.
Example:
Original text: "©™ø- ˆ¨л- ˙˚λ- ∆ƒ∂"
After NON_UNICODE transliteration: "..ø- ..l- ..A- ..."
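The dot-replacement rule can be sketched in Python as follows. This is only an illustration of the fallback behavior, not the service's implementation: `ILLUSTRATIVE_CONVERSIONS` is a hypothetical, deliberately tiny conversion table, and the real conversion tables are not reproduced here.

```python
# GSM alphabet: basic character set plus the basic character set extension.
GSM7_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "\f^{}\\[~]|€"
)

# Hypothetical sample mappings, for illustration only.
ILLUSTRATIVE_CONVERSIONS = {"л": "l", "ı": "i", "ş": "s", "ć": "c"}


def to_gsm(text, conversions=ILLUSTRATIVE_CONVERSIONS):
    """Keep GSM characters, convert known characters, replace the rest with dots."""
    out = []
    for ch in text:
        if ch in GSM7_CHARS:
            out.append(ch)               # already part of the GSM alphabet
        elif ch in conversions:
            out.append(conversions[ch])  # a conversion is available
        else:
            out.append(".")              # unmatched character
    return "".join(out)
```

With this table, `to_gsm("©™ø-")` keeps ø and the hyphen, which are GSM characters, and replaces © and ™ with dots.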
Example
The following example shows how to send a transliterated message by adding one of the supported alphabets in the transliteration parameter.
POST /sms/1/text/advanced HTTP/1.1
Host: api.infobip.com
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Content-Type: application/json

{
  "messages": [
    {
      "from": "InfoSMS",
      "destinations": [
        {
          "to": "41793026727"
        }
      ],
      "text": "Ως Μεγαρικό ψήφισμα είναι γνωστή η απόφαση της Εκκλησίας του δήμου των Αθηναίων (πιθανόν γύρω στο 433/2 π.Χ.) να επιβάλει αυστηρό και καθολικό εμπάργκο στα",
      "transliteration": "GREEK"
    }
  ]
}
Text sent: Ως Μεγαρικό ψήφισμα είναι γνωστή η απόφαση της Εκκλησίας του δήμου των Αθηναίων (πιθανόν γύρω στο 433/2 π.Χ.) να επιβάλει αυστηρό και καθολικό εμπάργκο στα
Text received: ΩΣ MEΓAPIKO ΨHΦIΣMA EINAI ΓNΩΣTH H AΠOΦAΣH THΣ EKKΛHΣIAΣ TOY ΔHMOY TΩN AΘHNAIΩN (ΠIΘANON ΓYPΩ ΣTO 433/2 Π.X.) NA EΠIBAΛEI AYΣTHPO KAI KAΘOΛIKO EMΠAPΓKO ΣTA
By using transliteration, Greek lowercase letters that are not supported in the GSM alphabet are converted to uppercase letters that are supported.
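A minimal Python sketch of this kind of conversion is shown below. It is an approximation under stated assumptions, not the service's algorithm: it assumes that accents are dropped, that letters are uppercased, and that Greek capitals with a Latin look-alike are sent as the Latin letter, while the Greek capitals in the GSM alphabet (Γ, Δ, Θ, Λ, Ξ, Π, Σ, Φ, Ψ, Ω) are kept. The function name `greek_to_gsm` is hypothetical.

```python
import unicodedata

# Greek capitals that have a Latin look-alike in the GSM alphabet (assumed mapping).
GREEK_TO_LATIN = str.maketrans("ΑΒΕΖΗΙΚΜΝΟΡΤΥΧ", "ABEZHIKMNOPTYX")


def greek_to_gsm(text):
    """Approximate Greek transliteration: strip accents, uppercase, map look-alikes."""
    # Decompose accented letters, then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Uppercasing also turns the final sigma (ς) into Σ.
    return unaccented.upper().translate(GREEK_TO_LATIN)
```

Applied to the first words of the example, `greek_to_gsm("Ως Μεγαρικό ψήφισμα")` keeps Ω, Σ, Γ, Ψ, and Φ as Greek capitals and replaces the remaining letters with Latin capitals.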
The following image shows a list of allowed characters in the GSM alphabet for Greek.