Symfony String Component - emulate JavaScript behaviours in PHP
I recently ran into a string issue in a PHP middleware app that sits between a frontend app and an external service, both of which validate string length in JavaScript.
It started out easy, but emojis did not make my life simpler. I found out that the Symfony String Component could solve my issue.
What is the issue with emojis
Emojis vary in length because a single emoji is not always a single unit of storage. For example, the 👍 "thumbs up" emoji is the single code point U+1F44D, but it lies outside the Basic Multilingual Plane, so it takes 2 UTF-16 code units (a surrogate pair). Another example is the 👨‍👩‍👧‍👦 "family" emoji, which is the sequence U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466: 7 code points (four emojis joined by three zero-width joiners), or 11 UTF-16 code units.
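To see which code points an emoji is made of, a small helper like the one below can be handy. This is only a sketch: codePoints() is a name I made up for illustration, and it assumes the input is valid UTF-8 and that the mbstring extension is available.

```php
<?php
// Hypothetical helper: list the Unicode code points of a UTF-8 string.
function codePoints(string $s): array
{
    return array_map(
        static fn (string $char): string => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
        mb_str_split($s, 1, 'UTF-8')
    );
}

$thumbsUp = "\u{1F44D}";
// man, ZWJ, woman, ZWJ, girl, ZWJ, boy
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

echo implode(' ', codePoints($thumbsUp)), PHP_EOL; // U+1F44D
echo implode(' ', codePoints($family)), PHP_EOL;   // U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
```

The thumbs up is a single code point, while the family is seven of them, which is why the two emojis behave so differently once you start counting.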
The problem is that JavaScript and PHP have different ways to count the length of a string.
JavaScript counts the number of UTF-16 code units in a string, while PHP's strlen counts the number of bytes in a string.
Why JS and PHP are not doing it the same way
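Most JavaScript engines use UTF-16 internally, but their string APIs behave as if strings were UCS-2: length and indexing count 16-bit code units. UCS-2 is a character encoding that can represent all the characters in the Basic Multilingual Plane (BMP) of Unicode, but not the supplementary characters (those with code points above U+FFFF). In UTF-16, each supplementary character is stored as a surrogate pair of two code units, so it counts as 2 in JavaScript.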
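To make the surrogate pair visible, here is a minimal sketch that converts the thumbs up emoji to UTF-16 and prints its 16-bit code units in hexadecimal. It assumes the mbstring extension; the variable names are mine.

```php
<?php
$thumbsUp = "\u{1F44D}";

// UTF-16BE: big-endian, no byte order mark, 2 bytes per code unit
$utf16 = mb_convert_encoding($thumbsUp, 'UTF-16BE', 'UTF-8');

// Split the hex dump into groups of 4 hex digits, i.e. one group per code unit
$units = str_split(strtoupper(bin2hex($utf16)), 4);

echo implode(' ', $units), PHP_EOL; // D83D DC4D
echo count($units), PHP_EOL;        // 2
```

Those two code units are what JavaScript counts when it reports a length of 2 for this emoji.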
In PHP, strlen counts the number of bytes in the string. PHP strings are plain byte sequences, and in UTF-8 a character takes between 1 and 4 bytes, so the result depends on which characters the string contains. mb_strlen counts code points instead.
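A quick way to see the variable width of UTF-8 is to compare bytes and code points for a few single characters. This is a small sketch with sample characters I picked myself, assuming the mbstring extension.

```php
<?php
// One character per UTF-8 width: 1, 2, 3 and 4 bytes
$samples = [
    'a'         => 'a',
    'e-acute'   => "\u{00E9}",
    'euro'      => "\u{20AC}",
    'thumbs up' => "\u{1F44D}",
];

foreach ($samples as $label => $char) {
    printf(
        "%-9s bytes=%d code points=%d\n",
        $label,
        strlen($char),             // bytes
        mb_strlen($char, 'UTF-8')  // code points
    );
}
```

Each sample is one code point for mb_strlen, but strlen reports 1, 2, 3 and 4 bytes respectively.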
As a result, the length of a string in Javascript and PHP may be different.
Some examples might help to understand the issue:
String | JavaScript length | PHP strlen | PHP mb_strlen |
---|---|---|---|
a | 1 | 1 | 1 |
👍 | 2 | 4 | 1 |
🤡 | 2 | 4 | 1 |
👨‍👩‍👧‍👦 | 11 | 25 | 7 |
👩‍💻 | 5 | 11 | 3 |
PHP examples generated with: https://3v4l.org/C3NOp
<?php
$array = ['a', '👍', '🤡', '👨‍👩‍👧‍👦', '👩‍💻'];
foreach ($array as $string) {
    echo $string;
    echo ' - ';
    echo strlen($string); // bytes
    echo ' - ';
    echo mb_strlen($string); // code points
    echo ' - ';
    // UTF-16 uses 2 bytes per code unit, so bytes / 2 is the JavaScript length
    $utf16String = mb_convert_encoding($string, 'UTF-16', 'UTF-8');
    echo strlen($utf16String) / 2;
    echo PHP_EOL;
}
If you want to understand why calculating string length in PHP is not that easy, the blog post The Hell of Calculating The Size of Strings in PHP explains it well.
An implementation of a JSString ValueObject in PHP with Symfony String Component
I wanted to create a ValueObject that behaves like a JavaScript string and can be truncated to a given length, with an optional suffix. I thought it would be a good idea to use the Symfony String Component to do so.
The Symfony String Component is a PHP library that provides a set of classes to manipulate strings.
Installation
composer require symfony/string
Read the official Symfony documentation for more details about the String Component.
Implementation
By default:
- the UnicodeString class counts the number of graphemes in a string, not the number of characters
- the CodePointString class uses mb_strlen with UTF-8 internally
- the ByteString class uses strlen internally
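To compare these counts on one emoji without installing anything, here is a sketch using only native functions. It is an approximation of what the three classes report, not the component's own code. The grapheme count needs the intl extension (or a polyfill), so that part is guarded, and the variable names are mine.

```php
<?php
$developer = "\u{1F469}\u{200D}\u{1F4BB}"; // woman + zero-width joiner + laptop

$bytes      = strlen($developer);             // what ByteString counts
$codePoints = mb_strlen($developer, 'UTF-8'); // what CodePointString counts
// UTF-16 code units, i.e. what JavaScript's length reports
$utf16Units = intdiv(strlen(mb_convert_encoding($developer, 'UTF-16BE', 'UTF-8')), 2);

printf("bytes=%d codePoints=%d utf16=%d\n", $bytes, $codePoints, $utf16Units);

// Graphemes are what UnicodeString counts; the result depends on the ICU version
if (function_exists('grapheme_strlen')) {
    printf("graphemes=%d\n", grapheme_strlen($developer));
}
```

The first line prints bytes=11 codePoints=3 utf16=5, matching the table above: three different answers, and the JavaScript one is a fourth way of counting.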
None of them are using the same logic as Javascript.
But UnicodeString is helpful for truncating strings without cutting surrogate pairs, or whole emoji sequences, in half, since it works on graphemes.
I will build my implementation on top of it.
I've based it on what I've found in this Gist from msjyoo created in 2017.
Thanks to his work, we can now mimic JS behaviour in PHP:
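<?php
namespace App\ValueObject;

use Symfony\Component\String\UnicodeString;

class JSString extends UnicodeString
{
    // Number of UTF-16 code units, like JavaScript's String.length
    public function length(): int
    {
        /** @var string $utf16String */
        $utf16String = mb_convert_encoding($this->string, 'UTF-16', 'UTF-8');
        return (int) mb_strlen($utf16String, "UCS-2");
    }

    public function truncate(int $length, string $ellipsis = '', bool $cut = true): static
    {
        if ($this->length() === 0) {
            return new self(''); // empty string: nothing to cut, and avoids a division by zero
        }
        // Plain UnicodeString, so that length() counts graphemes here
        $unicodeString = new UnicodeString($this->string);
        // Estimate how many graphemes fit into $length UTF-16 code units
        $truncateLength = ($unicodeString->length() * $length) / $this->length();
        $newSelf = new self($unicodeString->truncate((int) $truncateLength)->toString());
        // Append the ellipsis only if the result still fits in $length
        $withEllipsis = $newSelf->append($ellipsis);
        return $withEllipsis->length() <= $length ? $withEllipsis : $newSelf;
    }
}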
Or using encapsulation:
<?php
namespace App\ValueObject;

use Symfony\Component\String\UnicodeString;

class JSString
{
    private UnicodeString $unicodeString;

    public function __construct(string $string)
    {
        $this->unicodeString = new UnicodeString($string);
    }

    // Number of UTF-16 code units, like JavaScript's String.length
    public function length(): int
    {
        /** @var string $utf16String */
        $utf16String = mb_convert_encoding($this->unicodeString->toString(), 'UTF-16', 'UTF-8');
        return (int) mb_strlen($utf16String, "UCS-2");
    }

    public function truncate(int $length, string $ellipsis = '', bool $cut = true): self
    {
        if ($this->length() === 0) {
            return new self(''); // empty string: nothing to cut, and avoids a division by zero
        }
        // Estimate how many graphemes fit into $length UTF-16 code units
        $truncateLength = ($this->unicodeString->length() * $length) / $this->length();
        $truncated = new self($this->unicodeString->truncate((int) $truncateLength)->toString());
        // Measure the candidate the JavaScript way before keeping the ellipsis
        // (a grapheme count would reject '🤡.' for a limit of 3)
        $withEllipsis = new self($truncated->toString() . $ellipsis);
        return $withEllipsis->length() <= $length ? $withEllipsis : $truncated;
    }

    public function toString(): string
    {
        return $this->unicodeString->toString();
    }
}
And the following tests to validate the implementation:
<?php
declare(strict_types=1);
namespace App\Tests\Domain\ValueObject;
use App\ValueObject\JSString;
use PHPUnit\Framework\TestCase;
class JSStringTest extends TestCase
{
/**
* @dataProvider provideJSLengthExamples
*/
public function test_length(string $parameter, int $expected): void
{
$string = new JSString($parameter);
self::assertSame($expected, $string->length());
}
/**
* @dataProvider provideTruncateJSExamples
*/
public function test_truncate(string $parameter, int $truncateLength, string $ellipsis, string $expected, ?int $expectedLength = null): void
{
$string = new JSString($parameter);
$truncated = $string->truncate($truncateLength, $ellipsis);
self::assertSame($expected, $truncated->toString());
self::assertSame($expectedLength ?? $truncateLength, $truncated->length());
}
public function provideJSLengthExamples(): iterable
{
yield ['a', 1];
yield ['👩‍💻', 5];
yield ['👍', 2];
yield ['👨‍👩‍👧‍👦', 11];
yield ['🤡', 2];
yield ['👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦👨‍👩‍👧‍👦', 110];
}
public function provideTruncateJSExamples(): iterable
{
yield ['abc', 1, '', 'a'];
yield ['👩‍💻', 3, '', '', 0];
yield ['👩‍💻', 5, '', '👩‍💻', 5];
yield ['👩‍💻', 8, '...', '👩‍💻...', 8];
yield ['👩‍💻👩‍💻', 9, '', '👩‍💻', 5];
yield ['👩‍💻👩‍💻', 9, '...', '👩‍💻...', 8];
yield ['👍👍', 2, '', '👍', 2];
yield ['👨‍👩‍👧‍👦', 4, '', '', 0];
yield ['👨‍👩‍👧‍👦👨‍👩‍👧‍👦', 10, '', '', 0];
yield ['👨‍👩‍👧‍👦👨‍👩‍👧‍👦', 18, '...', '👨‍👩‍👧‍👦...', 14];
yield ['🤡🤡🤡🤡🤡', 3, '.', '🤡.', 3];
yield [
'π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»',
255,
'',
'π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»'
];
yield [
'12π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»',
255,
'...',
'12π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»π©βπ»...'
];
yield [
'π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦',
100,
'',
'π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦π¨βπ©βπ§βπ¦',
99
];
}
}
I hope this article helps you understand the issue, and the solution for matching JavaScript behaviour in PHP.
A bonus is that I can truncate strings while keeping the emojis intact. It is a nice feature, as it preserves the meaning of what the user has written.
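For the curious, truncating without breaking emojis can also be sketched without the component. This is not the implementation above but a dependency-free variant: jsLength() and jsTruncate() are hypothetical names, the input is assumed to be valid UTF-8, and the grapheme splitting relies on PCRE's \X, whose handling of joined emoji sequences depends on the PCRE version. Unlike the ValueObject, it walks the graphemes one by one instead of estimating proportionally (as far as I can tell, the estimate is exact only when all graphemes have the same UTF-16 width), and it leaves a string that already fits untouched.

```php
<?php
// UTF-16 code units in a UTF-8 string, i.e. what JavaScript's String.length reports.
function jsLength(string $s): int
{
    return intdiv(strlen(mb_convert_encoding($s, 'UTF-16BE', 'UTF-8')), 2);
}

// Hypothetical helper: keep whole graphemes while the UTF-16 budget allows it.
function jsTruncate(string $s, int $limit, string $ellipsis = ''): string
{
    if (jsLength($s) <= $limit) {
        return $s; // already fits, nothing to cut
    }
    if (jsLength($ellipsis) > $limit) {
        $ellipsis = ''; // the ellipsis alone would not fit, so drop it
    }
    $budget = $limit - jsLength($ellipsis);

    // \X matches one extended grapheme cluster in PCRE's Unicode mode
    preg_match_all('/\X/u', $s, $matches);

    $kept = '';
    $used = 0;
    foreach ($matches[0] as $grapheme) {
        $width = jsLength($grapheme);
        if ($used + $width > $budget) {
            break; // the next grapheme would exceed the limit
        }
        $kept .= $grapheme;
        $used += $width;
    }

    return $kept . $ellipsis;
}

echo jsTruncate(str_repeat("\u{1F921}", 5), 3, '.'), PHP_EOL; // one clown followed by a dot
```

With a limit of 3 and a thumbs up repeated twice, the result is a single thumbs up (length 2): the second one is dropped whole instead of being cut in the middle of its surrogate pair.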
What if...
...You might want to do it the other way around!
I found this StackOverflow thread on "How to count the correct length of a string with Emojis in JS", where you can find resources for working with strings in JavaScript and counting graphemes correctly.