Serializing PHP

Larry Garfield

@Crell@phpc.social

Cover of Exploring PHP 8.0 Cover of Thinking Functionally in PHP

What is serialization?

Cerealization

(Credit: https://flickr.com/photos/cottinghamphotography/6200250080/)

Plethora of Potential PHP Processes

serialize() / unserialize() for all values(*)

Special controls for objects:

  • __sleep/__wakeup (The before times)
  • Serializable (PHP 5.1, Deprecated in 8.1)
  • __serialize/__unserialize (PHP 7.4)
  • var_export()/__set_state()
  • JsonSerializable (serialize direction only, for json_encode())

Only use __serialize/__unserialize.

cf: Benchmarking Serialization

__serialize/__unserialize


          class User {
              protected int $id;
              protected string $name;
              protected DateTime $lastLogin;

              // ...

              public function __serialize(): array {
                  return ['id' => $this->id, 'name' => $this->name];
              }

              public function __unserialize(array $data): void {
                  $this->id = $data['id'];
                  $this->name = $data['name'];
                  $this->lastLogin = UserSystem::getLastLogin($this->id);
              }
          }
        

          $s = serialize(new User());
          print_r($s);

          $u = unserialize($s, ['allowed_classes' => [User::class]]);

          // O:4:"User":2:{s:2:"id";i:42;s:4:"name";s:5:"Larry";}
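
The slide elides the constructor and depends on a UserSystem service, so it cannot run as shown. A minimal self-contained sketch of the same round trip follows. The constructor, the lastLogin() accessor, and the lookupLastLogin() stub (standing in for UserSystem::getLastLogin()) are assumptions made for illustration only.

```php
<?php
declare(strict_types=1);

class User {
    protected DateTime $lastLogin;

    // Hypothetical constructor; the slide elides how the object is built.
    public function __construct(protected int $id, protected string $name) {
        $this->lastLogin = self::lookupLastLogin($id);
    }

    // Stand-in for UserSystem::getLastLogin(); always returns a fixed date.
    private static function lookupLastLogin(int $id): DateTime {
        return new DateTime('2024-01-01 00:00:00 UTC');
    }

    public function __serialize(): array {
        // Persist plain data only; lastLogin is derived state.
        return ['id' => $this->id, 'name' => $this->name];
    }

    public function __unserialize(array $data): void {
        $this->id = $data['id'];
        $this->name = $data['name'];
        $this->lastLogin = self::lookupLastLogin($this->id);
    }

    public function lastLogin(): DateTime {
        return $this->lastLogin;
    }
}

$s = serialize(new User(42, 'Larry'));
echo $s, PHP_EOL;
// O:4:"User":2:{s:2:"id";i:42;s:4:"name";s:5:"Larry";}

// Restrict which classes may be instantiated while unserializing.
$u = unserialize($s, ['allowed_classes' => [User::class]]);
```

Note that lastLogin never reaches the serialized string; it is rebuilt on the way back in.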
        

Separate Logic from Data

Logic
Data

Separate logic from data

  • If it's in your DI Container, thou shalt not serialize it
  • If it references something in the container, thou shalt not serialize
  • Value objects: +1
  • Entities: Only in data mapper
  • Just say No to Active Record
  • Applies to all serialization formats

But what about outside systems?

  • JSON, YAML, XML, TOML, etc.
  • PHP arrays

Gotta roll your own

Story time

TYPO3

Requirements

  • Mutate data on import
  • Dynamic type maps
  • Implode/explode arrays, sometimes
  • Fast

Don't solve a problem, build a tool to solve the problem, then use it.

So, Symfony Serializer?

  • Widely used
  • Very flexible
  • Couldn't collect/flatten
  • Only static type maps ("class discriminators")
  • Complex architecture, hard to modify

Sigh. Time to write one...

Roll your own

Stand on the shoulders of giants

I studied the Rust Serde crate

Three libraries

Simple hydration

EnvMapper

The core logic (1)


          class EnvMapper {
              public function map(string $class, bool $require = false, ?array $source = null): object {
                  $source ??= $_ENV;
                  $rClass = new \ReflectionClass($class);
                  $rProperties = $rClass->getProperties();

                  $toSet = [];
                  foreach ($rProperties as $rProp) {
                      $propName = $rProp->getName();
                      $envName = $this->normalizeName($propName);
                      if (isset($source[$envName])) {
                          $toSet[$propName]
                            = $this->typeNormalize($source[$envName], $rProp);
                      } elseif (PropValue::None !== $default = $this->getDefaultValue($rProp)) {
                          $toSet[$propName] = $default;
                      } elseif ($require) {
                          throw MissingEnvValue::create($propName, $class);
                      }
                  }

                  // ...
              }
          }
        

The core logic (2)


          class EnvMapper {
              public function map(string $class, bool $require = false, ?array $source = null): object {
                  // ...

                  $populator = function (array $props) {
                      foreach ($props as $k => $v) {
                          try {
                              $this->$k = $v;
                          } catch (\TypeError $e) {
                              throw TypeMismatch::create($this::class, $k, $v);
                          }
                      }
                  };

                  $env = $rClass->newInstanceWithoutConstructor();

                  // Read: invoke $populator on $env, passing in $toSet
                  $populator->call($env, $toSet);

                  return $env;
              }
          }
        

What does this show?

  • Deserialization is driven by class definition
  • Type info & defaults driven by class definition
  • Reflection is pretty fast, but not free

These will be important
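
To make "driven by class definition" concrete, here is a small sketch of what reflection can report about a class without any extra configuration. DbSettings is a hypothetical class invented for this example; the reflection calls are standard PHP 8.0+.

```php
<?php
declare(strict_types=1);

// Hypothetical settings class used only for this illustration.
class DbSettings {
    public string $host = 'localhost';
    public int $port;
}

$described = [];
foreach ((new ReflectionClass(DbSettings::class))->getProperties() as $rProp) {
    $type = $rProp->getType();
    $described[$rProp->getName()] = [
        // Union and intersection types are ignored in this sketch.
        'type' => $type instanceof ReflectionNamedType ? $type->getName() : null,
        'hasDefault' => $rProp->hasDefaultValue(),
        'default' => $rProp->hasDefaultValue() ? $rProp->getDefaultValue() : null,
    ];
}

print_r($described);
```

A mapper can use exactly this information to decide how to cast incoming values and what to do when one is missing.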

Cool PHP trick #1

Visibility busting


           $reader = (fn (string $prop) => $this->$prop ?? null)->bindTo($obj, $obj);

          $value = $reader('privateProp');
        

          $populator = function (array $props) {
              foreach ($props as $k => $v) {
                  $this->$k = $v;
              }
          };

          $env = $rClass->newInstanceWithoutConstructor();

          $populator->call($env, $toSet);
        

(Please don't do this)
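
For reference, a runnable version of both forms of the trick. Secretive and its token property are hypothetical names for this sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical class with a private property.
class Secretive {
    private string $token = 'abc123';
}

$obj = new Secretive();

// Binding the closure to $obj and its class scope lets $this reach private state.
$reader = (fn (string $prop) => $this->$prop ?? null)->bindTo($obj, $obj);
$value = $reader('token');

// Closure::call() binds and invokes in one step.
$writer = function (array $props): void {
    foreach ($props as $k => $v) {
        $this->$k = $v;
    }
};
$writer->call($obj, ['token' => 'changed']);

echo $value, ' -> ', $reader('token'), PHP_EOL;
```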

Cool PHP trick #2

Enum as error code


          enum PropValue {
              case None;
          }

          // The actual code uses mixed, but usually union types are better.
          function getDefaultValue(\ReflectionProperty $subject): string|PropValue
          {
              $params = $this->getPropertiesForClass($subject->getDeclaringClass());

              $param = $params[$subject->getName()] ?? null;

              return $param?->isDefaultValueAvailable()
                  ? $param->getDefaultValue()
                  : PropValue::None;
          }
        

cf: Much Ado about Null
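
The point of the sentinel enum is that null may be a perfectly valid value, so it cannot also mean "nothing found". A minimal standalone sketch, where findSetting() is a hypothetical helper rather than part of EnvMapper:

```php
<?php
declare(strict_types=1);

enum PropValue {
    case None;
}

// Hypothetical lookup: null is a legitimate stored value here,
// so a dedicated enum case signals "not present" instead.
function findSetting(array $settings, string $key): mixed {
    return array_key_exists($key, $settings) ? $settings[$key] : PropValue::None;
}

$settings = ['timeout' => null, 'retries' => 3];

$timeout = findSetting($settings, 'timeout'); // null: present, explicitly empty
$missing = findSetting($settings, 'nope');    // PropValue::None: not present at all
```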

What does Serde do?

Serde


        use Crell\Serde\SerdeCommon;

        $serde = new SerdeCommon();

        $object = new SomeClass('a', 'b', new OtherClass());

        $json = $serde->serialize($object, format: 'json');

        $obj = $serde->deserialize($json, from: 'json', to: SomeClass::class);
        
  • JSON, YAML, array, CSV, streaming JSON, streaming CSV
  • Customize per-object-type
  • Support any additional formats
  • Will use __serialize()/__unserialize() if defined

Basic usage


          class Person
          {
              #[Field(serializedName: 'callme')]
              public string $firstName = 'Larry';

              #[Field(renameWith: Cases::CamelCase)]
              public string $lastName = 'Garfield';

              public string $job = 'Presenter';

              #[Field(alias: ['company'])]
              public string $employer = 'LegalZoom';

              #[Field(exclude: true)]
              public string $password = 'youwish';
          }
        

          {
            "callme": "Larry",
            "LastName": "Garfield",
            "job": "Presenter",
            "employer": "LegalZoom"
          }
        

Cool PHP trick #3

Enums as default objects


          interface RenamingStrategy {
              public function convert(string $name): string;
          }

          enum Cases implements RenamingStrategy {
              case UPPERCASE;
              case lowercase;
              case snake_case;
              case kebab_case;
              case CamelCase;
              case lowerCamelCase;

              public function convert(string $name): string {
                  return match ($this) {
                      self::UPPERCASE => strtoupper($name),
                      self::lowercase => strtolower($name),
                      self::snake_case => /* ... */,
                      self::kebab_case => /* ... */,
                      self::CamelCase => /* ... */,
                      self::lowerCamelCase => /* ... */,
                  };
              }
          }
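
A trimmed, runnable sketch of the same idea: the enum supplies ready-made strategy objects, while any other class implementing the interface can be swapped in. SimpleCases, Prefix, and renameField() are hypothetical names for this example and are not Serde's API.

```php
<?php
declare(strict_types=1);

interface RenamingStrategy {
    public function convert(string $name): string;
}

// Trimmed to two cases for illustration.
enum SimpleCases implements RenamingStrategy {
    case UPPERCASE;
    case lowercase;

    public function convert(string $name): string {
        return match ($this) {
            self::UPPERCASE => strtoupper($name),
            self::lowercase => strtolower($name),
        };
    }
}

// A custom strategy object works anywhere the enum does.
final class Prefix implements RenamingStrategy {
    public function __construct(private string $prefix) {}

    public function convert(string $name): string {
        return $this->prefix . $name;
    }
}

// Enum cases are constant expressions, so they can be parameter defaults.
function renameField(string $name, RenamingStrategy $strategy = SimpleCases::lowercase): string {
    return $strategy->convert($name);
}

echo renameField('FirstName', SimpleCases::UPPERCASE), PHP_EOL;
```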
        

Default handling (deserialization)


          class Person
          {
              #[Field(default: 'Hidden')]
              public string $location;

              #[Field(useDefault: false)]
              public int $age;

              #[Field(requireValue: true)]
              public string $job;

              public function __construct(
                  public string $name = 'Anonymous',
              ) {}
          }
        
  • location -> "Hidden"
  • name -> "Anonymous"
  • age -> uninitialized
  • job -> Exception

Sequences vs Dictionaries


          class Order {
              public string $orderId;

              public int $userId;

              #[Field(serializedName: 'items')]
              #[SequenceField(arrayType: Product::class)]
              public array $products;

              #[DictionaryField(arrayType: Tag::class, keyType: KeyType::String)]
              public array $tags;
          }
        

          {
              "orderId": "abc123",
              "userId": 5,
              "items": [
                  { "name": "Widget", "price": 9.99 },
                  { "name": "Gadget", "price": 4.99 }
              ],
              "tags": {
                "userClass": {"name": "VIP"},
                "discount": {"name": "Closeout"}
              }
          }
        

Implosion


          class Order {
              #[SequenceField(implodeOn: ',')]
              protected array $productIds = [5, 6, 7];

              #[DictionaryField(implodeOn: ',', joinOn: '=')]
              protected array $dimensions = [
                  'height' => 40,
                  'width' => 20,
              ];
          }
        

          {
              "productIds": "5,6,7",
              "dimensions": "height=40,width=20"
          }
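
The transformation itself is simple string work. A plain-PHP sketch of the dictionary case, using hypothetical helpers implodeDict() and explodeDict() (these are not Serde functions, and Serde's own implementation may differ):

```php
<?php
declare(strict_types=1);

// Join each key/value pair with $joinOn, then join the pairs with $implodeOn.
function implodeDict(array $dict, string $implodeOn, string $joinOn): string {
    $pairs = [];
    foreach ($dict as $key => $value) {
        $pairs[] = $key . $joinOn . $value;
    }
    return implode($implodeOn, $pairs);
}

// Reverse the process. Values come back as strings.
function explodeDict(string $in, string $implodeOn, string $joinOn): array {
    $dict = [];
    if ($in === '') {
        return $dict;
    }
    foreach (explode($implodeOn, $in) as $pair) {
        [$key, $value] = explode($joinOn, $pair, 2);
        $dict[$key] = $value;
    }
    return $dict;
}

$flat = implodeDict(['height' => 40, 'width' => 20], ',', '=');
$back = explodeDict($flat, ',', '=');
```

The round trip loses type information, which is one reason a deserializer needs the class definition to cast values back.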
        

Flatten/collect


          class Results {
            public function __construct(
              #[Serde\Field(flatten: true)]
              public Pagination $pagination,
              #[Serde\SequenceField(arrayType: Product::class)]
              public array $products,
            ) {}
          }

          class Pagination {
            public function __construct(public int $total, public int $offset, public int $limit) {}
          }

          class Product {
            public function __construct(public string $name, public float $price) {}
          }
        

          {
              "total": 100,
              "offset": 20,
              "limit": 10,
              "products": [
                  { "name": "Widget", "price": 9.99 },
                  { "name": "Gadget", "price": 4.99 }
              ]
          }
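
Conceptually, flattening lifts a nested object's fields into the parent on the way out, and collecting gathers them back on the way in. A plain-array sketch under that reading, with hypothetical helpers flattenResults() and collectPagination() (not how Serde implements it):

```php
<?php
declare(strict_types=1);

class Pagination {
    public function __construct(public int $total, public int $offset, public int $limit) {}
}

// Flatten: lift the nested object's public properties into the parent array.
function flattenResults(Pagination $pagination, array $products): array {
    return get_object_vars($pagination) + ['products' => $products];
}

// Collect: pull back out exactly the fields that Pagination's constructor names.
function collectPagination(array $flat): Pagination {
    $args = [];
    $ctor = new ReflectionMethod(Pagination::class, '__construct');
    foreach ($ctor->getParameters() as $param) {
        $args[$param->getName()] = $flat[$param->getName()];
    }
    // String keys are spread as named arguments.
    return new Pagination(...$args);
}

$flat = flattenResults(new Pagination(100, 20, 10), [['name' => 'Widget', 'price' => 9.99]]);
$pagination = collectPagination($flat);
```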
        

Advanced flattening


        class DetailedResults {
            public function __construct(
                #[Serde\Field(flatten: true)]
                public NestedPagination $pagination,
                #[Serde\Field(flatten: true)]
                public ProductType $type,
                #[Serde\SequenceField(arrayType: Product::class)]
                public array $products,
                #[Serde\Field(flatten: true)]
                public array $other = [],
            ) {}
        }
        class NestedPagination {
            public function __construct(
                public int $total,
                public int $limit,
                #[Serde\Field(flatten: true)]
                public PaginationState $state,
            ) {}
        }
        

        class PaginationState {
            public function __construct(public int $offset) {}
        }
        class ProductType {
            public function __construct(public string $name = '', public string $category = '') {}
        }
        

Advanced flattening


          {
            "total": 100,
            "limit": 10,
            "offset": 20,
            "name": "Doodads",
            "category": "Small items",
            "products": [
                {
                    "name": "Widget",
                    "price": 9.99
                },
                {
                    "name": "Gadget",
                    "price": 4.99
                }
            ],
            "foo": "beep",
            "bar": "boop"
        }
        

Type Maps


          interface Product {}

          interface Book extends Product {}

          class PaperBook implements Book {
              protected string $title;
              protected int $pages;
          }

          class DigitalBook implements Book {
              protected string $title;
              protected int $bytes;
          }

          class Sale {
              protected Book $book;
              protected float $discountRate;
          }

          class Order {
              protected string $orderId;

              #[SequenceField(arrayType: Book::class)]
              protected array $products;
          }
        

Type Maps


          class Sale {
              #[ClassNameTypeMap(key: 'type')]
              protected Book $book;

              protected float $discountRate;
          }
        

          {
              "book": {
                  "type": "Your\\App\\DigitalBook",
                  "title": "Thinking Functionally in PHP",
                  "bytes": 45000
              },
              "discountRate": 0.2
          }
        

Type Maps


          class Sale {
              #[StaticTypeMap(key: 'type', map: [
                  'paper' => PaperBook::class,
                  'ebook' => DigitalBook::class,
              ])]
              protected Book $book;

              protected float $discountRate;
          }
        

          #[StaticTypeMap(key: 'type', map: [
              'paper' => PaperBook::class,
              'ebook' => DigitalBook::class,
          ])]
          interface Book {}
        

          {
              "book": {
                  "type": "ebook",
                  "title": "Thinking Functionally in PHP",
                  "bytes": 45000
              },
              "discountRate": 0.2
          }
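
What a static type map does on import can be sketched in a few lines: read the key field, look up the class, and build it from the remaining fields. hydrateBook() is a hypothetical helper, and the constructors are added for this example only.

```php
<?php
declare(strict_types=1);

interface Book {}

class PaperBook implements Book {
    public function __construct(public string $title, public int $pages) {}
}

class DigitalBook implements Book {
    public function __construct(public string $title, public int $bytes) {}
}

// Hypothetical hydrator: not Serde's implementation.
function hydrateBook(array $data, array $map, string $key = 'type'): Book {
    $class = $map[$data[$key] ?? ''] ?? throw new InvalidArgumentException('Unknown book type');
    unset($data[$key]);
    // Remaining keys are passed as named constructor arguments.
    return new $class(...$data);
}

$map = ['paper' => PaperBook::class, 'ebook' => DigitalBook::class];
$book = hydrateBook(
    ['type' => 'ebook', 'title' => 'Thinking Functionally in PHP', 'bytes' => 45000],
    $map,
);
```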
        

Dynamic Type Maps


          class ProductTypeMap implements TypeMap {
              public function __construct(protected readonly Connection $db) {}

              public function keyField(): string {
                  return 'type';
              }

              public function findClass(string $id): ?string {
                  return $this->db->someLookup($id);
              }

              public function findIdentifier(string $class): ?string {
                  return $this->db->someMappingLogic($class);
              }
          }

          $typeMap = new ProductTypeMap($dbConnection);

          $serde = new SerdeCommon(typeMaps: [
              Your\App\Product::class => $typeMap,
          ]);

          $json = $serde->serialize($aBook, format: 'json');
        

Streaming

  • Can stream to JSON or CSV
  • \Traversable objects are treated as any other object
  • iterable values get iterated to completion ("run out") when serializing
  • Result: Lazy create and lazy stream at once!
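
The "lazy create, lazy stream" combination can be sketched without any library: a generator produces items on demand, and a writer emits each one as soon as it arrives. streamJsonList() is a hypothetical function for illustration, not Serde's stream formatter.

```php
<?php
declare(strict_types=1);

// Hypothetical writer: emits a JSON array one element at a time,
// so the full list never has to exist in memory.
function streamJsonList(iterable $items, $stream): void {
    fwrite($stream, '[');
    $first = true;
    foreach ($items as $item) {
        fwrite($stream, ($first ? '' : ',') . json_encode($item, JSON_THROW_ON_ERROR));
        $first = false;
    }
    fwrite($stream, ']');
}

// Lazy source: each row is produced only when the writer asks for it.
$rows = (function (): Generator {
    foreach ([1, 2, 3] as $id) {
        yield ['id' => $id];
    }
})();

// An in-memory stream keeps the sketch self-contained; any stream resource works.
$fp = fopen('php://memory', 'w+b');
streamJsonList($rows, $fp);
rewind($fp);
$json = stream_get_contents($fp);
fclose($fp);
```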

Streaming


          // The CsvStreamFormatter is not included by default.
          $serde = new SerdeCommon(formatters: [new CsvStreamFormatter()]);

          // You may use any PHP supported stream here, including files,
          // network sockets, stdout, an in-memory temp stream, etc.
          $init = new FormatterStream(fopen('/tmp/output.json', 'wb'));

          $result = $serde->serialize($data, format: 'csv-stream', init: $init);

          $fp = $result->stream;
          // Now do with $fp as you wish.
        

Streaming


          class ProductList {
              public function __construct(
                  #[SequenceField(arrayType: Product::class)]
                  private iterable $products,
              ) {}
          }
          class Product { /* */ }
          $db = ...;

          $callback = function() use ($db) {
              $result = $db->query("SELECT id, name, color, price FROM products ORDER BY name");

              foreach ($result as $record) {
                  $sales = $db->query("SELECT start, end FROM sales WHERE product=?", $record['id'])->fetchAll();
                  yield new Product($record, $sales);
              }
          };
        

          // This is a lazy list of products, which will be pulled from the database.
          $products = new ProductList($callback());

          $serde = new SerdeCommon(formatters: [new JsonStreamFormatter()]);

          // Write to stdout, aka, back to the browser.
          $init = new FormatterStream(fopen('php://output', 'wb'));
          $result = $serde->serialize($products, format: 'json-stream', init: $init);
        

Scopes


          class User {
              private string $username;

              #[Field(exclude: true)]
              private string $password;

              #[Field(exclude: true)]
              #[Field(scopes: ['admin'])]
              private string $role;
          }
        

          $json = $serde->serialize($user, 'json');
          /*
              { "username": "Larry" }
          */
        

          $json = $serde->serialize($user, 'json', scopes: ['admin']);
          /*
              { "username": "Larry", "role": "Developer" }
          */
        

Versioning with scopes


          #[ClassSettings(includeFieldsByDefault: false)]
          class Product {
              #[Field]
              private int $id = 5;

              private int $stock = 50;

              #[Field, Field(scopes: ['legacy'], serializedName: 'label')]
              private string $name = 'Fancy widget';

              #[Field(scopes: ['newsystem'])]
              private string $price = '9.99';

              #[Field(scopes: ['legacy'], serializedName: 'cost')]
              private float $legacyPrice = 9.99;
          }
        

          // No scope
          { "id": 5, "name": "Fancy widget" }
        

          // "legacy" scope
          { "id": 5, "label": "Fancy widget", "cost": 9.99 }
        

          // "newsystem" scope
          { "id": 5, "name": "Fancy widget", "price": "9.99" }
        

Inside Serde

Architecture

  • Stream-based (no IR)
  • Can build IR per-format
  • Almost entirely Attribute-driven
  • Importers / Exporters
  • Deformatters / Formatters
  • Recursive all the way down

Serialization overview

            sequenceDiagram
            participant Serde
            participant Serializer
            participant Exporter
            participant Formatter
            Serde->>Formatter: initialize()
            Formatter-->>Serde: prepared value
            Serde->>Serializer: Set up
            Serde->>Serializer: serialize()
            activate Serializer
            loop For each property
              Serializer->>Exporter: call depending on type
              Exporter->>Formatter: type-specific write method
              Formatter->>Serializer: serialize() sub-value
            end
            Serializer->>Formatter: finalize()
            Serializer-->>Serde: final value
            deactivate Serializer
          

Deserialization overview

            sequenceDiagram
            participant Serde
            participant Deserializer
            participant Importer
            participant Deformatter
            Serde->>Deformatter: initialize()
            Deformatter-->>Serde: prepared source
            Serde->>Deserializer: Set up
            Serde->>Deserializer: deserialize()
            activate Deserializer
            loop For each property
              Deserializer->>Importer: call depending on type
              Importer->>Deformatter: type-specific read method
              Deformatter->>Deserializer: deserialize() sub-value
            end
            Deserializer->>Deformatter: finalize()
            Deserializer-->>Serde: final value
            deactivate Deserializer
          

Cool PHP trick #4

Internal execution objects


          class ThingDoer {
              public function __construct(private DepA $depA, private DepB $depB) {}

              public function run(A $paramA, B $paramB): Result {
                  // Bundle the dependencies and the call arguments into one short-lived object.
                  $runner = new ThingRunner($this->depA, $this->depB, $paramA, $paramB);
                  return $runner->run();
              }
          }

          class ThingRunner {
              public function __construct(
                  public readonly DepA $depA, public readonly DepB $depB,
                  public readonly A $a, public readonly B $b) {}

              public function run(): Result {
                  // Call a dozen internal methods that all use the constructor args.
              }
          }
        
  • Avoid passing values in each method every time
  • Fully immutable, so public properties OK!
  • Safe to pass $this around dependencies
  • No need to DI every separate piece separately!

Setup


          class SerdeCommon extends Serde {
              protected readonly array $exporters;
              protected readonly array $importers;
              protected readonly array $formatters;
              protected readonly array $deformatters;
              protected readonly TypeMapper $typeMapper;

              public function __construct(
                  protected readonly ClassAnalyzer $analyzer
                    = new MemoryCacheAnalyzer(new Analyzer()),
                  array $handlers = [],
                  array $formatters = [],
                  array $typeMaps = [],
              ) { ... }
          }
        
  • Everything is readonly
  • SerdeCommon is just setup; write your own!
  • One of only two non-exception uses of class inheritance (extends) in the entire system
  • Default dependencies

Cool PHP trick #5

Default dependencies


          class MyService
          {
              public function __construct(private Other $other = new Other()) {}
          }
        
  • Useful when dependency is mostly a util without its own dependencies
  • Makes testing a lot easier
  • Makes one-off uses easier
  • For production, still DI to avoid duplication
  • Default NullLogger?
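
As a sketch of the NullLogger idea: the service works with no arguments, and a test can still pass in a recording logger. The Logger interface, both logger classes, and Importer are hypothetical names for this example; the default-object syntax requires PHP 8.1.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal logger interface for this illustration.
interface Logger {
    public function log(string $message): void;
}

final class NullLogger implements Logger {
    public function log(string $message): void {}
}

final class MemoryLogger implements Logger {
    public array $messages = [];

    public function log(string $message): void {
        $this->messages[] = $message;
    }
}

class Importer {
    // "new in initializers" (PHP 8.1) supplies the default dependency.
    public function __construct(private Logger $logger = new NullLogger()) {}

    public function import(array $rows): int {
        $this->logger->log('Imported ' . count($rows) . ' rows');
        return count($rows);
    }
}

// One-off use: no wiring needed.
$quiet = (new Importer())->import([1, 2]);

// Test or production use: inject a real dependency.
$log = new MemoryLogger();
(new Importer($log))->import([1, 2, 3]);
```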

Full circle

TYPO3 decided they liked global arrays

Crell/Config

Overview


          # config/common/editorsettings.yaml
          color: "#ccddee"
          bgcolor: "#ffffff"
        

          # config/dev/editorsettings.yaml
          bgcolor: '#eeff00'
        

          class EditorSettings {
              public function __construct(
                  public readonly string $color,
                  public readonly string $bgcolor,
                  public readonly int $fontSize = 14,
              ) {}
          }

          $loader = new LayeredLoader([
            new YamlFileSource('./config/common'),
            new YamlFileSource('./config/' . APP_ENV),
          ]);

          $cachedLoader = new SerializedFilesystemCache($loader, '/cache/path');
          $editorConfig = $cachedLoader->load(EditorSettings::class);
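
The layering idea can be sketched in plain PHP: later sources override earlier ones, and whatever no file provides falls back to the constructor default. The arrays below stand in for the parsed YAML files. This is an illustration of the concept only, not how Crell/Config is implemented.

```php
<?php
declare(strict_types=1);

class EditorSettings {
    public function __construct(
        public readonly string $color,
        public readonly string $bgcolor,
        public readonly int $fontSize = 14,
    ) {}
}

// Stand-ins for the parsed YAML files; later layers win.
$common = ['color' => '#ccddee', 'bgcolor' => '#ffffff'];
$dev = ['bgcolor' => '#eeff00'];

$merged = array_replace($common, $dev);

// String keys are spread as named arguments; fontSize falls back to its default.
$settings = new EditorSettings(...$merged);
```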
        

Advanced usage


          use Crell\Config\Config;

          #[Config('dashboard')]
          readonly class DashboardSettings {
              public function __construct(
                  public string $name,
                  #[Field(flatten: true)]
                  #[DictionaryField(arrayType: Component::class, keyType: KeyType::String)]
                  #[StaticTypeMap(key: 'type', map: [
                      'latest_posts' => LatestPosts::class,
                      'user_status' => UserStatus::class,
                      'pending' => PostsNeedModeration::class,
                  ])]
                  public array $components = [],
              ) {}
          }
        

Now looks for dashboard.[yaml|json|php|ini]

Default is str_replace('\\', '_', $class)

Advanced usage


          readonly class LatestPosts implements Component {
              public function __construct(
                  public string $category,
                  public Side $side = Side::Left,
              ) {}
          }

          readonly class PostsNeedModeration implements Component {
              public function __construct(
                  public int $count = 5,
                  public Side $side = Side::Left,
              ) {}
          }

          readonly class UserStatus implements Component
          {
              public function __construct(
                  public string $user,
                  public Side $side = Side::Left,
              ) {}
          }
        

          enum Side: string {
              case Left = 'left';
              case Right = 'right';
          }
        

The config


          # config/common/dashboard.yaml
          name: "User dashboard"
          me:
              type: 'user_status'
          movie_talk:
              type: 'latest_posts'
              category: movies
          music_talk:
              type: 'latest_posts'
              category: music
              side: right
        

          # config/admin/dashboard.yaml
          name: "Admin dashboard"
          mod_todo:
              type: 'pending'
              side: right
        

          $loaders = [
            new YamlFileSource('./config/common'),
            new YamlFileSource('./config/' . APP_ENV),
          ];
          if (user_is_admin()) $loaders[] = new YamlFileSource('./config/admin');
          $loader = new LayeredLoader($loaders);

          $dashConfig = $loader->load(DashboardSettings::class);
        

Use it


        class Dashboard {
            public function __construct(private DashboardSettings $settings) {}

            public function renderDashboard(): string {
                // Do stuff here.
                $this->settings->name;
                foreach ($this->settings->components as $c) { ... }
            }
        }
        

Test it


            class DashboardTest extends TestCase {
                public function test_something(): void {
                    $settings = new DashboardSettings('Test', [new UserStatus('crell')]);
                    $subject = new Dashboard($settings);
                    // Make various assertions.
                }
            }
          

Dependency Inject it


            $container->register(DashboardSettings::class, fn(Container $c)
              => $c->get(ConfigLoader::class)->load(DashboardSettings::class));
          

Conclusions

  • var_export() for code generating arrays only
  • serialize()/unserialize() for internal use
  • Serde for anything external
  • Only ever serialize value objects / clean entities
  • Attributes are awesome
  • PHP is the schema
  • Holy crap modern PHP is nice!

Resources

Larry Garfield

@Crell@phpc.social

All about PHP 8!

https://bit.ly/php80

Cover of Exploring PHP 8.0

Buy my book!

https://bit.ly/fn-php

Cover of Thinking Functionally in PHP

https://www.garfieldtech.com/