Never* Use Arrays

Larry Garfield

@Crell

Larry implements Huggable

array(PHP)

They come with an array of problems.

The jokes will be that bad.

They come with an array of problems.

  • Inconsistent behavior
  • No type safety
  • No error handling
  • Poor performance
  • ...PHP doesn't have them

What is an array?

  • ordered sequence
  • of values
  • of the same type
  • identified by their offset

PHP has associative arrays

  • Arbitrary key-value index
  • "Dictionary"/"Map"
  • Numeric arrays via a hack
  • No type guarantees (on value or key)

Arrays are a hack


          $a[] = 'A';
          $a[] = 'B';
          print_r($a);
        

          Array
          (
              [0] => A
              [1] => B
          )
        

          unset($a[1]);
          $a[] = 'C';
          print_r($a);
        

          Array
          (
              [0] => A
              [2] => C
          )
        

Arrays are a buggy hack


          for ($i = 0; $i < count($a); ++$i) {
            print $a[$i];
          }
        

Notice: Undefined offset: 1 in /in/fQ0UB on line 13
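
A minimal sketch of the gap problem and two ways around it. The helper name is_packed_list() is made up for illustration and is not part of PHP or the talk.

```php
<?php
// Hypothetical helper: true only when the keys are exactly 0..n-1.
function is_packed_list(array $a): bool {
    return array_keys($a) === array_keys(array_values($a));
}

$a = ['A', 'B'];
unset($a[1]);
$a[] = 'C';                    // keys are now 0 and 2, not 0 and 1

// Option 1: re-pack the keys before using a counting loop.
$packed = array_values($a);    // keys are 0 and 1 again

// Option 2: use foreach, which never touches the missing offset.
$seen = [];
foreach ($a as $value) {
    $seen[] = $value;
}
```

Either option avoids the notice; neither stops the next unset() from reopening the gap.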

What's wrong with this picture?


          protected function expandArguments(&$query, &$args) {
              foreach ($args as $key => $data) {
                // Handle expansion of arrays.
                $key_name = str_replace('[]', '__', $key);
                $new_keys = [];
                foreach ($data as $i => $value) {
                  $new_keys[$key_name . $i] = $value;
                }

                // Update the query with the new placeholders.
                $query = str_replace($key, implode(', ', array_keys($new_keys)), $query);

                // Update the args array with the new placeholders.
                unset($args[$key]);
                $args += $new_keys;
              }
          }
        

Hey, look, an SQL injection!


Drupageddon, 2014

Drupal

Stupid simple fix


          protected function expandArguments(&$query, &$args) {
              foreach ($args as $key => $data) {
                // Handle expansion of arrays.
                $key_name = str_replace('[]', '__', $key);
                $new_keys = [];
                foreach (array_values($data) as $i => $value) {
                  $new_keys[$key_name . $i] = $value;
                }

                // Update the query with the new placeholders.
                $query = str_replace($key, implode(', ', array_keys($new_keys)), $query);

                // Update the args array with the new placeholders.
                unset($args[$key]);
                $args += $new_keys;
              }
          }
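
A simplified sketch of why array_values() matters here. The function expand_placeholder() and the sample keys are hypothetical stand-ins, not the real Drupal code: the point is only that caller-supplied array keys end up inside the placeholder names unless they are re-indexed first.

```php
<?php
// Hypothetical, stripped-down version of the placeholder expansion.
function expand_placeholder(string $key, array $data, bool $reindex): array {
    $keyName = str_replace('[]', '__', $key);
    $items = $reindex ? array_values($data) : $data;

    $placeholders = [];
    foreach ($items as $i => $value) {
        // $i is concatenated into text that later lands in the SQL string.
        $placeholders[] = $keyName . $i;
    }
    return $placeholders;
}

// Assume the keys come from outside, for example a crafted request.
$data = ['0 OR 1=1 --' => 'x', 1 => 'y'];

$unsafe = expand_placeholder(':ids[]', $data, false);
// [':ids__0 OR 1=1 --', ':ids__1']

$safe = expand_placeholder(':ids[]', $data, true);
// [':ids__0', ':ids__1']
```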
        

PHP uses arrays in place of purpose-built data structures.

That is almost never the right answer.

So what's better?

Purpose-built data structures.

Lists/Sequences

\ArrayObject

class TypedArray extends \ArrayObject

A type-specific "array"


          class TypedArray extends \ArrayObject {
              protected $type;

              public static function forType(string $type): self {
                  $ret = new static();
                  $ret->type = $type;
                  return $ret;
              }

              protected function __construct(...$args) {
                  parent::__construct(...$args);
              }

              public function offsetSet($index, $newval) {
                  if (! $newval instanceof $this->type) {
                      throw new \TypeError(
                        sprintf('Only values of type %s are supported', $this->type)
                      );
                  }
                  parent::offsetSet($index, $newval);
              }
          }
        

A type-specific "array"


          $a = TypedArray::forType(Point::class);

          $a[0] = new Point(2, 4);
          $a[1] = new Point(6, 9);
          $a['foobar'] = new Point(5, 2);

          foreach ($a as $point) { ... }
        

          // Throws \TypeError.
          $a['bad'] = new Carrot();
          $a['also_bad'] = 'a point';
        

Force a sequence through the API


          class TypedSequence implements \IteratorAggregate, \Countable {
              protected string $type;
              protected array $values = [];

              public static function forType(string $type): self {...}
              protected function __construct() {}

              public function getIterator() {
                  return new ArrayIterator($this->values);
              }

              public function count(): int {
                  return count($this->values);
              }

              public function add($newval): self {
                  if (! $newval instanceof $this->type) {
                      throw new \TypeError(...);
                  }
                  $this->values[] = $newval;
                  return $this;
              }
          }
        

But arrays pass by value, objects by reference!

No, objects pass by value. You've been lied to.

But they're modifiable!

Their handle passes by value, not the object.

OMGWTFBBQ?

Sigh.
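
A small sketch of what "the handle passes by value" means in practice. The Counter class and the three functions are invented for illustration.

```php
<?php
class Counter {
    public int $count = 0;
}

function increment(Counter $c): void {
    $c->count++;           // follows the handle and mutates the shared object
}

function replace(Counter $c): void {
    $c = new Counter();    // rebinds only the local copy of the handle
    $c->count = 99;
}

function replaceByRef(Counter &$c): void {
    $c = new Counter();    // a real reference: the caller's variable changes
    $c->count = 99;
}

$counter = new Counter();
increment($counter);
replace($counter);
$afterReplace = $counter->count;   // 1: replace() had no effect on the caller

replaceByRef($counter);
$afterReplaceByRef = $counter->count;   // 99
```

If objects were truly passed by reference, replace() would have behaved like replaceByRef().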

Can't touch $this


          class ImmutableTypedSequence implements IteratorAggregate, Countable {
              // Same as TypedSequence, but...

              public function add($newval): self {
                  if (!$newval instanceof $this->type) { throw new \TypeError(...);}

                  $new = clone($this);
                  $values = [...$this->values, $newval];
                  $new->values = $values;
                  return $new;
              }

              public function remove($val): self {
                  // Compare against false explicitly: a match at index 0 is falsy.
                  $key = array_search($val, $this->values, true);
                  if ($key !== false) {
                      $values = $this->values;
                      unset($values[$key]);
                      $new = clone($this);
                      $new->values = array_values($values);
                      return $new;
                  }
                  return $this;
              }
          }
        

Type-specific


          class PointSequence implements IteratorAggregate, Countable {
              // Same as before, but...

              public function add(Point $newval): self {
                  $new = clone($this);
                  $values = [...$this->values, $newval];
                  $new->values = $values;
                  return $new;
              }

              public function remove(Point $val): self {
                  // Compare against false explicitly: a match at index 0 is falsy.
                  $key = array_search($val, $this->values, true);
                  if ($key !== false) {
                      $values = $this->values;
                      unset($values[$key]);
                      $new = clone($this);
                      $new->values = array_values($values);
                      return $new;
                  }
                  return $this;
              }
          }
        

Uniqueness ("Set")


          class Set extends ImmutableTypedSequence {

              public function has($val): bool {
                  return array_search($val, $this->values, true) !== false;
              }

              public function add($newval): self {
                  return $this->has($newval) ? $this : parent::add($newval);
              }
          }
        

Ordering


          class OrderedSet extends Set {
              protected $compare;

              public static function forType(string $type, ?callable $compare = null): self {
                  $ret = new static();
                  $ret->type = $type;
                  $ret->compare = $compare;
                  return $ret;
              }

              public function getIterator() {
                  if ($this->compare) {
                      usort($this->values, $this->compare);
                  }
                  return new ArrayIterator($this->values);
              }
          }
        

          $compare = fn(Point $a, Point $b): int => [$a->x, $a->y] <=> [$b->x, $b->y];

          $s = OrderedSet::forType(Point::class, $compare)
              ->add(new Point(3, 4))
              ->add(new Point(2, 5))
              ->add(new Point(2, 3));

          foreach ($s as $point) {
              print_r($point);
          }
        

          Point Object
          (
              [x] => 2
              [y] => 3
          )
          Point Object
          (
              [x] => 2
              [y] => 5
          )
          Point Object
          (
              [x] => 3
              [y] => 4
          )
        

Purpose-built ordering


          class PointSet implements IteratorAggregate {
              protected array $values = [];

              public function has(Point $val): bool {
                  return array_search($val, $this->values, true) !== false;
              }

              public function add(Point $newval): self {
                  $new = clone($this);
                  $new->values = [...$this->values, $newval];
                  return $new;
              }

              public function remove(Point $oldval): self { ... }

              public function compare(Point $a, Point $b): int {
                  return [$a->x, $a->y] <=> [$b->x, $b->y];
              }

              public function getIterator() {
                  usort($this->values, [$this, 'compare']);
                  return new ArrayIterator($this->values);
              }
          }
        

What have we gained?

  • Type-safe lists
  • Sequence-guaranteed lists
  • Immutability guarantee
  • Self-documenting API
  • Encapsulation of functionality

Passable lists

Can't type hint on array anymore

:-(

Do you care about the type of the objects?

ProductList implements \Traversable

Do you just want to be able to foreach()?

iterable

iterable :: array|\Traversable

\Traversable :: \Iterator|\IteratorAggregate

\Generator implements \Iterator

Generators

  • Lazy stream of values
  • Lazy value creation
  • Structure flattening
  • Simpler code
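
A short sketch of the first and third bullets, lazy streams and structure flattening. The function names naturals(), take(), and flatten() are made up for this example.

```php
<?php
// An endless stream: values are computed only when something asks for them.
function naturals(): \Generator {
    $n = 1;
    while (true) {
        yield $n++;
    }
}

// Pull at most $limit items from any iterable, then stop.
function take(iterable $items, int $limit): \Generator {
    if ($limit <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$limit === 0) {
            return;
        }
    }
}

// Flatten arbitrarily nested iterables into one stream of values.
function flatten(iterable $nested): \Generator {
    foreach ($nested as $item) {
        if (is_iterable($item)) {
            yield from flatten($item);
        } else {
            yield $item;
        }
    }
}

// Passing false discards keys, so repeated keys cannot overwrite each other.
$firstThree = iterator_to_array(take(naturals(), 3), false);        // [1, 2, 3]
$flat = iterator_to_array(flatten([1, [2, [3, 4]], 5]), false);     // [1, 2, 3, 4, 5]
```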

A PSR-14 example


        interface ListenerProviderInterface {
            /**
             * @param object $event
             *   An event for which to return the relevant listeners.
             * @return iterable<callable>
             *   An iterable (array, iterator, or generator) of callables.  Each
             *   callable MUST be type-compatible with $event.
             */
            public function getListenersForEvent(object $event): iterable;
        }
        

A PSR-14 example


          class Dispatcher implements DispatcherInterface {
            // ...

            public function dispatch(object $event) {

              foreach ($this->provider->getListenersForEvent($event) as $listener) {
                if ($event instanceof StoppableEventInterface
                    && $event->isPropagationStopped()) {
                  break;
                }
                $listener($event);
              }

              return $event;
            }
          }
        

Tukio (PSR-14 reference implementation)


          class CallbackProvider implements ListenerProviderInterface {
              // ...

              public function getListenersForEvent(object $event): iterable {
                  if (!$event instanceof CallbackEventInterface) {
                      return [];
                  }
                  $subject = $event->getSubject();

                  foreach ($this->callbacks as $type => $callbacks) {
                      if ($event instanceof $type) {
                          foreach ($callbacks as $callback) {
                              if (method_exists($subject, $callback)) {
                                  yield [$subject, $callback];
                              }
                          }
                      }
                  }
              }
          }
        

Trivial to concatenate


          class AggregateProvider implements ListenerProviderInterface {
              /** @var array */
              protected $providers = [];

              public function getListenersForEvent(object $event): iterable {
                  /** @var ListenerProviderInterface $provider */
                  foreach ($this->providers as $provider) {
                      yield from $provider->getListenersForEvent($event);
                  }
              }
              // ...
          }
        

From fig/event-dispatcher-util

But what about array_* functions?

What about them?

array_* functions

  • Rarely used
  • Internal implementation detail
  • Easy to reimplement

The most common


          function iterable_map(iterable $list, callable $operation): iterable {
            foreach ($list as $k => $v) {
              yield $k => $operation($v);
            }
          }

          function iterable_filter(iterable $list, callable $filter): iterable {
            foreach ($list as $k => $v) {
              if ($filter($v)) {
                yield $k => $v;
              }
            }
          }
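
A usage sketch, assuming value-only callbacks and key preservation. The names map_values() and filter_values() are chosen here to keep the block self-contained; they are not standard PHP functions.

```php
<?php
function map_values(iterable $list, callable $fn): iterable {
    foreach ($list as $k => $v) {
        yield $k => $fn($v);
    }
}

function filter_values(iterable $list, callable $keep): iterable {
    foreach ($list as $k => $v) {
        if ($keep($v)) {
            yield $k => $v;
        }
    }
}

// Nothing runs until the pipeline is consumed.
$evens  = filter_values([1, 2, 3, 4, 5, 6], fn(int $v): bool => $v % 2 === 0);
$scaled = map_values($evens, fn(int $v): int => $v * 10);

// Original keys survive the pipeline: [1 => 20, 3 => 40, 5 => 60]
$result = iterator_to_array($scaled);
```

Because both helpers accept iterable, the same pipeline works on arrays, generators, and any \Traversable object.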
        

Collection objects


          class Collection implements \IteratorAggregate {
              protected $valuesGenerator;

              protected function __construct(){}

              public static function fromGenerator(callable $callback): self {
                  $new = new static();
                  $new->valuesGenerator = $callback;
                  return $new;
              }

              public static function fromIterable(?iterable $values = null): self {
                  $values ??= [];
                  return static::fromGenerator(function () use ($values) {
                      yield from $values;
                  });
              }

              public function getIterator(): \Traversable {
                  return ($this->valuesGenerator)();
              }
          }
        

Collection objects


          class Collection implements \IteratorAggregate {
              // ...
              public function append(iterable ...$collections): self {
                  return static::fromGenerator(function() use ($collections) {
                      yield from ($this->valuesGenerator)();
                      foreach ($collections as $col) {
                          yield from $col;
                      }
                  });
              }

              public function add(...$items): self {
                  return $this->append($items);
              }
          }
        

Collection objects


          class Collection implements \IteratorAggregate {
              // ...
              public function map(callable $fn): self {
                  return static::fromGenerator(function () use ($fn) {
                      foreach (($this->valuesGenerator)() as $key => $val) {
                          yield $key => $fn($val);
                      }
                  });
              }

              public function toArray(): array {
                  return iterator_to_array((function() {
                      foreach ($this as $value) {
                          yield $value;
                      }
                  })());
              }
          }
        

Collection objects


          $c = Collection::fromIterable([1, 2, 3]);

          $c = $c->add(4, 5, 6);
          $c = $c->map(fn($v) => $v * 10);

          print_r($c->toArray());
        

          Array
          (
              [0] => 10
              [1] => 20
              [2] => 30
              [3] => 40
              [4] => 50
              [5] => 60
          )
        

Can you add map/filter to OrderedSet?

But what about associative arrays?

Lookup tables

  • Map from arbitrary value to arbitrary value
  • Keys are int or string only
  • Good for user/config-supplied keys

A common pattern


          $builders = [];
          foreach (getRegisteredBuilderPlugins() as $builder) {
              $builders[$builder->format()] = $builder;
          }
          // ...

          $output = $builders[$request->getHeaderLine('Accept')]->buildOutput($stuff);
        
  • Error handling?
  • Missing keys?
  • Type safety?

Build your own lookup utility


          class Lookup implements ArrayAccess {
              protected $type;
              protected $values;
              protected $default;

              public static function forType(string $type, $default = null): self {
                  // Like we saw before, but also save $default.
              }

              public function offsetSet($offset, $value) {
                  if (! is_string($offset)) { throw new \TypeError(...); }
                  if (! $value instanceof $this->type) { throw new \TypeError(...);}

                  $this->values[$offset] = $value;
              }

              public function offsetGet($offset) {
                  return $this->values[$offset] ?? $this->default;
              }

              public function offsetExists($offset) { ... }
              public function offsetUnset($offset) { ... }
          }
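
One way the elided parts could be filled in, as a runnable sketch. The OutputBuilder interface, the two builder classes, the error messages, and the bodies of offsetExists() and offsetUnset() are assumptions for illustration, not the talk's code.

```php
<?php
// Hypothetical types so the sketch can run on its own.
interface OutputBuilder {}
final class JsonBuilder implements OutputBuilder {}
final class DefaultBuilder implements OutputBuilder {}

class Lookup implements \ArrayAccess {
    protected string $type;
    protected array $values = [];
    protected $default;

    protected function __construct() {}

    public static function forType(string $type, $default = null): self {
        $ret = new static();
        $ret->type = $type;
        $ret->default = $default;
        return $ret;
    }

    public function offsetSet($offset, $value): void {
        if (!is_string($offset)) {
            throw new \TypeError('Keys must be strings');
        }
        if (!$value instanceof $this->type) {
            throw new \TypeError(sprintf('Only values of type %s are supported', $this->type));
        }
        $this->values[$offset] = $value;
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset) {
        // A missing key falls back to the default instead of raising a notice.
        return $this->values[$offset] ?? $this->default;
    }

    public function offsetExists($offset): bool {
        return isset($this->values[$offset]);
    }

    public function offsetUnset($offset): void {
        unset($this->values[$offset]);
    }
}

$builders = Lookup::forType(OutputBuilder::class, new DefaultBuilder());
$builders['application/json'] = new JsonBuilder();

$hit  = $builders['application/json'];   // the registered JsonBuilder
$miss = $builders['text/csv'];           // falls back to DefaultBuilder

$rejected = false;
try {
    $builders['text/plain'] = 'not a builder';
} catch (\TypeError $e) {
    $rejected = true;
}
```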
        

Error handling is internalized


          $builders = Lookup::forType(OutputBuilder::class, new DefaultBuilder());

          foreach (getRegisteredBuilderPlugins() as $builder) {
              $builders[$builder->format()] = $builder;
          }
          // ...

          $output = $builders[$request->getHeaderLine('Accept')]->buildOutput($stuff);
        

Or without ArrayAccess


          class Lookup {
              protected $type;
              protected $values;

              public static function forType(string $type): self {
                  // As before.
              }

              public function add(string $key, $value) {
                  if (! $value instanceof $this->type) { throw new \TypeError(...);}

                  $this->values[$key] = $value;
              }

              public function lookup(string $key, $default = null) {
                  return $this->values[$key] ?? $default;
              }
          }
        

Or a purpose-built service


          class BuilderMap {
              protected $defaultFmt;
              protected $provider;
              protected $builders = [];

              public function __construct(BuilderProvider $p, $defaultFmt = '') {...}

              protected function buildLookup(): void {
                  foreach ($this->provider->getRegisteredBuilderPlugins() as $b) {
                      $this->builders[$b->format()] = $b;
                  }
              }

              public function lookup(string $key): OutputBuilder {
                  if (!$this->builders) $this->buildLookup();
                  return $this->builders[$key] ?? $this->builders[$this->defaultFmt];
              }
          }
        

          $builder = $builders->lookup($req->getHeaderLine('Accept'));
          $output = $builder->buildOutput($stuff);
        

Anonymous structs


          $order = [
              'id' => 345,
              'total' => 100.45,
              'skus' => [ 123, '5B3', '987'],
              'canceled' => false,
              'usr' => 8,
          ];
        
  • Is cancelled spelled right?
  • Is usr correct or a typo?
  • Is user an ID or User object?
  • Are skus numeric or strings?
  • Can skus be a single value?
  • Are there other properties?
  • What currency is total in?

Named structs


        class Order {
            public string $id;
            public Money $total;
            public bool $cancelled = false;
            public User $user;
            public array $skus = [];
        }
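
A sketch of what typed properties buy over array keys. OrderSketch is a cut-down, hypothetical version of the class above, using only scalar types so it runs on its own.

```php
<?php
final class OrderSketch {
    public int $id;                  // no default: must be set before it is read
    public float $total = 0.0;
    public bool $cancelled = false;
}

$order = new OrderSketch();

// A value of the wrong type is rejected at assignment time.
$typeRejected = false;
try {
    $order->total = 'a lot';
} catch (\TypeError $e) {
    $typeRejected = true;
}

// Reading a typed property that was never set fails loudly
// instead of silently producing null.
$uninitializedRejected = false;
try {
    $id = $order->id;
} catch (\Error $e) {
    $uninitializedRejected = true;
}

$order->id = 345;   // valid assignment
```

An array would have accepted both mistakes without complaint.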
        

$options is a code smell


          function formatProduct(Product $product, array $options): string {
            // ...
          }
        
  • What are the values for $options?
  • Which are required?
  • How are they spelled?
  • What are their legal types? How strict?

¯\_(ツ)_/¯

So, docs?


          /**
           * Formats a product.
           *
           * @param Product $product
           * @param array $options
           *   - label: The label to apply to the product.
           *   - bgcolor: A color code for the background.
           *   - color: A foreground color.
           *   - font-size: The size in points.
           *   - layout: The template to use.
           *
           * @return string
           */
          function formatProduct(Product $product, array $options): string { ... }
        

But good code is self-documenting!

Objects are self-documenting


          class FormatterOptions {
              public string $label = 'Your product';
              public string $bgcolor = '#FFF';
              public string $color = '#000';
              public int $fontSize = 14;
              public string $layout;
          }

          function formatProduct(Product $p, FormatterOptions $options): string {...}
        

Objects have methods


        class FormatterOptions {
            public string $label = 'Your product';
            public string $bgcolor = '#FFF';
            public string $color = '#000';
            public int $fontSize = 14;
            public string $layout;

            public static function forLayout(string $layout) { ... }

            public function darkMode(bool $dark): self {
                $this->bgcolor = $dark ? '#000' : '#FFF';
                $this->color  = $dark ? '#FFF' : '#000';
                return $this;
            }
        }
        

Turn it around


        class ProductFormatter {
            // ...

            public function __construct(string $layout) {
              $this->layout = $layout;
            }

            public function darkMode(bool $dark): self { ... }

            public function setColors(string $color, string $bgcolor): self {
                $this->color = $color;
                $this->bgcolor = $bgcolor;
                return $this;
            }

            public function setFontInPoints(int $size): self {
                $this->fontSize = $size;
                return $this;
            }

            public function format(Product $product): string { ... }
        }
        

Easy to use


        $formatter = new ProductFormatter('standard');

        $formatter->darkMode(true)->format($product);
        

If it has dependencies


        $formatter = $themeSystem->getFormatter('standard');

        $formatter->darkMode(true)->format($product);
        

But aren't public properties eeeevil?

  • Private object => Public properties
  • Public API => protected properties with meaningful methods

But aren't objects big and slow?

Benchmarks, yo


          const TEST_SIZE = 1_000_000;

          $list = [];
          $start = $stop = 0;
          $start = microtime(true);

          for ($i = 0; $i < TEST_SIZE; ++$i) {
            $list[$i] = [
              'a' => random_int(1, 500),
              'b' => base64_encode(random_bytes(16)),
            ];
          }

          ksort($list);

          usort($list, function($first, $second) {
            return [$first['a'], $first['b']] <=> [$second['a'], $second['b']];
          });

          $stop = microtime(true);
          $memory = memory_get_peak_usage();
          printf("Runtime: %s\nMemory: %s\n", $stop - $start, $memory);
        

Results

Technique            Runtime (s)          Memory (bytes)
Associative array     9.4311 (n/a)        541,450,384 (n/a)
stdClass             11.2173 (+18.94%)    589,831,120 (+8.94%)
Public properties     8.2172 (-12.87%)    253,831,584 (-53.12%)
Private properties   11.0881 (+17.57%)    253,833,000 (-53.12%)
Anonymous class       8.1095 (-14.07%)    253,832,368 (-53.12%)

Mind blown

Source: "Use associative arrays basically never"

So can you ever use an array?

  • Internal implementation detail
  • Array literal for iterable
  • Callables ([$obj, 'method'])
  • Generated code

This is actually more efficient (sometimes)


          class CompiledItems {

              protected const ITEMS = [
                  'thing1' => [
                      'color' => 'blue',
                      'size' => 5,
                  ],
                  'thing2' => [
                      'color' => 'green',
                      'size' => 7,
                  ],
              ];

              public function getItem(string $name) {
                  return new Item(static::ITEMS[$name]);
              }
          }
        

So really, Never* Use Arrays

(in public)

Larry Garfield

@Crell

Director of Developer Experience, Platform.sh

The end-to-end web platform for agile teams

Stalk us at @PlatformSH

Buy my book!

https://bit.ly/fn-php

Cover of Thinking Functionally in PHP