Performance

This section discusses the internals of this lib and the performance tweaks that can help your app run more smoothly with it. For a performance comparison with other libs, check the comparison page.

Class instantiation

During class instantiation, both the secret and personalisation values are derived, and every parameter is checked to be within certain bounds; therefore, there is a performance impact similar to signing a relatively small payload. The impact is about twice as large when instantiating Blake2SerializerSigner as when instantiating the other signers. This creates an interesting optimization opportunity: caching the class instantiation.

There is an example of this where the standard library's functools.cached_property is used to cache the class instantiation. Another option is to use functools.lru_cache, but avoid decorating a method of a class with it: the cache would hold a reference to every instance through self, keeping those instances alive.
For a web app, an alternative to a cache would be contextvars.
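
As a rough illustration of the cached_property approach, here is a minimal sketch. `StandInSigner` and `SessionService` are hypothetical names invented for this example: the stand-in only mimics a signer whose instantiation is costly, using the standard library's hashlib, so the snippet runs without installing anything. With this lib, the cached property would return one of its signers instead.

```python
"""Sketch: caching a signer instance with functools.cached_property."""
import hashlib
import hmac
from functools import cached_property


class StandInSigner:
    """Hypothetical stand-in for a signer whose instantiation is costly."""

    instances = 0  # Counts instantiations, for illustration only

    def __init__(self, secret: bytes) -> None:
        type(self).instances += 1
        # Derive the key once, at instantiation time
        self._key = hashlib.blake2b(secret, digest_size=32).digest()

    def _tag(self, data: bytes) -> bytes:
        """Compute a keyed BLAKE2b tag for the data."""
        return hashlib.blake2b(data, key=self._key, digest_size=16).hexdigest().encode()

    def sign(self, data: bytes) -> bytes:
        """Append the tag to the data."""
        return data + b'.' + self._tag(data)

    def unsign(self, signed: bytes) -> bytes:
        """Verify the tag and return the original data."""
        data, _, tag = signed.rpartition(b'.')
        if not hmac.compare_digest(tag, self._tag(data)):
            raise ValueError('invalid signature')
        return data


class SessionService:
    """Hypothetical app component that needs a signer on every request."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    @cached_property
    def signer(self) -> StandInSigner:
        """Instantiate the signer on first access, then reuse it."""
        return StandInSigner(self._secret)

    def handle(self, payload: bytes) -> bytes:
        """Sign and unsign a payload using the cached signer."""
        return self.signer.unsign(self.signer.sign(payload))


service = SessionService(b'secret' * 3)
results = [service.handle(b'request %d' % i) for i in range(3)]
print('Instantiations:', StandInSigner.instances)
```

The signer is created on the first access to `service.signer` and reused afterwards, so three handled requests cost a single instantiation. This only pays off if the `SessionService` object itself outlives a single request.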

I measured caching the instantiation against not doing it, and it takes a lot less time! Using a cached instance makes a sign and unsign cycle around 2x faster for Blake2Signer and around 1.3x faster for Blake2SerializerSigner (2.34x and 1.29x respectively in the run shown below).
That's a huge performance bonus, particularly when instantiation happens at least once per request in a web app and the cache lives across requests.

Warning

If the instantiation only occurs once, then using a cache won't make a difference since the first hit is always needed to produce it. Benchmark your implementation to make sure it is making a positive difference.

Measuring the performance of caching class instantiation

Source

"""Measuring the performance of caching class instantiation.

Requirements:
    python3 -m pip install \
        'blake2signer'

Usage:
    python3 caching_class_instantiation.py
    NUMBER=10 REPEAT=2 pypy3 caching_class_instantiation.py

    This will take a while to execute, go grab a mate, coffee, tea, or something else :).
"""
import json
import os
import statistics
import timeit
import typing as t
from collections.abc import Callable
from copy import deepcopy
from datetime import datetime
from datetime import timezone
from enum import Enum
from functools import lru_cache
from functools import partial
from math import isclose

from blake2signer import Blake2SerializerSigner
from blake2signer import Blake2Signer

NUMBER: t.Optional[int] = int(os.getenv('NUMBER', '0')) if os.getenv('NUMBER') else None
"""Set this value if you prefer to have a fixed number of rounds for the benchmark."""

REPEAT: int = int(os.getenv('REPEAT', '10'))
"""How many times each cycle of iterations is repeated."""

COLUMNS = (40, 13, 7, 29)
"""Sizes of the table columns."""


class Benchmark(t.TypedDict):
    """Benchmark result."""
    all_runs: list[float]
    best: float
    average: float
    stdev: float
    loops: int
    repeat: int


class TermFormat(str, Enum):
    """Terminal formatting."""

    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

    def __str__(self) -> str:
        """String representation."""
        return self.value


P = t.ParamSpec('P')


def benchmark(
    func: Callable[P, t.Any],
    *args: P.args,
    **kwargs: P.kwargs,
) -> Benchmark:
    """Run a benchmark on the given function, similar to IPython's %timeit."""
    wrapped = partial(func, *args, **kwargs)
    timer = timeit.Timer(wrapped)

    repeat = REPEAT
    if NUMBER is None:
        number, _ = timer.autorange()
        number *= 2
    else:
        number = NUMBER

    results = timer.repeat(repeat=repeat, number=number)
    per_loop = [timing / number for timing in results]
    best = min(per_loop)
    avg = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0

    print(
        format_time(avg),
        '±',
        format_time(st_dev),
        'per loop (mean ± std. dev. of',
        repeat,
        'runs',
        number,
        'loops each)',
    )

    return Benchmark(
        all_runs=results,
        best=best,
        average=avg,
        stdev=st_dev,
        loops=number,
        repeat=repeat,
    )


def format_time(
    dt: float,
    *,
    unit: t.Optional[str] = None,
    precision: int = 3,
) -> str:
    """Format time (copied from timeit lib)."""
    units = {  # This map needs to be sorted from larger to smaller
        's': 1.0,
        'ms': 1e-3,
        'us': 1e-6,
        'ns': 1e-9,
    }

    if unit:
        scale = units[unit]
    else:
        for unit, scale in units.items():  # noqa: B007 # pylint: disable=R1704
            if dt >= scale:
                break
        else:
            unit = 'ns'
            scale = units[unit]

    return f'{dt / scale:.{precision}g} {unit}'


def print_row(
    *,
    name: str,
    value: float,
    ok: bool,
    baseline: float,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print a comparison table row."""
    x_times = baseline / value
    if isclose(value, baseline):
        detail = ''
    elif isclose(x_times, 1.0, rel_tol=0.1):
        detail = '(close to baseline)'
    elif x_times < 1:
        detail = '(slower than baseline)'
    else:
        detail = '(faster than baseline)'

    print(
        name.ljust(columns_sizes[0]),
        '|',
        format_time(value).rjust(columns_sizes[1]),
        '|',
        ('√' if ok else '⚠').center(columns_sizes[2]),
        '|',
        'baseline' if isclose(value, baseline) else f'{x_times:4.2f}x'.rjust(4),
        detail,
    )


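def is_timing_ok(timing: Benchmark, /) -> bool:
    """Return True if the timing measurement appears to be correct."""
    if timing['stdev'] == 0:
        # A single run (REPEAT=1) gives no spread to judge: avoid dividing by zero
        return False

    return (timing['best'] / timing['stdev']) > 60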


def print_table(
    *,
    title: str,
    timings: dict[str, Benchmark],
    baseline: str,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print comparison table."""
    print()
    print(title.ljust(columns_sizes[0]), '| Best Abs Time | Measure | Comparison')
    print(
        '-' * columns_sizes[0],
        '|',
        '-' * columns_sizes[1],
        '|',
        '-' * columns_sizes[2],
        '|',
        '-' * columns_sizes[3],
    )
    baseline_value = timings[baseline]['best']
    for name, timing in timings.items():
        ok = is_timing_ok(timing)
        print_row(
            name=name,
            value=timing['best'],
            ok=ok,
            baseline=baseline_value,
            columns_sizes=columns_sizes,
        )
    print()


def test_b2s_no_cache(secret: bytes, data: bytes) -> None:
    """Test Blake2Signer without caching instantiation."""
    signer = Blake2Signer(secret)
    signer.unsign(signer.sign(data))


@lru_cache(maxsize=None)
def b2s_cached(secret: bytes) -> Blake2Signer:
    """Cache Blake2Signer instantiation."""
    return Blake2Signer(secret)


def test_b2s_cache(secret: bytes, data: bytes) -> None:
    """Test Blake2Signer caching instantiation."""
    signer = b2s_cached(secret)
    signer.unsign(signer.sign(data))


def test_b2ss_no_cache(secret: bytes, data: dict[str, t.Any]) -> None:
    """Test Blake2SerializerSigner without caching instantiation."""
    serializer = Blake2SerializerSigner(secret)
    serializer.loads(serializer.dumps(data))


@lru_cache(maxsize=None)
def b2ss_cached(secret: bytes) -> Blake2SerializerSigner:
    """Cache Blake2SerializerSigner instantiation."""
    return Blake2SerializerSigner(secret)


def test_b2ss_cache(secret: bytes, data: dict[str, t.Any]) -> None:
    """Test Blake2SerializerSigner caching instantiation."""
    serializer = b2ss_cached(secret)
    serializer.loads(serializer.dumps(data))


def main() -> None:
    """Run comparisons."""
    secret = b'secret' * 3
    data = {
        'user': {
            'id': 1337,
            'name': 'HacKan CuBa',
            'username': 'hackan_cuba',
            'email': 'hackan@email.com',
        },
        '_meta': {
            'iss': 'blake2signer',
            'nbf': datetime.now(timezone.utc).isoformat(),
        },
    }
    data_b = json.dumps(data).encode()
    sentinel_payload = deepcopy(data)

    signers: dict[str, Benchmark] = {}
    serializers: dict[str, Benchmark] = {}

    print(
        TermFormat.CYAN,
        'Measuring impact of caching class instantiation, please wait (this will take a while)...',
        TermFormat.END,
        sep='',
    )
    print()

    print('Measure Blake2Signer without caching instantiation')
    signers['Blake2Signer w/o caching'] = benchmark(test_b2s_no_cache, secret, data_b)

    print('Measure Blake2Signer caching instantiation')
    signers['Blake2Signer w/ caching'] = benchmark(test_b2s_cache, secret, data_b)

    print('Measure Blake2SerializerSigner without caching instantiation')
    serializers['Blake2SerializerSigner w/o caching'] = benchmark(test_b2ss_no_cache, secret, data)
    assert data == sentinel_payload

    print('Measure Blake2SerializerSigner caching instantiation')
    serializers['Blake2SerializerSigner w/ caching'] = benchmark(test_b2ss_cache, secret, data)
    assert data == sentinel_payload

    print_table(
        title='Signer',
        timings=signers,
        baseline='Blake2Signer w/o caching',
        columns_sizes=COLUMNS,
    )

    print_table(
        title='Serializer',
        timings=serializers,
        baseline='Blake2SerializerSigner w/o caching',
        columns_sizes=COLUMNS,
    )


if __name__ == '__main__':
    main()

Output

Measuring impact of caching class instantiation, please wait (this will take a while)...

Measure Blake2Signer without caching instantiation
21.6 us ± 377 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
Measure Blake2Signer caching instantiation
9.17 us ± 95.9 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
Measure Blake2SerializerSigner without caching instantiation
149 us ± 795 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
Measure Blake2SerializerSigner caching instantiation
115 us ± 597 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)

Signer                                   | Best Abs Time | Measure | Comparison
---------------------------------------- | ------------- | ------- | -----------------------------
Blake2Signer w/o caching                 |       21.3 us |    ⚠    | baseline 
Blake2Signer w/ caching                  |       9.09 us |    √    | 2.34x (faster than baseline)


Serializer                               | Best Abs Time | Measure | Comparison
---------------------------------------- | ------------- | ------- | -----------------------------
Blake2SerializerSigner w/o caching       |        147 us |    √    | baseline 
Blake2SerializerSigner w/ caching        |        114 us |    √    | 1.29x (faster than baseline)

Note

The standard deviation reported for each measurement should be around two orders of magnitude lower than the mean for the results to be reliable. As a simple reference, if the mean is in ms, then the std dev should be in us. The script marks a measurement with ⚠ when its best time is not more than 60 times its standard deviation.
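
To make the rule concrete, the following sketch recomputes the check for the four measurements printed above. The figures are copied from that output, and the threshold of 60 is the one used by is_timing_ok() in the script; everything else is plain arithmetic.

```python
"""Worked check of the reliability rule, using the figures printed above."""

# (name, best time in seconds, standard deviation in seconds)
measurements = [
    ('Blake2Signer w/o caching', 21.3e-6, 377e-9),
    ('Blake2Signer w/ caching', 9.09e-6, 95.9e-9),
    ('Blake2SerializerSigner w/o caching', 147e-6, 795e-9),
    ('Blake2SerializerSigner w/ caching', 114e-6, 597e-9),
]

THRESHOLD = 60  # Same threshold as is_timing_ok() in the script

flags = {}
for name, best, stdev in measurements:
    ratio = best / stdev
    flags[name] = ratio > THRESHOLD
    print(f'{name:<36} ratio={ratio:6.1f} {"ok" if flags[name] else "noisy"}')
```

Only the first measurement falls below the threshold (its ratio is about 56), which is why it is the only row marked with ⚠ in the table.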

Info

Blake2TimestampSigner is equivalent in its instantiation to Blake2Signer, so it is not benchmarked here.

Preferring bytes over strings

Internally, all signers need to work with bytes because the hashers require it. For convenience, both bytes and strings are accepted as input, but a conversion happens behind the scenes. This conversion has an impact on performance that can be somewhat significant in the long run: when profiling a sign or unsign cycle, one can see that most of the time is spent calculating the hash (this is unavoidable), but a good portion of the remaining time is spent encoding strings!

Profiling the signer

Source

"""Profiling the signer."""

from blake2signer import Blake2Signer

secret = b'secret' * 3
data = b'data' * 10_000_000  # Has to be very large to easily see the numbers
data_s = data.decode()

signer = Blake2Signer(secret)

# Note that the timing values per se are not important, but their order is.

print('Profiling signing with data as bytes')
# Using ipython:
%prun -l 4 signer.sign(data)
# 32 function calls in 0.101 seconds
#
# Ordered by: internal time
# List reduced from 24 to 4 due to restriction <4>
#
# ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#     1    0.071    0.071    0.071    0.071 blakehashers.py:195(digest)
#     1    0.028    0.028    0.028    0.028 bases.py:396(_compose)
#     1    0.002    0.002    0.101    0.101 <string>:1(<module>)
#     1    0.000    0.000    0.101    0.101 {built-in method builtins.exec}

print('\nProfiling signing with data as string')
%prun -l 4 signer.sign(data_s)
# 34 function calls in 0.132 seconds
#
# Ordered by: internal time
# List reduced from 25 to 4 due to restriction <4>
#
# ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#     1    0.071    0.071    0.071    0.071 blakehashers.py:195(digest)
#     1    0.029    0.029    0.029    0.029 {method 'encode' of 'str' objects}     <<<<!!!
#     1    0.027    0.027    0.027    0.027 bases.py:396(_compose)
#     1    0.004    0.004    0.132    0.132 <string>:1(<module>)

Output

Profiling signing with data as bytes
         32 function calls in 0.101 seconds

   Ordered by: internal time
   List reduced from 24 to 4 due to restriction <4>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.071    0.071    0.071    0.071 blakehashers.py:195(digest)
        1    0.028    0.028    0.028    0.028 bases.py:396(_compose)
        1    0.002    0.002    0.101    0.101 <string>:1(<module>)
        1    0.000    0.000    0.101    0.101 {built-in method builtins.exec}

Profiling signing with data as string
         34 function calls in 0.132 seconds

   Ordered by: internal time
   List reduced from 25 to 4 due to restriction <4>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.071    0.071    0.071    0.071 blakehashers.py:195(digest)
        1    0.029    0.029    0.029    0.029 {method 'encode' of 'str' objects}
        1    0.027    0.027    0.027    0.027 bases.py:396(_compose)
        1    0.004    0.004    0.132    0.132 <string>:1(<module>)

Therefore, you should prefer bytes over strings. However, if you can't avoid strings, that's fine: don't lose your mind over it! The benefit is modest even for very large payloads (in the profile above, encoding the 40 MB string takes about 0.03 s out of 0.13 s) and almost negligible for small ones. The point is that, in the long run, bytes should be preferred whenever you can use them.
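
If you want a quick feel for the encoding overhead on your own machine without IPython, something like the following sketch can help. It is an approximation under a stated assumption: it hashes with the standard library's hashlib.blake2b directly instead of going through this lib's signers, so it only isolates the cost of str.encode relative to hashing.

```python
"""Sketch: the extra work done when a str payload must be encoded first."""
import hashlib
import timeit

payload_bytes = b'data' * 250_000  # About 1 MB
payload_str = payload_bytes.decode()


def hash_bytes() -> bytes:
    """Hash a payload that is already bytes."""
    return hashlib.blake2b(payload_bytes).digest()


def hash_str() -> bytes:
    """Hash a str payload: it has to be encoded to bytes first."""
    return hashlib.blake2b(payload_str.encode()).digest()


same_digest = hash_bytes() == hash_str()
time_bytes = min(timeit.repeat(hash_bytes, number=20, repeat=3))
time_str = min(timeit.repeat(hash_str, number=20, repeat=3))

print('Same digest:', same_digest)
print(f'bytes: {time_bytes:.4f} s, str: {time_str:.4f} s (20 loops, best of 3)')
```

Both paths produce the same digest; the only difference is the time spent encoding, which varies with the payload size and the machine.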

The same goes for files!

When using file-related methods, like Blake2SerializerSigner's load and dump, this consideration is also pertinent.
For both, open the file in binary mode rather than in text mode. This prevents a str to bytes conversion when loading, and a bytes to str conversion when dumping.
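
The difference comes down to the type that the file object hands over, as this small standard-library sketch shows. It does not call load or dump; the file content is a placeholder and not a real signature.

```python
"""Sketch: what the file mode changes about the data handed to a signer."""
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'signed.txt'
    path.write_bytes(b'payload.signature')  # Placeholder content, not a real signature

    with path.open('rb') as binary_file:
        from_binary = binary_file.read()  # bytes: usable as-is by a hasher

    with path.open('r', encoding='utf-8') as text_file:
        from_text = text_file.read()  # str: must be encoded again before hashing

print(type(from_binary).__name__, type(from_text).__name__)
```

In text mode, the file object decodes the content into a str, which then has to be encoded back into bytes before it can be hashed; binary mode skips both steps.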

Choosing the right signer

This lib offers three signers, and one of them is also a serializer, meaning it can serialize Python objects before signing them. You should be aware that this has a huge impact on performance, because serializing objects can be expensive.

To serialize or not to serialize

Source

"""To serialize or not to serialize: that is the question."""

from blake2signer import Blake2Signer, Blake2SerializerSigner

secret = 's' * 16
data = 'data' * 20

signer = Blake2Signer(secret)
serializer_signer = Blake2SerializerSigner(secret)

print('Timing signer...')
%timeit -r 10 signer.unsign(signer.sign(data))
print('Timing serializer signer...')
%timeit -r 10 serializer_signer.loads(serializer_signer.dumps(data, compress=False))

Output

Timing signer...
10.6 µs ± 171 ns per loop (mean ± std. dev. of 10 runs, 100,000 loops each)
Timing serializer signer...
28.9 µs ± 2.07 µs per loop (mean ± std. dev. of 10 runs, 10,000 loops each)

Note

The standard deviation reported for each measurement should be around two orders of magnitude lower than the mean for the results to be reliable. As a simple reference, if the mean is in ms, then the std dev should be in us.

In the example above, serializing the simple string costs almost three times as much as not doing it, which is pretty significant. However, if you don't know beforehand the kind of objects you will be signing, then the serializer signer is the safe bet.

Compressing has its perks

The serializer signer class can compress the payload to make it smaller and more manageable, but this implies a big performance hit: compressing and decompressing have a cost. The class is somewhat smart: if the payload didn't compress well enough, it is left as it is, so no additional time is wasted during decompression for no gain. However, it needs to try to compress it first, so some time may be wasted. For incompressible data, disabling compression saves a large share of the time: in the benchmark below, which uses NullSerializer to isolate the cost of compression, the round trip without compression is about 5.9x faster than with it.

Given this, it can be beneficial to know beforehand whether it will be worth compressing the payload or not: you can control this using the parameters compress, compression_level and compression_ratio with Blake2SerializerSigner. Check the examples for more information.

Generally, regular data with human-readable text is highly compressible, which is why this feature is enabled by default, but your mileage may vary.
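
The "try to compress, then keep the result only if it was worth it" behaviour described above can be sketched roughly as follows. `maybe_compress` and its `min_saving_percent` parameter are hypothetical names for this example, and the threshold semantics are an assumption: the lib's internal check may differ in its details.

```python
"""Sketch: compress, then keep the result only if it is small enough."""
import zlib
from secrets import token_bytes


def maybe_compress(data: bytes, *, min_saving_percent: float = 5.0) -> tuple[bytes, bool]:
    """Return (payload, is_compressed), keeping `data` if compression gains too little.

    Hypothetical helper: the threshold semantics are an assumption of this
    sketch, not necessarily what the lib does internally.
    """
    compressed = zlib.compress(data)  # This cost is paid even if the result is discarded
    limit = len(data) * (1 - min_saving_percent / 100)
    if len(compressed) < limit:
        return compressed, True

    return data, False


_, text_compressed = maybe_compress(b'human-readable text ' * 50)
_, random_compressed = maybe_compress(token_bytes(1024))
print('Text compressed:', text_compressed)
print('Random data compressed:', random_compressed)
```

The repetitive text is kept compressed, whereas the random bytes are returned untouched after the compression attempt has already been paid for. That wasted attempt is the time that disabling compression saves.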

The cost of compression

Source

"""The cost of compression.

This script helps compare the cost of compressing the payload.

Requirements:
    python3 -m pip install \
        'blake2signer'

Usage:
    python3 cost_of_compression.py
    NUMBER=10 REPEAT=2 pypy3 cost_of_compression.py
"""

import os
import statistics
import timeit
import typing as t
from collections.abc import Callable
from enum import Enum
from functools import partial
from math import isclose
from secrets import token_bytes

from blake2signer import Blake2SerializerSigner
from blake2signer.serializers import NullSerializer

NUMBER: t.Optional[int] = int(os.getenv('NUMBER', '0')) if os.getenv('NUMBER') else None
"""Set this value if you prefer to have a fixed number of rounds for the benchmark."""

REPEAT: int = int(os.getenv('REPEAT', '10'))
"""How many times each cycle of iterations is repeated."""

COLUMNS = (40, 13, 7, 29)
"""Sizes of the table columns."""


class Benchmark(t.TypedDict):
    """Benchmark result."""
    all_runs: list[float]
    best: float
    average: float
    stdev: float
    loops: int
    repeat: int


class TermFormat(str, Enum):
    """Terminal formatting."""

    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

    def __str__(self) -> str:
        """String representation."""
        return self.value


P = t.ParamSpec('P')


def benchmark(
    func: Callable[P, t.Any],
    *args: P.args,
    **kwargs: P.kwargs,
) -> Benchmark:
    """Run a benchmark on the given function, similar to IPython's %timeit."""
    wrapped = partial(func, *args, **kwargs)
    timer = timeit.Timer(wrapped)

    repeat = REPEAT
    if NUMBER is None:
        number, _ = timer.autorange()
        number *= 2
    else:
        number = NUMBER

    results = timer.repeat(repeat=repeat, number=number)
    per_loop = [timing / number for timing in results]
    best = min(per_loop)
    avg = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0

    print(
        format_time(avg),
        '±',
        format_time(st_dev),
        'per loop (mean ± std. dev. of',
        repeat,
        'runs',
        number,
        'loops each)',
    )

    return Benchmark(
        all_runs=results,
        best=best,
        average=avg,
        stdev=st_dev,
        loops=number,
        repeat=repeat,
    )


def format_time(
    dt: float,
    *,
    unit: t.Optional[str] = None,
    precision: int = 3,
) -> str:
    """Format time (copied from timeit lib)."""
    units = {  # This map needs to be sorted from larger to smaller
        's': 1.0,
        'ms': 1e-3,
        'us': 1e-6,
        'ns': 1e-9,
    }

    if unit:
        scale = units[unit]
    else:
        for unit, scale in units.items():  # noqa: B007 # pylint: disable=R1704
            if dt >= scale:
                break
        else:
            unit = 'ns'
            scale = units[unit]

    return f'{dt / scale:.{precision}g} {unit}'


def print_row(
    *,
    name: str,
    value: float,
    ok: bool,
    baseline: float,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print a comparison table row."""
    x_times = baseline / value
    if isclose(value, baseline):
        detail = ''
    elif isclose(x_times, 1.0, rel_tol=0.1):
        detail = '(close to baseline)'
    elif x_times < 1:
        detail = '(slower than baseline)'
    else:
        detail = '(faster than baseline)'

    print(
        name.ljust(columns_sizes[0]),
        '|',
        format_time(value).rjust(columns_sizes[1]),
        '|',
        ('√' if ok else '⚠').center(columns_sizes[2]),
        '|',
        'baseline' if isclose(value, baseline) else f'{x_times:4.2f}x'.rjust(4),
        detail,
    )


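def is_timing_ok(timing: Benchmark, /) -> bool:
    """Return True if the timing measurement appears to be correct."""
    if timing['stdev'] == 0:
        # A single run (REPEAT=1) gives no spread to judge: avoid dividing by zero
        return False

    return (timing['best'] / timing['stdev']) > 60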


def print_table(
    *,
    title: str,
    timings: dict[str, Benchmark],
    baseline: str,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print comparison table."""
    print()
    print(title.ljust(columns_sizes[0]), '| Best Abs Time | Measure | Comparison')
    print(
        '-' * columns_sizes[0],
        '|',
        '-' * columns_sizes[1],
        '|',
        '-' * columns_sizes[2],
        '|',
        '-' * columns_sizes[3],
    )
    baseline_value = timings[baseline]['best']
    for name, timing in timings.items():
        ok = is_timing_ok(timing)
        print_row(
            name=name,
            value=timing['best'],
            ok=ok,
            baseline=baseline_value,
            columns_sizes=columns_sizes,
        )
    print()


def main() -> None:
    """Run comparisons."""
    print(
        TermFormat.CYAN,
        'Measuring the cost of compression, please wait (this will take a while)...',
        TermFormat.END,
        sep='',
    )
    print()

    secret = b'secret' * 3
    incompressible_data = token_bytes()
    signer = Blake2SerializerSigner(secret, serializer=NullSerializer)

    timings: dict[str, Benchmark] = {}

    print('With full compression')
    timings['With full compression'] = benchmark(
        lambda s, d: s.loads(s.dumps(d, force_compression=True)),
        signer,
        incompressible_data,
    )

    print('With smart compression')
    timings['With smart compression'] = benchmark(
        lambda s, d: s.loads(s.dumps(d, compress=True)),
        signer,
        incompressible_data,
    )

    print('Without compression')
    timings['Without compression'] = benchmark(
        lambda s, d: s.loads(s.dumps(d, compress=False)),
        signer,
        incompressible_data,
    )

    print_table(
        title='Timing',
        timings=timings,
        baseline='With full compression',
        columns_sizes=COLUMNS,
    )


if __name__ == '__main__':
    main()

Output

Measuring the cost of compression, please wait (this will take a while)...

With full compression
82.4 us ± 576 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
With smart compression
80.6 us ± 302 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
Without compression
14.3 us ± 981 ns per loop (mean ± std. dev. of 10 runs 40000 loops each)

Timing                                   | Best Abs Time | Measure | Comparison
---------------------------------------- | ------------- | ------- | -----------------------------
With full compression                    |       81.6 us |    √    | baseline
With smart compression                   |       80.2 us |    √    | 1.02x (close to baseline)
Without compression                      |       13.8 us |    ⚠    | 5.91x (faster than baseline)

Note

The standard deviation reported for each measurement should be around two orders of magnitude lower than the mean for the results to be reliable. As a simple reference, if the mean is in ms, then the std dev should be in us. The script marks a measurement with ⚠ when its best time is not more than 60 times its standard deviation.

Randomness is expensive

Unfortunately, obtaining cryptographically secure pseudorandom data in Python is somewhat expensive, so generating a salt can take its toll. You can control whether a salt is used with the deterministic parameter at class instantiation. However, this performance impact may be negligible for your implementation, and having a salt can be a positive trait. In the benchmark below, a deterministic signature is about 1.4x faster for a 210-byte payload, but only about 1.04x faster for a 21,000-byte one.
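
To show what the salt changes, here is a minimal sketch built on the standard library's hashlib.blake2b and its salt parameter. The `mac` helper is hypothetical, and the way this lib derives and stores its salt may well differ; the sketch only illustrates the trade-off between a random salt and a deterministic signature.

```python
"""Sketch: a random salt makes each signature different, at the cost of a CSPRNG call."""
import hashlib
from secrets import token_bytes

KEY = b'secret' * 3


def mac(data: bytes, *, deterministic: bool) -> tuple[bytes, bytes]:
    """Return (salt, tag); the salt is empty when `deterministic` is true."""
    # The call to token_bytes is the expensive part that a deterministic signature skips
    salt = b'' if deterministic else token_bytes(hashlib.blake2b.SALT_SIZE)
    tag = hashlib.blake2b(data, key=KEY, salt=salt, digest_size=16).digest()
    return salt, tag


data = b'payload'
deterministic_equal = mac(data, deterministic=True) == mac(data, deterministic=True)
salted_equal = mac(data, deterministic=False) == mac(data, deterministic=False)
print('Deterministic signatures are equal:', deterministic_equal)
print('Salted signatures are equal:', salted_equal)
```

Signing the same data twice gives the same result in deterministic mode and different results with a salt, which is the trait you give up in exchange for the speed gain.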

The cost of randomness

Source

"""The cost of randomness.

This script helps compare the cost of randomness.

Requirements:
    python3 -m pip install \
        'blake2signer'

Usage:
    python3 cost_of_randomness.py
    NUMBER=10 REPEAT=2 pypy3 cost_of_randomness.py
"""

import os
import statistics
import timeit
import typing as t
from collections.abc import Callable
from enum import Enum
from functools import partial
from math import isclose

from blake2signer import Blake2Signer

NUMBER: t.Optional[int] = int(os.getenv('NUMBER', '0')) if os.getenv('NUMBER') else None
"""Set this value if you prefer to have a fixed number of rounds for the benchmark."""

REPEAT: int = int(os.getenv('REPEAT', '10'))
"""How many times each cycle of iterations is repeated."""

COLUMNS = (40, 13, 7, 29)
"""Sizes of the table columns."""


class Benchmark(t.TypedDict):
    """Benchmark result."""
    all_runs: list[float]
    best: float
    average: float
    stdev: float
    loops: int
    repeat: int


class TermFormat(str, Enum):
    """Terminal formatting."""

    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

    def __str__(self) -> str:
        """String representation."""
        return self.value


P = t.ParamSpec('P')


def benchmark(
    func: Callable[P, t.Any],
    *args: P.args,
    **kwargs: P.kwargs,
) -> Benchmark:
    """Run a benchmark on the given function, similar to IPython's %timeit."""
    wrapped = partial(func, *args, **kwargs)
    timer = timeit.Timer(wrapped)

    repeat = REPEAT
    if NUMBER is None:
        number, _ = timer.autorange()
        number *= 2
    else:
        number = NUMBER

    results = timer.repeat(repeat=repeat, number=number)
    per_loop = [timing / number for timing in results]
    best = min(per_loop)
    avg = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0

    print(
        format_time(avg),
        '±',
        format_time(st_dev),
        'per loop (mean ± std. dev. of',
        repeat,
        'runs',
        number,
        'loops each)',
    )

    return Benchmark(
        all_runs=results,
        best=best,
        average=avg,
        stdev=st_dev,
        loops=number,
        repeat=repeat,
    )


def format_time(
    dt: float,
    *,
    unit: t.Optional[str] = None,
    precision: int = 3,
) -> str:
    """Format time (copied from timeit lib)."""
    units = {  # This map needs to be sorted from larger to smaller
        's': 1.0,
        'ms': 1e-3,
        'us': 1e-6,
        'ns': 1e-9,
    }

    if unit:
        scale = units[unit]
    else:
        for unit, scale in units.items():  # noqa: B007 # pylint: disable=R1704
            if dt >= scale:
                break
        else:
            unit = 'ns'
            scale = units[unit]

    return f'{dt / scale:.{precision}g} {unit}'


def print_row(
    *,
    name: str,
    value: float,
    ok: bool,
    baseline: float,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print a comparison table row."""
    x_times = baseline / value
    if isclose(value, baseline):
        detail = ''
    elif isclose(x_times, 1.0, rel_tol=0.1):
        detail = '(close to baseline)'
    elif x_times < 1:
        detail = '(slower than baseline)'
    else:
        detail = '(faster than baseline)'

    print(
        name.ljust(columns_sizes[0]),
        '|',
        format_time(value).rjust(columns_sizes[1]),
        '|',
        ('√' if ok else '⚠').center(columns_sizes[2]),
        '|',
        'baseline' if isclose(value, baseline) else f'{x_times:4.2f}x'.rjust(4),
        detail,
    )


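def is_timing_ok(timing: Benchmark, /) -> bool:
    """Return True if the timing measurement appears to be correct."""
    if timing['stdev'] == 0:
        # A single run (REPEAT=1) gives no spread to judge: avoid dividing by zero
        return False

    return (timing['best'] / timing['stdev']) > 60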


def print_table(
    *,
    title: str,
    timings: dict[str, Benchmark],
    baseline: str,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print comparison table."""
    print()
    print(title.ljust(columns_sizes[0]), '| Best Abs Time | Measure | Comparison')
    print(
        '-' * columns_sizes[0],
        '|',
        '-' * columns_sizes[1],
        '|',
        '-' * columns_sizes[2],
        '|',
        '-' * columns_sizes[3],
    )
    baseline_value = timings[baseline]['best']
    for name, timing in timings.items():
        ok = is_timing_ok(timing)
        print_row(
            name=name,
            value=timing['best'],
            ok=ok,
            baseline=baseline_value,
            columns_sizes=columns_sizes,
        )
    print()


def main() -> None:
    """Run comparisons."""
    print(
        TermFormat.CYAN,
        'Measuring the cost of randomness, please wait (this will take a while)...',
        TermFormat.END,
        sep='',
    )
    print()

    secret = b'Protect whistleblowers!'
    regular_data = b'Free Chelsea Manning!' * 10
    large_data = regular_data * 100

    signer = Blake2Signer(secret, deterministic=False)
    deterministic_signer = Blake2Signer(secret, deterministic=True)

    for payload, data in (('regular', regular_data), ('large', large_data)):
        print()
        print(
            TermFormat.GREEN,
            'Payload size ~: ',
            len(data),
            ' bytes (',
            payload,
            ')',
            TermFormat.END,
            sep='',
        )

        timings: dict[str, Benchmark] = {}

        print('Non-deterministic signature')
        timings['Non-deterministic signature'] = benchmark(
            lambda d: signer.unsign(signer.sign(d)),
            data,
        )

        print('Deterministic signature')
        timings['Deterministic signature'] = benchmark(
            lambda d: deterministic_signer.unsign(deterministic_signer.sign(d)),
            data,
        )

        print_table(
            title='Signer',
            timings=timings,
            baseline='Non-deterministic signature',
            columns_sizes=COLUMNS,
        )


if __name__ == '__main__':
    main()

Output

Measuring the cost of randomness, please wait (this will take a while)...


Payload size ~: 210 bytes (regular)
Non-deterministic signature
8.87 us ± 373 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
Deterministic signature
6.31 us ± 54 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)

Signer                                   | Best Abs Time | Measure | Comparison
---------------------------------------- | ------------- | ------- | -----------------------------
Non-deterministic signature              |       8.63 us |    ⚠    | baseline
Deterministic signature                  |       6.21 us |    √    | 1.39x (faster than baseline)


Payload size ~: 21000 bytes (large)
Non-deterministic signature
64.9 us ± 721 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
Deterministic signature
62.4 us ± 507 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)

Signer                                   | Best Abs Time | Measure | Comparison
---------------------------------------- | ------------- | ------- | -----------------------------
Non-deterministic signature              |         64 us |    √    | baseline
Deterministic signature                  |       61.7 us |    √    | 1.04x (close to baseline)

Note

For the results to be meaningful, the standard deviation reported for each measurement should be around two orders of magnitude lower than the mean. As a simple reference, if the mean is in ms, then the std dev should be in us. In the tables, the Measure column flags with ⚠ the measurements whose deviation is too high.
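
The rule of thumb above can also be checked programmatically. What follows is a minimal sketch and not part of this lib: the helper name `is_measurement_reliable` is hypothetical, the default `min_ratio` of 60 mirrors the threshold the scripts on this page use, and the timings are made-up values.

```python
import statistics


def is_measurement_reliable(per_loop: list[float], *, min_ratio: float = 60.0) -> bool:
    """Return True if the std dev is small enough compared to the mean.

    `min_ratio` is an illustrative threshold: 100 would be exactly two orders of magnitude.
    """
    if len(per_loop) < 2:
        return False  # A single run gives no deviation to judge by

    mean = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop)
    if st_dev == 0:
        return True  # All runs took exactly the same time

    return (mean / st_dev) > min_ratio


# Made-up timings, in seconds per loop
steady = [8.61e-6, 8.60e-6, 8.62e-6, 8.61e-6]
noisy = [8.6e-6, 9.4e-6, 8.1e-6, 9.0e-6]

print('steady:', is_measurement_reliable(steady))
print('noisy:', is_measurement_reliable(noisy))
```

A measurement that fails this check is usually worth repeating on a quieter machine before drawing conclusions from it.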

BLAKE versions

Different BLAKE versions and modes can perform better or worse depending on the hardware they're running on. For example, BLAKE2b is optimized for 64-bit platforms, whereas BLAKE2s is optimized for 8- to 32-bit platforms (read more about them on their official site). On the other hand, BLAKE3 is general purpose and designed to be as fast as possible, and it certainly succeeds at being several times faster than BLAKE2 (read more on its official site).
You should benchmark your implementation to see which hasher performs better for your use case.
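
For a quick feel of how the BLAKE2 flavours behave on a given machine, the raw hash functions can be timed directly. This is a rough sketch that uses only the standard library's `hashlib` (which provides BLAKE2b and BLAKE2s, but not BLAKE3); it bypasses this lib entirely, so it ignores any signing overhead. The key, the payload sizes, and the helper name `time_hasher` are illustrative assumptions.

```python
import hashlib
import timeit

KEY = b'illustrative key'  # Hypothetical key, short enough for both hashers


def time_hasher(name: str, data: bytes, *, number: int = 1_000) -> float:
    """Return the best time per call, in seconds, of a keyed BLAKE2 hash."""
    hasher = getattr(hashlib, name)
    timer = timeit.Timer(lambda: hasher(data, key=KEY, digest_size=16).digest())

    return min(timer.repeat(repeat=3, number=number)) / number


payloads = {'small': b'x' * 21, 'large': b'x' * 21_000}
results = {
    (name, size): time_hasher(name, data)
    for name in ('blake2b', 'blake2s')
    for size, data in payloads.items()
}

for (name, size), seconds in results.items():
    print(f'{name} ({size}): {seconds * 1e6:.2f} us')
```

Numbers from a sketch like this only hint at a trend; measuring the signer itself is what tells which hasher suits a real use case.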

BLAKE3

In my trials, BLAKE3 turned out to be slower than BLAKE2 for small payloads. This could be due to the particular implementation, or it could be by design. I will update this information if it changes in the future (it is still very new).

Comparing BLAKE versions

Source

"""Comparing BLAKE versions.

This script helps compare the performance of BLAKE2 vs BLAKE3.

Requirements:
    python3 -m pip install \
        'blake2signer[blake3]'

Usage:
    python3 comparison.py
    NUMBER=10 REPEAT=2 pypy3 comparison.py
"""

import os
import statistics
import timeit
import typing as t
from collections.abc import Callable
from enum import Enum
from functools import partial
from math import isclose

from blake2signer import Blake2Signer
from blake2signer.hashers import HasherChoice

NUMBER: t.Optional[int] = int(os.getenv('NUMBER', '0')) if os.getenv('NUMBER') else None
"""Set this value if you prefer to have a fixed number of rounds for the benchmark."""

REPEAT: int = int(os.getenv('REPEAT', '10'))
"""How many times each cycle of iterations should repeat."""

COLUMNS = (20, 13, 7, 29)
"""Sizes of each table column."""


class Benchmark(t.TypedDict):
    """Benchmark result."""
    all_runs: list[float]
    best: float
    average: float
    stdev: float
    loops: int
    repeat: int


class TermFormat(str, Enum):
    """Terminal formatting."""

    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

    def __str__(self) -> str:
        """String representation."""
        return self.value


P = t.ParamSpec('P')


def benchmark(
    func: Callable[P, t.Any],
    *args: P.args,
    **kwargs: P.kwargs,
) -> Benchmark:
    """Run a benchmark on the given function, similar to IPython's %timeit."""
    wrapped = partial(func, *args, **kwargs)
    timer = timeit.Timer(wrapped)

    repeat = REPEAT
    if NUMBER is None:
        number, _ = timer.autorange()
        number *= 2
    else:
        number = NUMBER

    results = timer.repeat(repeat=repeat, number=number)
    per_loop = [timing / number for timing in results]
    best = min(per_loop)
    avg = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0

    print(
        format_time(avg),
        '±',
        format_time(st_dev),
        'per loop (mean ± std. dev. of',
        repeat,
        'runs',
        number,
        'loops each)',
    )

    return Benchmark(
        all_runs=results,
        best=best,
        average=avg,
        stdev=st_dev,
        loops=number,
        repeat=repeat,
    )


def format_time(
    dt: float,
    *,
    unit: t.Optional[str] = None,
    precision: int = 3,
) -> str:
    """Format time (copied from timeit lib)."""
    units = {  # This map needs to be sorted from larger to smaller
        's': 1.0,
        'ms': 1e-3,
        'us': 1e-6,
        'ns': 1e-9,
    }

    if unit:
        scale = units[unit]
    else:
        for unit, scale in units.items():  # noqa: B007 # pylint: disable=R1704
            if dt >= scale:
                break
        else:
            unit = 'ns'
            scale = units[unit]

    return f'{dt / scale:.{precision}g} {unit}'


def print_row(
    *,
    name: str,
    value: float,
    ok: bool,
    baseline: float,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print a comparison table row."""
    x_times = baseline / value
    if isclose(value, baseline):
        detail = ''
    elif isclose(x_times, 1.0, rel_tol=0.1):
        detail = '(close to baseline)'
    elif x_times < 1:
        detail = '(slower than baseline)'
    else:
        detail = '(faster than baseline)'

    print(
        name.ljust(columns_sizes[0]),
        '|',
        format_time(value).rjust(columns_sizes[1]),
        '|',
        ('√' if ok else '⚠').center(columns_sizes[2]),
        '|',
        'baseline' if isclose(value, baseline) else f'{x_times:4.2f}x'.rjust(4),
        detail,
    )


def is_timing_ok(timing: Benchmark, /) -> bool:
    """Return True if the timing measurement appears to be correct."""
    if timing['stdev'] == 0:  # E.g. REPEAT=1: there is no deviation to compare against
        return False

    return (timing['best'] / timing['stdev']) > 60


def print_table(
    *,
    title: str,
    timings: dict[str, Benchmark],
    baseline: str,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print comparison table."""
    print()
    print(title.ljust(columns_sizes[0]), '| Best Abs Time | Measure | Comparison')
    print(
        '-' * columns_sizes[0],
        '|',
        '-' * columns_sizes[1],
        '|',
        '-' * columns_sizes[2],
        '|',
        '-' * columns_sizes[3],
    )
    baseline_value = timings[baseline]['best']
    for name, timing in timings.items():
        ok = is_timing_ok(timing)
        print_row(
            name=name,
            value=timing['best'],
            ok=ok,
            baseline=baseline_value,
            columns_sizes=columns_sizes,
        )
    print()


def main() -> None:
    """Run comparisons."""
    secret = b'civil disobedience is necessary'
    small_data = b'remember Aaron Swartz'
    regular_data = small_data * 10
    large_data = small_data * 1_000

    print(
        TermFormat.CYAN,
        'Comparing BLAKE versions, please wait (this will take a while)...',
        TermFormat.END,
        sep='',
    )
    print()

    for payload, data in (('small', small_data), ('regular', regular_data), ('large', large_data)):
        timings: dict[str, Benchmark] = {}
        print()
        print(
            TermFormat.GREEN,
            'Payload size ~: ',
            len(data),
            ' bytes (',
            payload,
            ')',
            TermFormat.END,
            sep='',
        )

        for hasher in (HasherChoice.blake2b, HasherChoice.blake2s, HasherChoice.blake3):
            signer = Blake2Signer(secret, hasher=hasher)

            print(hasher)
            timings[hasher] = benchmark(lambda s, d: s.unsign(s.sign(d)), signer, data)

        print_table(
            title='Hasher',
            timings=timings,
            baseline=HasherChoice.blake2b,
            columns_sizes=COLUMNS,
        )


if __name__ == '__main__':
    main()

Output

Comparing BLAKE versions, please wait (this will take a while)...


Payload size ~: 21 bytes (small)
HasherChoice.blake2b
8.3 us ± 124 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
HasherChoice.blake2s
8.38 us ± 412 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
HasherChoice.blake3
9.99 us ± 35 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)

Hasher               | Best Abs Time | Measure | Comparison
-------------------- | ------------- | ------- | -----------------------------
blake2b              |       8.17 us |    √    | baseline
blake2s              |       8.04 us |    ⚠    |  1.0x (close to baseline)
blake3               |       9.93 us |    √    | 0.82x (slower than baseline)


Payload size ~: 210 bytes (regular)
HasherChoice.blake2b
8.61 us ± 35.6 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
HasherChoice.blake2s
8.92 us ± 18.5 ns per loop (mean ± std. dev. of 10 runs 100000 loops each)
HasherChoice.blake3
10.2 us ± 139 ns per loop (mean ± std. dev. of 10 runs 40000 loops each)

Hasher               | Best Abs Time | Measure | Comparison
-------------------- | ------------- | ------- | -----------------------------
blake2b              |       8.57 us |    √    | baseline
blake2s              |       8.88 us |    √    | 0.96x (slower than baseline)
blake3               |       10.1 us |    √    | 0.85x (slower than baseline)


Payload size ~: 21000 bytes (large)
HasherChoice.blake2b
64.1 us ± 324 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
HasherChoice.blake2s
94.1 us ± 190 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
HasherChoice.blake3
28.4 us ± 124 ns per loop (mean ± std. dev. of 10 runs 20000 loops each)

Hasher               | Best Abs Time | Measure | Comparison
-------------------- | ------------- | ------- | -----------------------------
blake2b              |       63.8 us |    √    | baseline
blake2s              |       93.8 us |    √    | 0.68x (slower than baseline)
blake3               |       28.3 us |    √    |  2.3x (faster than baseline)

Note

For the results to be meaningful, the standard deviation reported for each measurement should be around two orders of magnitude lower than the mean. As a simple reference, if the mean is in ms, then the std dev should be in us. In the tables, the Measure column flags with ⚠ the measurements whose deviation is too high.

Encoders

Due to implementation details, some encoders are faster than others. In particular, the default encoder B64URLEncoder uses binascii under the hood, which is written in C. The same goes for HexEncoder. However, both B32Encoder and B58Encoder are pure Python, which makes them slower.

Using PyPy

If you are using PyPy, then pure Python encoders may not be slower, and perhaps could even be faster than C ones.

My benchmarks show some potential performance gain from using pybase64, in particular for larger payloads. You can implement it as a custom encoder with ease, as seen in the example below. Unfortunately, even though it may be faster on its own, it doesn't seem to make a significant difference in performance when used with a serializer signer.
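
The custom encoder in the example below strips the base64 padding when encoding and restores it when decoding. As a minimal, dependency-free sketch of that idea, here it is with the standard library's `base64` module standing in for pybase64. The function names are illustrative and not part of this lib.

```python
import base64


def encode_unpadded(data: bytes) -> bytes:
    """Encode data as URL-safe base64, dropping the trailing `=` padding."""
    return base64.urlsafe_b64encode(data).rstrip(b'=')


def decode_unpadded(data: bytes) -> bytes:
    """Decode URL-safe base64 data whose padding was dropped."""
    # Base64 works in groups of 4 characters: add what is missing to complete the last one
    padding = b'=' * (-len(data) % 4)

    return base64.urlsafe_b64decode(data + padding)


encoded = encode_unpadded(b'remember Aaron Swartz')
print(encoded)
print(decode_unpadded(encoded))
```

Dropping the padding keeps the signed output slightly shorter, at the cost of recomputing it on every decode.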

Comparing Encoders

Source

"""Comparing Encoders.

This script helps compare the performance of different encoders.

Requirements:
    python3 -m pip install \
        'blake2signer' \
        'pybase64~=1.4'

Usage:
    python3 comparing_encoders.py
    NUMBER=10 REPEAT=2 pypy3 comparing_encoders.py
"""

import json
import os
import statistics
import timeit
import typing as t
from collections.abc import Callable
from copy import deepcopy
from datetime import datetime
from datetime import timezone
from enum import Enum
from functools import partial
from math import isclose

from pybase64 import b64decode
from pybase64 import b64encode

from blake2signer import Blake2SerializerSigner
from blake2signer.encoders import B32Encoder
from blake2signer.encoders import B58Encoder
from blake2signer.encoders import B64URLEncoder
from blake2signer.encoders import HexEncoder
from blake2signer.interfaces import EncoderInterface

NUMBER: t.Optional[int] = int(os.getenv('NUMBER', '0')) if os.getenv('NUMBER') else None
"""Set this value if you prefer to have a fixed number of rounds for the benchmark."""

REPEAT: int = int(os.getenv('REPEAT', '10'))
"""How many times each cycle of iterations should repeat."""

COLUMNS = (25, 13, 7, 29)
"""Sizes of each table column."""


class Benchmark(t.TypedDict):
    """Benchmark result."""
    all_runs: list[float]
    best: float
    average: float
    stdev: float
    loops: int
    repeat: int


class TermFormat(str, Enum):
    """Terminal formatting."""

    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

    def __str__(self) -> str:
        """String representation."""
        return self.value


P = t.ParamSpec('P')


def benchmark(
    func: Callable[P, t.Any],
    *args: P.args,
    **kwargs: P.kwargs,
) -> Benchmark:
    """Run a benchmark on the given function, similar to IPython's %timeit."""
    wrapped = partial(func, *args, **kwargs)
    timer = timeit.Timer(wrapped)

    repeat = REPEAT
    if NUMBER is None:
        number, _ = timer.autorange()
        number *= 2
    else:
        number = NUMBER

    results = timer.repeat(repeat=repeat, number=number)
    per_loop = [timing / number for timing in results]
    best = min(per_loop)
    avg = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0

    print(
        format_time(avg),
        '±',
        format_time(st_dev),
        'per loop (mean ± std. dev. of',
        repeat,
        'runs',
        number,
        'loops each)',
    )

    return Benchmark(
        all_runs=results,
        best=best,
        average=avg,
        stdev=st_dev,
        loops=number,
        repeat=repeat,
    )


def format_time(
    dt: float,
    *,
    unit: t.Optional[str] = None,
    precision: int = 3,
) -> str:
    """Format time (copied from timeit lib)."""
    units = {  # This map needs to be sorted from larger to smaller
        's': 1.0,
        'ms': 1e-3,
        'us': 1e-6,
        'ns': 1e-9,
    }

    if unit:
        scale = units[unit]
    else:
        for unit, scale in units.items():  # noqa: B007 # pylint: disable=R1704
            if dt >= scale:
                break
        else:
            unit = 'ns'
            scale = units[unit]

    return f'{dt / scale:.{precision}g} {unit}'


def print_row(
    *,
    name: str,
    value: float,
    ok: bool,
    baseline: float,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print a comparison table row."""
    x_times = baseline / value
    if isclose(value, baseline):
        detail = ''
    elif isclose(x_times, 1.0, rel_tol=0.1):
        detail = '(close to baseline)'
    elif x_times < 1:
        detail = '(slower than baseline)'
    else:
        detail = '(faster than baseline)'

    print(
        name.ljust(columns_sizes[0]),
        '|',
        format_time(value).rjust(columns_sizes[1]),
        '|',
        ('√' if ok else '⚠').center(columns_sizes[2]),
        '|',
        'baseline' if isclose(value, baseline) else f'{x_times:4.2f}x'.rjust(4),
        detail,
    )


def is_timing_ok(timing: Benchmark, /) -> bool:
    """Return True if the timing measurement appears to be correct."""
    if timing['stdev'] == 0:  # E.g. REPEAT=1: there is no deviation to compare against
        return False

    return (timing['best'] / timing['stdev']) > 60


def print_table(
    *,
    title: str,
    timings: dict[str, Benchmark],
    baseline: str,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print comparison table."""
    print()
    print(title.ljust(columns_sizes[0]), '| Best Abs Time | Measure | Comparison')
    print(
        '-' * columns_sizes[0],
        '|',
        '-' * columns_sizes[1],
        '|',
        '-' * columns_sizes[2],
        '|',
        '-' * columns_sizes[3],
    )
    baseline_value = timings[baseline]['best']
    for name, timing in timings.items():
        ok = is_timing_ok(timing)
        print_row(
            name=name,
            value=timing['best'],
            ok=ok,
            baseline=baseline_value,
            columns_sizes=columns_sizes,
        )
    print()


class PyBase64URLSafeEncoder(EncoderInterface):
    """Base64 URL-safe encoder using PyBase64."""

    @property
    def alphabet(self) -> bytes:
        """Return the encoder alphabet characters."""
        return b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-'

    def encode(self, data: bytes) -> bytes:
        """Encode given data to base64 URL-safe without padding.

        Args:
            data: Data to encode.

        Returns:
            Encoded data.
        """
        return b64encode(data, b'-_').rstrip(b'=')

    def decode(self, data: bytes) -> bytes:
        """Decode given encoded data from base64 URL-safe without padding.

        Args:
            data: Data to decode.

        Returns:
            Original data.
        """
        # Restore the padding needed to complete the last group of 4 characters
        return b64decode(data + (b'=' * (-len(data) % 4)), b'-_')


def main() -> None:
    """Run comparisons."""
    secret = b'secret' * 3

    # Example payload
    small_payload: dict[str, t.Any] = {
        'user': {
            'id': 1337,
        },
    }
    regular_payload: dict[str, t.Any] = {
        'user': {
            'id': 1337,
            'name': 'HacKan CuBa',
            'username': 'hackan_cuba',
            'email': 'hackan@email.com',
        },
        '_meta': {
            'iss': 'blake2signer',
            'nbf': datetime.now(timezone.utc).isoformat(),
        },
    }
    payloads: dict[str, dict[str, t.Any]] = {
        'small': small_payload,
        'regular': regular_payload,
        'large': {
            'payload': [regular_payload] * 100,  # This is just a representation
        },
    }
    results: dict[str, dict[str, Benchmark]] = {}

    print(
        TermFormat.CYAN,
        'Comparing encoders, please wait (this will take a while)...',
        TermFormat.END,
        sep='',
    )
    print()

    for payload, data in payloads.items():
        sentinel_payload = deepcopy(data)
        timings: dict[str, Benchmark] = {}
        print()
        print(
            TermFormat.GREEN,
            'Payload size ~: ',
            len(json.dumps(data).encode()),
            ' bytes (',
            payload,
            ')',
            TermFormat.END,
            sep='',
        )

        for encoder in (B64URLEncoder, HexEncoder, B32Encoder, B58Encoder, PyBase64URLSafeEncoder):
            serializer = Blake2SerializerSigner(secret, encoder=encoder)
            name = encoder.__name__

            print(name)
            timings[name] = benchmark(lambda s, d: s.loads(s.dumps(d)), serializer, data)
            assert data == sentinel_payload

        results[payload] = deepcopy(timings)

        print_table(
            title='Encoder',
            timings=timings,
            baseline=B64URLEncoder.__name__,
            columns_sizes=COLUMNS,
        )


if __name__ == '__main__':
    main()

Output

Comparing encoders, please wait (this will take a while)...


Payload size ~: 22 bytes (small)
B64URLEncoder
101 us ± 1.79 us per loop (mean ± std. dev. of 10 runs 4000 loops each)
HexEncoder
99.9 us ± 1.11 us per loop (mean ± std. dev. of 10 runs 10000 loops each)
B32Encoder
132 us ± 3.14 us per loop (mean ± std. dev. of 10 runs 4000 loops each)
B58Encoder
135 us ± 3.02 us per loop (mean ± std. dev. of 10 runs 4000 loops each)
PyBase64URLSafeEncoder
102 us ± 3.49 us per loop (mean ± std. dev. of 10 runs 4000 loops each)

Encoder                   | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
B64URLEncoder             |       99.9 us |    ⚠    | baseline
HexEncoder                |       98.1 us |    √    |  1.0x (close to baseline)
B32Encoder                |        127 us |    ⚠    | 0.79x (slower than baseline)
B58Encoder                |        131 us |    ⚠    | 0.77x (slower than baseline)
PyBase64URLSafeEncoder    |       98.8 us |    ⚠    |  1.0x (close to baseline)


Payload size ~: 178 bytes (regular)
B64URLEncoder
115 us ± 3.24 us per loop (mean ± std. dev. of 10 runs 4000 loops each)
HexEncoder
118 us ± 4.5 us per loop (mean ± std. dev. of 10 runs 4000 loops each)
B32Encoder
192 us ± 5.94 us per loop (mean ± std. dev. of 10 runs 2000 loops each)
B58Encoder
260 us ± 3.8 us per loop (mean ± std. dev. of 10 runs 2000 loops each)
PyBase64URLSafeEncoder
116 us ± 2.44 us per loop (mean ± std. dev. of 10 runs 4000 loops each)

Encoder                   | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
B64URLEncoder             |        113 us |    ⚠    | baseline
HexEncoder                |        111 us |    ⚠    |  1.0x (close to baseline)
B32Encoder                |        185 us |    ⚠    | 0.61x (slower than baseline)
B58Encoder                |        255 us |    √    | 0.44x (slower than baseline)
PyBase64URLSafeEncoder    |        114 us |    ⚠    |  1.0x (close to baseline)


Payload size ~: 18013 bytes (large)
B64URLEncoder
543 us ± 7.39 us per loop (mean ± std. dev. of 10 runs 1000 loops each)
HexEncoder
536 us ± 10.3 us per loop (mean ± std. dev. of 10 runs 1000 loops each)
B32Encoder
647 us ± 6.3 us per loop (mean ± std. dev. of 10 runs 1000 loops each)
B58Encoder
804 us ± 12.7 us per loop (mean ± std. dev. of 10 runs 1000 loops each)
PyBase64URLSafeEncoder
536 us ± 6.75 us per loop (mean ± std. dev. of 10 runs 1000 loops each)

Encoder                   | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
B64URLEncoder             |        532 us |    √    | baseline
HexEncoder                |        525 us |    ⚠    |  1.0x (close to baseline)
B32Encoder                |        640 us |    √    | 0.83x (slower than baseline)
B58Encoder                |        791 us |    √    | 0.67x (slower than baseline)
PyBase64URLSafeEncoder    |        530 us |    √    |  1.0x (close to baseline)

Note

For the results to be meaningful, the standard deviation reported for each measurement should be around two orders of magnitude lower than the mean. As a simple reference, if the mean is in ms, then the std dev should be in us. In the tables, the Measure column flags with ⚠ the measurements whose deviation is too high.

Serializers

Some serializers perform better than others. The standard library only offers json (which this lib uses by default through JSONSerializer), pickle, and marshal, but the last two are very dangerous ¹ ² ³ to use with this lib, as it could receive attacker-controlled data to serialize. And JSON is not known to be very fast per se.

However, by implementing a custom serializer you can use orjson, which is the fastest Python JSON library; msgpack, which is much more efficient than JSON; or CBOR.

Additionally, if you are serializing known data, you should probably try schema-based serialization such as Cap’n Proto, FlatBuffers, or Protobuf, which are not included in the following generalized example.
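
The custom serializers in the example below all share the same small shape: a `serialize` method that returns bytes, and an `unserialize` method that reverses it. A minimal, dependency-free sketch of that shape follows. `CompactJSONSerializer` is a hypothetical name; it is duck-typed rather than derived from `SerializerInterface` so that it runs without this lib installed, and all it does is change the separators that the standard `json` module uses by default.

```python
import json
import typing as t


class CompactJSONSerializer:
    """JSON serializer that omits the whitespace json adds by default (hypothetical)."""

    def serialize(self, data: t.Any, **kwargs: t.Any) -> bytes:
        """Serialize given data as compact JSON."""
        kwargs.setdefault('separators', (',', ':'))

        return json.dumps(data, **kwargs).encode()

    def unserialize(self, data: bytes, **kwargs: t.Any) -> t.Any:
        """Unserialize given JSON data."""
        return json.loads(data, **kwargs)


payload = {'user': {'id': 1337, 'username': 'hackan_cuba'}}
serializer = CompactJSONSerializer()
serialized = serializer.serialize(payload)

print(len(serialized), 'bytes vs', len(json.dumps(payload).encode()), 'bytes with json defaults')
```

To plug something like this into the lib, it would need to derive from `SerializerInterface`, as the serializers in the script do.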

Comparing Serializers

Source

"""Comparing Serializers.

This script helps compare the performance of different serializers.

Requirements:
    python3 -m pip install \
        'blake2signer' \
        'orjson~=3.11' \
        'msgpack~=1.1' \
        'cbor2~=5.7'

Usage:
    python3 comparing_serializers.py
    NUMBER=10 REPEAT=2 pypy3 comparing_serializers.py
"""

import json
import os
import statistics
import timeit
import typing as t
from collections.abc import Callable
from copy import deepcopy
from datetime import datetime
from datetime import timezone
from enum import Enum
from functools import partial
from math import isclose

import cbor2
import msgpack
import orjson

from blake2signer import Blake2SerializerSigner
from blake2signer.interfaces import SerializerInterface
from blake2signer.serializers import JSONSerializer

NUMBER: t.Optional[int] = int(os.getenv('NUMBER', '0')) if os.getenv('NUMBER') else None
"""Set this value if you prefer to have a fixed number of rounds for the benchmark."""

REPEAT: int = int(os.getenv('REPEAT', '10'))
"""How many times each cycle of iterations should repeat."""

COLUMNS = (25, 13, 7, 29)
"""Sizes of each table column."""


class Benchmark(t.TypedDict):
    """Benchmark result."""
    all_runs: list[float]
    best: float
    average: float
    stdev: float
    loops: int
    repeat: int


class TermFormat(str, Enum):
    """Terminal formatting."""

    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

    def __str__(self) -> str:
        """String representation."""
        return self.value


P = t.ParamSpec('P')


def benchmark(
    func: Callable[P, t.Any],
    *args: P.args,
    **kwargs: P.kwargs,
) -> Benchmark:
    """Run a benchmark on the given function, similar to IPython's %timeit."""
    wrapped = partial(func, *args, **kwargs)
    timer = timeit.Timer(wrapped)

    repeat = REPEAT
    if NUMBER is None:
        number, _ = timer.autorange()
        number *= 2
    else:
        number = NUMBER

    results = timer.repeat(repeat=repeat, number=number)
    per_loop = [timing / number for timing in results]
    best = min(per_loop)
    avg = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0

    print(
        format_time(avg),
        '±',
        format_time(st_dev),
        'per loop (mean ± std. dev. of',
        repeat,
        'runs',
        number,
        'loops each)',
    )

    return Benchmark(
        all_runs=results,
        best=best,
        average=avg,
        stdev=st_dev,
        loops=number,
        repeat=repeat,
    )


def format_time(
    dt: float,
    *,
    unit: t.Optional[str] = None,
    precision: int = 3,
) -> str:
    """Format time (copied from timeit lib)."""
    units = {  # This map needs to be sorted from larger to smaller
        's': 1.0,
        'ms': 1e-3,
        'us': 1e-6,
        'ns': 1e-9,
    }

    if unit:
        scale = units[unit]
    else:
        for unit, scale in units.items():  # noqa: B007 # pylint: disable=R1704
            if dt >= scale:
                break
        else:
            unit = 'ns'
            scale = units[unit]

    return f'{dt / scale:.{precision}g} {unit}'


def print_row(
    *,
    name: str,
    value: float,
    ok: bool,
    baseline: float,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print a comparison table row."""
    x_times = baseline / value
    if isclose(value, baseline):
        detail = ''
    elif isclose(x_times, 1.0, rel_tol=0.1):
        detail = '(close to baseline)'
    elif x_times < 1:
        detail = '(slower than baseline)'
    else:
        detail = '(faster than baseline)'

    print(
        name.ljust(columns_sizes[0]),
        '|',
        format_time(value).rjust(columns_sizes[1]),
        '|',
        ('√' if ok else '⚠').center(columns_sizes[2]),
        '|',
        'baseline' if isclose(value, baseline) else f'{x_times:4.2f}x'.rjust(4),
        detail,
    )


def is_timing_ok(timing: Benchmark, /) -> bool:
    """Return True if the timing measurement appears to be correct."""
    if timing['stdev'] == 0:  # E.g. REPEAT=1: there is no deviation to compare against
        return False

    return (timing['best'] / timing['stdev']) > 60


def print_table(
    *,
    title: str,
    timings: dict[str, Benchmark],
    baseline: str,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print comparison table."""
    print()
    print(title.ljust(columns_sizes[0]), '| Best Abs Time | Measure | Comparison')
    print(
        '-' * columns_sizes[0],
        '|',
        '-' * columns_sizes[1],
        '|',
        '-' * columns_sizes[2],
        '|',
        '-' * columns_sizes[3],
    )
    baseline_value = timings[baseline]['best']
    for name, timing in timings.items():
        ok = is_timing_ok(timing)
        print_row(
            name=name,
            value=timing['best'],
            ok=ok,
            baseline=baseline_value,
            columns_sizes=columns_sizes,
        )
    print()


class ORJSONSerializer(SerializerInterface):
    """ORJSON serializer."""

    def serialize(self, data: t.Any, **kwargs: t.Any) -> bytes:
        """Serialize given data as JSON."""
        return orjson.dumps(data, **kwargs)  # pylint: disable=E1101

    def unserialize(self, data: bytes, **__: t.Any) -> t.Any:
        """Unserialize given JSON data."""
        return orjson.loads(data)  # pylint: disable=E1101


class MsgpackSerializer(SerializerInterface):
    """Msgpack serializer."""

    def serialize(self, data: t.Any, **kwargs: t.Any) -> bytes:
        """Serialize given data as msgpack."""
        return msgpack.packb(data, **kwargs)

    def unserialize(self, data: bytes, **kwargs: t.Any) -> t.Any:
        """Unserialize given msgpack data."""
        return msgpack.unpackb(data, **kwargs)


class CBORSerializer(SerializerInterface):
    """CBOR serializer."""

    def serialize(self, data: t.Any, **kwargs: t.Any) -> bytes:
        """Serialize given data as CBOR."""
        return cbor2.dumps(data, **kwargs)

    def unserialize(self, data: bytes, **kwargs: t.Any) -> t.Any:
        """Unserialize given CBOR data."""
        return cbor2.loads(data, **kwargs)


def main() -> None:
    """Run comparisons."""
    secret = b'c2hhbWUgb24geW91IElzcmFlbCE'

    # Example payload
    small_payload: dict[str, t.Any] = {
        'user': {
            'id': 1337,
        },
    }
    regular_payload: dict[str, t.Any] = {
        'user': {
            'id': 1337,
            'name': 'HacKan CuBa',
            'username': 'hackan_cuba',
            'email': 'hackan@email.com',
        },
        '_meta': {
            'iss': 'blake2signer',
            'nbf': datetime.now(timezone.utc).isoformat(),
        },
    }
    payloads: dict[str, dict[str, t.Any]] = {
        'small': small_payload,
        'regular': regular_payload,
        'large': {
            'payload': [regular_payload] * 100,  # This is just a representation
        },
    }
    results: dict[str, dict[str, Benchmark]] = {}

    print(
        TermFormat.CYAN,
        'Comparing serializers, please wait (this will take a while)...',
        TermFormat.END,
        sep='',
    )
    print()

    for payload, data in payloads.items():
        sentinel_payload = deepcopy(data)
        timings: dict[str, Benchmark] = {}
        print()
        print(
            TermFormat.GREEN,
            'Payload size ~: ',
            len(json.dumps(data).encode()),
            ' bytes (',
            payload,
            ')',
            TermFormat.END,
            sep='',
        )

        for serializer in (JSONSerializer, ORJSONSerializer, MsgpackSerializer, CBORSerializer):
            signer = Blake2SerializerSigner(secret, serializer=serializer)
            name = serializer.__name__

            print(name)
            timings[name] = benchmark(lambda s, d: s.loads(s.dumps(d)), signer, data)
            assert data == sentinel_payload

        results[payload] = deepcopy(timings)

        print_table(
            title='Serializer',
            timings=timings,
            baseline=JSONSerializer.__name__,
            columns_sizes=COLUMNS,
        )


if __name__ == '__main__':
    main()

Output

Comparing serializers, please wait (this will take a while)...


Payload size ~: 22 bytes (small)
JSONSerializer
57.9 us ± 698 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
ORJSONSerializer
42.4 us ± 1.31 us per loop (mean ± std. dev. of 10 runs 10000 loops each)
MsgpackSerializer
23.9 us ± 349 ns per loop (mean ± std. dev. of 10 runs 20000 loops each)
CBORSerializer
35.1 us ± 353 ns per loop (mean ± std. dev. of 10 runs 20000 loops each)

Serializer                | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
JSONSerializer            |       56.9 us |    √    | baseline
ORJSONSerializer          |       40.8 us |    ⚠    | 1.40x (faster than baseline)
MsgpackSerializer         |       23.4 us |    √    | 2.43x (faster than baseline)
CBORSerializer            |       34.3 us |    √    | 1.66x (faster than baseline)


Payload size ~: 178 bytes (regular)
JSONSerializer
51.1 us ± 3.65 us per loop (mean ± std. dev. of 10 runs 10000 loops each)
ORJSONSerializer
33.5 us ± 1.52 us per loop (mean ± std. dev. of 10 runs 20000 loops each)
MsgpackSerializer
30.5 us ± 658 ns per loop (mean ± std. dev. of 10 runs 20000 loops each)
CBORSerializer
50.6 us ± 1.11 us per loop (mean ± std. dev. of 10 runs 10000 loops each)

Serializer                | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
JSONSerializer            |       49.3 us |    ⚠    | baseline
ORJSONSerializer          |       32.4 us |    ⚠    | 1.52x (faster than baseline)
MsgpackSerializer         |         30 us |    ⚠    | 1.64x (faster than baseline)
CBORSerializer            |       49.7 us |    ⚠    | 0.99x (close to baseline)


Payload size ~: 18013 bytes (large)
JSONSerializer
462 us ± 1.98 us per loop (mean ± std. dev. of 10 runs 1000 loops each)
ORJSONSerializer
201 us ± 12.4 us per loop (mean ± std. dev. of 10 runs 4000 loops each)
MsgpackSerializer
301 us ± 6.92 us per loop (mean ± std. dev. of 10 runs 2000 loops each)
CBORSerializer
664 us ± 4.32 us per loop (mean ± std. dev. of 10 runs 1000 loops each)

Serializer                | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
JSONSerializer            |        459 us |    √    | baseline
ORJSONSerializer          |        194 us |    ⚠    | 2.36x (faster than baseline)
MsgpackSerializer         |        295 us |    ⚠    | 1.56x (faster than baseline)
CBORSerializer            |        658 us |    √    | 0.70x (slower than baseline)

Note

For the results to be reliable, the standard deviation shown for each measurement should be around two orders of magnitude lower than the mean. As a simple reference, if the mean is in ms, then the std dev should be in us.
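As a rough illustration of that rule of thumb, the sketch below checks the mean against the standard deviation for two of the small-payload measurements printed above. The helper name `is_stable` and its `min_ratio` parameter are hypothetical, they are not part of the benchmark script, which applies a laxer threshold of 60 to the best time in `is_timing_ok`.

```python
def is_stable(mean: float, stdev: float, *, min_ratio: float = 100.0) -> bool:
    """Return True when the mean is at least `min_ratio` times the std dev.

    Hypothetical helper: the default of 100 encodes "two orders of magnitude".
    """
    if stdev == 0:
        return True  # No measurable spread between runs
    return mean / stdev >= min_ratio


# Mean and std dev, in seconds, copied from the small-payload output above
measurements = {
    'JSONSerializer': (57.9e-6, 698e-9),
    'ORJSONSerializer': (42.4e-6, 1.31e-6),
}

for name, (mean, stdev) in measurements.items():
    print(
        name,
        f'ratio={mean / stdev:.0f}',
        'strict:', is_stable(mean, stdev),
        'lax:', is_stable(mean, stdev, min_ratio=60),
    )
```

With these numbers, the JSONSerializer ratio is roughly 83 and the ORJSONSerializer ratio roughly 32, so only the former passes the laxer check. This agrees with the √ and ⚠ marks in the table.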

Compressors¶

Some compressors perform better than others. The standard library offers several alternatives that depend on how your CPython was compiled. This lib only considers zlib, which is used by default with the ZlibCompressor, and gzip, as they are both widely available by default.
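Since availability depends on the build, one way to find out which standard-library compression modules your interpreter ships with is to try importing them. The following is a minimal sketch; the function name and the candidate list are illustrative assumptions, and `compression.zstd` will only be importable on Python 3.14 or newer.

```python
import importlib

# Standard-library compression modules; some are optional at CPython build time
CANDIDATES = ('zlib', 'gzip', 'bz2', 'lzma', 'compression.zstd')


def available_compression_modules(names=CANDIDATES):
    """Return the subset of `names` that can actually be imported."""
    found = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # Also covers ModuleNotFoundError
            continue
        found.append(name)
    return found


print(available_compression_modules())
```

Importing the module, rather than only looking it up, also catches the case where the pure Python wrapper exists but its underlying C extension was not built.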

Other options worth mentioning are zstd (which is coming to the standard library in Python 3.14), LZ4, and Brotli. Of those, both zstd and LZ4 shine in general use cases but, as with everything else, YMMV: you need to determine whether you require high compression, or some compression at a higher speed.
Each of them can be used through a custom compressor.

The following example only measures how fast each compressor is at its default level (timing a full dumps/loads round trip), not how much it compresses.

Comparing Compressors


"""Comparing Compressors.

This script helps compare the performance of different compressors.

Requirements:
    python3 -m pip install \
        'blake2signer' \
        'zstandard~=0.24' \
        'lz4~=4.4'

Usage:
    python3 comparing_compressors.py
    NUMBER=10 REPEAT=2 pypy3 comparing_compressors.py
"""

import bz2
import json
import os
import statistics
import timeit
import typing as t
from collections.abc import Callable
from copy import deepcopy
from datetime import datetime
from datetime import timezone
from enum import Enum
from functools import partial
from math import isclose

import lz4.frame as lz4
import zstandard as zstd

from blake2signer import Blake2SerializerSigner
from blake2signer.compressors import GzipCompressor
from blake2signer.compressors import ZlibCompressor
from blake2signer.interfaces import CompressorInterface

NUMBER: t.Optional[int] = int(os.getenv('NUMBER', '0')) if os.getenv('NUMBER') else None
"""Set this value if you prefer to have a fixed number of rounds for the benchmark."""

REPEAT: int = int(os.getenv('REPEAT', '10'))
"""How many times each timing cycle should be repeated."""

COLUMNS = (25, 13, 7, 29)
"""Sizes of each table column."""


class Benchmark(t.TypedDict):
    """Benchmark result."""
    all_runs: list[float]
    best: float
    average: float
    stdev: float
    loops: int
    repeat: int


class TermFormat(str, Enum):
    """Terminal formatting."""

    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

    def __str__(self) -> str:
        """String representation."""
        return self.value


P = t.ParamSpec('P')


def benchmark(
    func: Callable[P, t.Any],
    *args: P.args,
    **kwargs: P.kwargs,
) -> Benchmark:
    """Run a benchmark on the given function, similar to IPython's %timeit."""
    wrapped = partial(func, *args, **kwargs)
    timer = timeit.Timer(wrapped)

    repeat = REPEAT
    if NUMBER is None:
        number, _ = timer.autorange()
        number *= 2
    else:
        number = NUMBER

    results = timer.repeat(repeat=repeat, number=number)
    per_loop = [timing / number for timing in results]
    best = min(per_loop)
    avg = statistics.mean(per_loop)
    st_dev = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0

    print(
        format_time(avg),
        '±',
        format_time(st_dev),
        'per loop (mean ± std. dev. of',
        repeat,
        'runs',
        number,
        'loops each)',
    )

    return Benchmark(
        all_runs=results,
        best=best,
        average=avg,
        stdev=st_dev,
        loops=number,
        repeat=repeat,
    )


def format_time(
    dt: float,
    *,
    unit: t.Optional[str] = None,
    precision: int = 3,
) -> str:
    """Format time (copied from timeit lib)."""
    units = {  # This map needs to be sorted from larger to smaller
        's': 1.0,
        'ms': 1e-3,
        'us': 1e-6,
        'ns': 1e-9,
    }

    if unit:
        scale = units[unit]
    else:
        for unit, scale in units.items():  # noqa: B007 # pylint: disable=R1704
            if dt >= scale:
                break
        else:
            unit = 'ns'
            scale = units[unit]

    return f'{dt / scale:.{precision}g} {unit}'


def print_row(
    *,
    name: str,
    value: float,
    ok: bool,
    baseline: float,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print a comparison table row."""
    x_times = baseline / value
    if isclose(value, baseline):
        detail = ''
    elif isclose(x_times, 1.0, rel_tol=0.1):
        detail = '(close to baseline)'
    elif x_times < 1:
        detail = '(slower than baseline)'
    else:
        detail = '(faster than baseline)'

    print(
        name.ljust(columns_sizes[0]),
        '|',
        format_time(value).rjust(columns_sizes[1]),
        '|',
        ('√' if ok else '⚠').center(columns_sizes[2]),
        '|',
        'baseline' if isclose(value, baseline) else f'{x_times:4.2f}x'.rjust(4),
        detail,
    )


def is_timing_ok(timing: Benchmark, /) -> bool:
    """Return True if the timing measurement appears to be correct."""
    # Multiply instead of dividing, so a zero std dev (i.e., REPEAT=1) can't raise
    return timing['best'] > timing['stdev'] * 60


def print_table(
    *,
    title: str,
    timings: dict[str, Benchmark],
    baseline: str,
    columns_sizes: tuple[int, int, int, int],
) -> None:
    """Print comparison table."""
    print()
    print(title.ljust(columns_sizes[0]), '| Best Abs Time | Measure | Comparison')
    print(
        '-' * columns_sizes[0],
        '|',
        '-' * columns_sizes[1],
        '|',
        '-' * columns_sizes[2],
        '|',
        '-' * columns_sizes[3],
    )
    baseline_value = timings[baseline]['best']
    for name, timing in timings.items():
        ok = is_timing_ok(timing)
        print_row(
            name=name,
            value=timing['best'],
            ok=ok,
            baseline=baseline_value,
            columns_sizes=columns_sizes,
        )
    print()


class Bz2Compressor(CompressorInterface):
    """Bzip2 compressor."""

    @property
    def default_compression_level(self) -> int:
        """Get the default compression level."""
        # According to https://docs.python.org/3/library/bz2.html#bz2.compress
        return 9

    def compress(self, data: bytes, *, level: int) -> bytes:
        """Compress given data."""
        return bz2.compress(data, compresslevel=level)

    def decompress(self, data: bytes) -> bytes:
        """Decompress given compressed data."""
        return bz2.decompress(data)


class ZstdCompressor(CompressorInterface):
    """Zstd compressor."""

    @property
    def default_compression_level(self) -> int:
        """Get the default compression level."""
        # According to https://python-zstandard.readthedocs.io/en/latest/one_shot_api.html
        return 3

    def compress(self, data: bytes, *, level: int) -> bytes:
        """Compress given data."""
        return zstd.compress(data, level)

    def decompress(self, data: bytes) -> bytes:
        """Decompress given compressed data."""
        return zstd.decompress(data)


class LZ4Compressor(CompressorInterface):
    """LZ4 compressor."""

    @property
    def default_compression_level(self) -> int:
        """Get the default compression level."""
        # According to
        # https://python-lz4.readthedocs.io/en/stable/quickstart.html#controlling-the-compression
        return lz4.COMPRESSIONLEVEL_MIN

    def compress(self, data: bytes, *, level: int) -> bytes:
        """Compress given data."""
        return lz4.compress(data, level)

    def decompress(self, data: bytes) -> bytes:
        """Decompress given compressed data."""
        return lz4.decompress(data)


def main() -> None:
    """Run comparisons."""
    secret = b'c2hhbWUgb24geW91IElzcmFlbCE'

    # Example payload
    small_payload: dict[str, t.Any] = {
        'user': {
            'id': 1337,
        },
    }
    regular_payload: dict[str, t.Any] = {
        'user': {
            'id': 1337,
            'name': 'HacKan CuBa',
            'username': 'hackan_cuba',
            'email': 'hackan@email.com',
        },
        '_meta': {
            'iss': 'blake2signer',
            'nbf': datetime.now(timezone.utc).isoformat(),
        },
    }
    payloads: dict[str, dict[str, t.Any]] = {
        'small': small_payload,
        'regular': regular_payload,
        'large': {
            'payload': [regular_payload] * 100,  # This is just a representation
        },
    }
    results: dict[str, dict[str, Benchmark]] = {}

    print(
        TermFormat.CYAN,
        'Comparing compressors, please wait (this will take a while)...',
        TermFormat.END,
        sep='',
    )
    print()

    for payload, data in payloads.items():
        sentinel_payload = deepcopy(data)
        timings: dict[str, Benchmark] = {}
        print()
        print(
            TermFormat.GREEN,
            'Payload size ~: ',
            len(json.dumps(data).encode()),
            ' bytes (',
            payload,
            ')',
            TermFormat.END,
            sep='',
        )

        for compressor in (
                ZlibCompressor,
                GzipCompressor,
                ZstdCompressor,
                LZ4Compressor,
                Bz2Compressor,
        ):
            signer = Blake2SerializerSigner(secret, compressor=compressor)
            name = compressor.__name__

            print(name)
            timings[name] = benchmark(lambda s, d: s.loads(s.dumps(d)), signer, data)
            assert data == sentinel_payload

        results[payload] = deepcopy(timings)

        print_table(
            title='Compressor',
            timings=timings,
            baseline=ZlibCompressor.__name__,
            columns_sizes=COLUMNS,
        )


if __name__ == '__main__':
    main()

Comparing compressors, please wait (this will take a while)...


Payload size ~: 22 bytes (small)
ZlibCompressor
57.4 us ± 1.37 us per loop (mean ± std. dev. of 10 runs 10000 loops each)
GzipCompressor
59.9 us ± 404 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
ZstdCompressor
33.3 us ± 173 ns per loop (mean ± std. dev. of 10 runs 20000 loops each)
LZ4Compressor
32.2 us ± 2.11 us per loop (mean ± std. dev. of 10 runs 20000 loops each)
Bz2Compressor
71.9 us ± 1.15 us per loop (mean ± std. dev. of 10 runs 10000 loops each)

Compressor                | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
ZlibCompressor            |         56 us |    ⚠    | baseline 
GzipCompressor            |       59.3 us |    √    | 0.94x (close to baseline)
ZstdCompressor            |         33 us |    √    | 1.70x (faster than baseline)
LZ4Compressor             |       30.6 us |    ⚠    | 1.83x (faster than baseline)
Bz2Compressor             |       70.8 us |    √    | 0.79x (slower than baseline)


Payload size ~: 178 bytes (regular)
ZlibCompressor
49.6 us ± 965 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
GzipCompressor
61 us ± 723 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
ZstdCompressor
49.8 us ± 217 ns per loop (mean ± std. dev. of 10 runs 10000 loops each)
LZ4Compressor
37.1 us ± 2.11 us per loop (mean ± std. dev. of 10 runs 20000 loops each)
Bz2Compressor
96 us ± 2.21 us per loop (mean ± std. dev. of 10 runs 10000 loops each)

Compressor                | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
ZlibCompressor            |       48.7 us |    ⚠    | baseline 
GzipCompressor            |       60.2 us |    √    | 0.81x (slower than baseline)
ZstdCompressor            |       49.4 us |    √    | 0.98x (close to baseline)
LZ4Compressor             |       36.2 us |    ⚠    | 1.35x (faster than baseline)
Bz2Compressor             |       94.2 us |    ⚠    | 0.52x (slower than baseline)


Payload size ~: 18013 bytes (large)
ZlibCompressor
455 us ± 2.82 us per loop (mean ± std. dev. of 10 runs 1000 loops each)
GzipCompressor
472 us ± 8.6 us per loop (mean ± std. dev. of 10 runs 1000 loops each)
ZstdCompressor
401 us ± 23.2 us per loop (mean ± std. dev. of 10 runs 2000 loops each)
LZ4Compressor
379 us ± 6.83 us per loop (mean ± std. dev. of 10 runs 2000 loops each)
Bz2Compressor
3.79 ms ± 58 us per loop (mean ± std. dev. of 10 runs 200 loops each)

Compressor                | Best Abs Time | Measure | Comparison
------------------------- | ------------- | ------- | -----------------------------
ZlibCompressor            |        451 us |    √    | baseline 
GzipCompressor            |        464 us |    ⚠    | 0.97x (close to baseline)
ZstdCompressor            |        391 us |    ⚠    | 1.15x (faster than baseline)
LZ4Compressor             |        375 us |    ⚠    | 1.20x (faster than baseline)
Bz2Compressor             |       3.74 ms |    √    | 0.12x (slower than baseline)

Note

For the results to be reliable, the standard deviation shown for each measurement should be around two orders of magnitude lower than the mean. As a simple reference, if the mean is in ms, then the std dev should be in us.
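The comparison above deals with speed only. To get a rough idea of how much each compressor shrinks a payload, a sketch like the following can help. It is limited to standard-library compressors at their default levels, calls them directly rather than going through the signer, and uses an illustrative payload similar in shape to the large one above; the names `compression_ratios` and `COMPRESSORS` are assumptions made for this example.

```python
import bz2
import gzip
import json
import zlib

# Illustrative payload, shaped like the "large" one used in the comparison
record = {'user': {'id': 1337, 'name': 'HacKan CuBa', 'username': 'hackan_cuba'}}
data = json.dumps({'payload': [record] * 100}).encode()

# One-shot compression functions, each called with its default level
COMPRESSORS = {
    'zlib': zlib.compress,
    'gzip': gzip.compress,
    'bz2': bz2.compress,
}


def compression_ratios(raw: bytes) -> dict[str, float]:
    """Return original size divided by compressed size, per compressor."""
    return {name: len(raw) / len(func(raw)) for name, func in COMPRESSORS.items()}


print('Original size:', len(data), 'bytes')
for name, ratio in compression_ratios(data).items():
    print(f'{name}: {ratio:.1f}x smaller')
```

A ratio below 1 means that the output is larger than the input, which is expected for very small payloads, where the format overhead outweighs any savings.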