Questions for BitString, Binary, Charlist, and String in Elixir — Part 2: Binary (or bytes)

This is the second group of questions in the series:

part 1. BitString (or bits)
part 2. Binary (or bytes)
part 3. String and Charlist

Here are some questions on binaries in Elixir to check how well we know them.

Q: What is a binary?
A: A sequence of bytes.
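A quick check in a script or IEx session (a minimal sketch; the byte values are arbitrary):

```elixir
# Three bytes built with the <<>> constructor
bin = <<1, 2, 255>>

IO.inspect(is_binary(bin))            # true
IO.inspect(byte_size(bin))            # 3
IO.inspect(:binary.bin_to_list(bin))  # [1, 2, 255]
```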

Q: Is every string in Elixir also a binary?
A: Yes.

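Q: Is there any binary that is not a string?
A: Yes. A binary is a valid string only if its bytes are valid UTF-8, so any binary containing a byte sequence that does not encode a Unicode character in UTF-8 is a binary but not a string.

For instance, <<0x80>> is a valid binary but not a valid string, because 0x80 is a UTF-8 continuation byte and cannot appear on its own.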
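String.valid?/1 from Elixir's standard library checks exactly this UTF-8 condition. A minimal sketch comparing a lone continuation byte with a properly encoded character:

```elixir
bin = <<0x80>>

IO.inspect(is_binary(bin))        # true: it is one whole byte
IO.inspect(String.valid?(bin))    # false: 0x80 is a lone UTF-8 continuation byte
IO.inspect(String.valid?("é"))    # true: "é" is encoded as the two bytes <<0xC3, 0xA9>>
```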

Q: Is every binary in Elixir also a BitString?
A: Yes.

A bitstring is a sequence of zero or more bits, where the number of bits does not need to be divisible by 8. If the number of bits is divisible by 8, the bitstring is also a binary.

via http://erlang.org/doc/programming_examples/bit_syntax.html

In other words, a binary is always a bitstring, but a bitstring is not always a binary.
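The guards is_bitstring/1 and is_binary/1 make the difference visible. A minimal sketch with a 3-bit value and an 8-bit value:

```elixir
three_bits = <<1::3>>
one_byte = <<1::8>>

IO.inspect(is_bitstring(three_bits))  # true
IO.inspect(is_binary(three_bits))     # false: 3 bits is not a whole number of bytes
IO.inspect(bit_size(three_bits))      # 3

IO.inspect(is_bitstring(one_byte))    # true
IO.inspect(is_binary(one_byte))       # true: 8 bits is exactly one byte
```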

The relationship between BitString, Binary, and String in Elixir is one of nested sets: every String is a Binary, and every Binary is a BitString.

Q: What’re the differences between <<..>> and "…"?
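A: There is no difference in the resulting value: "abc" and <<97, 98, 99>> are the same binary. The double-quoted form is simply how a binary is written (and how IEx displays it) when its bytes are printable UTF-8.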
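A minimal sketch of the equivalence and of how inspect chooses between the two notations. The binaries: :as_binaries option is a standard Inspect option that forces the byte notation:

```elixir
IO.inspect("abc" == <<97, 98, 99>>)                   # true: same three bytes
IO.inspect(<<97, 98, 99>>)                            # "abc": printable, so shown quoted
IO.inspect("abc" <> <<0>>)                            # <<97, 98, 99, 0>>: a NUL byte is not printable
IO.inspect("abc", binaries: :as_binaries)             # <<97, 98, 99>>
```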

Q: Is <<>> a binary?
A: Yes. So is its equivalent "".

Q: Is <<1>> a binary?
A: Yes. So are its equivalents <<1::8>>, <<1::size(8)>>, and <<1::integer-size(8)>>.
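Both answers can be checked directly. A minimal sketch; a segment without an explicit type defaults to an 8-bit integer:

```elixir
# The empty binary and its double-quoted form are the same value
IO.inspect(is_binary(<<>>))                   # true
IO.inspect(<<>> == "")                        # true
IO.inspect(byte_size(<<>>))                   # 0

# A segment with no type or size defaults to integer-size(8)
IO.inspect(<<1>> == <<1::8>>)                 # true
IO.inspect(<<1>> == <<1::size(8)>>)           # true
IO.inspect(<<1>> == <<1::integer-size(8)>>)   # true
```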

Q: Is <<1234>> a binary?
A: Yes. It is actually <<210>>: 1234 does not fit in one byte, so only its lowest 8 bits are kept, which is Bitwise.band(1234, 0xFF), i.e. 210.
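A minimal sketch of the truncation, plus a 16-bit segment for comparison (1234 = 4 * 256 + 210). It uses `import Bitwise` so that band/2 can be called unqualified:

```elixir
import Bitwise

n = 1234
IO.inspect(<<n>>)              # <<210>>: only the lowest 8 bits of 1234 are kept
IO.inspect(band(n, 0xFF))      # 210
IO.inspect(<<n::16>>)          # <<4, 210>>: a 16-bit segment keeps the whole value
```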

Q: Is <<4::2, 128>> a binary?
A: No. It has 2 + 8 = 10 bits, which is not divisible by 8, so it is a bitstring but not a binary.

Q: Is <<4::2, 128::22>> a binary?
A: Yes. It has 2 + 22 = 24 bits, which is exactly 3 bytes.
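A minimal sketch checking both values. Note that 4 does not fit in 2 bits either, so that segment is truncated to 0, and the 24 bits of the second value read as the big-endian integer 128:

```elixir
a = <<4::2, 128>>
IO.inspect(bit_size(a))      # 10 = 2 + 8
IO.inspect(is_binary(a))     # false
IO.inspect(is_bitstring(a))  # true

b = <<4::2, 128::22>>
IO.inspect(bit_size(b))      # 24 = 2 + 22, i.e. 3 bytes
IO.inspect(is_binary(b))     # true
IO.inspect(b)                # <<0, 0, 128>>
```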

Q: Can a binary reference another binary to reduce memory usage?
A: Yes.

For example:
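The original snippet is not reproduced here. A minimal sketch of the same idea, assuming a 1000-byte binary whose first 3 bytes are matched off, can use Erlang's :binary.referenced_byte_size/1 to show how many bytes a binary keeps alive:

```elixir
# A 1000-byte binary; anything over 64 bytes is a reference-counted binary
bin = :binary.copy(<<0>>, 1000)

# Matching off the first 3 bytes leaves `rest` as a sub-binary of `bin`
<<_header::binary-size(3), rest::binary>> = bin

IO.inspect(byte_size(rest))                     # 997
IO.inspect(:binary.referenced_byte_size(rest))  # 1000
```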

The byte size of rest is 997, but it still references the 1000-byte binary it was matched from.

This kind of binary is called a sub-binary. In some cases, when the compiler can see that a sub-binary would only be used to continue matching (for example, when it is passed straight into a recursive call that matches on it again), it skips creating the sub-binary and reuses a match context instead. A match context only keeps a pointer to the current position in the binary, so it is even cheaper in both memory and CPU.

Q: Any example?
A: Sure. Let’s compare the performance of sub-binary and match context:
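The original benchmark code is not reproduced here. As a minimal sketch, assuming the task is summing every byte of a binary, the two hypothetical modules below contrast the two styles: MyBinParser1 returns the rest of the binary inside a tuple, which forces a new sub-binary for every byte, while MyBinParser2 matches the rest again in the same recursive function, so the compiler can reuse one match context. The :timer.tc/1 timings are only indicative and depend on the machine and OTP version.

```elixir
defmodule MyBinParser1 do
  # Hypothetical "slow" parser: next_byte/1 returns `rest` inside a tuple,
  # so a new sub-binary must be built for every byte consumed.
  def sum(bin), do: sum(bin, 0)

  defp sum(bin, acc) do
    case next_byte(bin) do
      {byte, rest} -> sum(rest, acc + byte)
      :eof -> acc
    end
  end

  defp next_byte(<<byte, rest::binary>>), do: {byte, rest}
  defp next_byte(<<>>), do: :eof
end

defmodule MyBinParser2 do
  # Hypothetical "fast" parser: `rest` is only matched again by the same
  # function, so the compiler can reuse one match context for the whole loop.
  def sum(bin), do: sum(bin, 0)

  defp sum(<<byte, rest::binary>>, acc), do: sum(rest, acc + byte)
  defp sum(<<>>, acc), do: acc
end

# One million bytes of value 1, so both parsers must return 1_000_000
bin = :binary.copy(<<1>>, 1_000_000)

{t1, 1_000_000} = :timer.tc(fn -> MyBinParser1.sum(bin) end)
{t2, 1_000_000} = :timer.tc(fn -> MyBinParser2.sum(bin) end)

IO.puts("MyBinParser1: #{t1} microseconds")
IO.puts("MyBinParser2: #{t2} microseconds")
```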

MyBinParser1 is about 2.6x slower (!) because it creates a sub-binary every time.

Q: How do I avoid writing code that creates sub-binaries, like MyBinParser1 above?
A: Set the environment variable ERL_COMPILER_OPTIONS=bin_opt_info when compiling. The Erlang compiler then prints warnings about which binary matches it could and could not optimize.

For example:
```shell
ERL_COMPILER_OPTIONS=bin_opt_info mix compile --force
```

Q: Do I need to care about big binaries?
A: In most cases, no. Binaries of up to 64 bytes are stored directly on the process heap and are called heap binaries; they are small and cheap to create.

Longer binaries, called reference-counted binaries, are stored in a shared memory area outside any process heap. Each process holds only a small reference to them, so they can be passed between processes without copying the bytes.

Keep in mind: never create binaries or sub-binaries whose content you are not interested in.
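A sub-binary keeps its whole parent binary alive, so holding on to a small slice of a large binary can prevent the large one from being garbage collected. When only a slice is needed long term, Erlang's :binary.copy/1 detaches it. A minimal sketch with arbitrary sizes:

```elixir
# A 1,000,000-byte reference-counted binary
big = :binary.copy(<<7>>, 1_000_000)

# Keep only the first 1000 bytes; `slice` is a sub-binary of `big`
<<slice::binary-size(1000), _rest::binary>> = big
IO.inspect(:binary.referenced_byte_size(slice))  # 1000000: the whole parent stays alive

# Copying makes a standalone binary that references only its own bytes
slice = :binary.copy(slice)
IO.inspect(:binary.referenced_byte_size(slice))  # 1000
```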
