Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Back to Home

Bit Vectors

We define 3 operations on bitvectors

  • Access(B, i): returns the element at pos in
  • Rank(B, i): returns the number of 1's in the range
  • Select(B, x): returns the position in which the rank becomes x, or if .

The book covers both the zero-order and k-th order compression of

Zero-order Compression

Given a bit vector , we divide it into blocks of size , s.t . Every block is assigned a class in which the class refers to the number of in .

Observe that a class can represent blocks.

Example

Let then the class represents the block . While represents the blocks .

Notice that just the variable is not enough to descibe , thus an offset variable is used to identify which of the possible blocks in referes to .

Assigning Offsets

Given that we have starting combinations to encode offsets for, observe that:

  • There are permutations of blocks that start with 0.

  • And permutations of blocks that start with 1.

Using this, the book describes an algorithm to assign to block of class

Algorithm 1

Encoding offsets:

  • Set

  • For , if:

    • , then set ; That is do nothing and skip over the 0's

    • , then set and set .

Note when x<y=0

Example - encoding

= 1011 So starting with

  • ; so ; and set

  • ; so ; and set

  • ; so ; and set

  • ; so ; and set

So is represented by the pair

Algorithm 2

Decoding offsets:

The operations is the inverse of encoding, so given :

  • if then is a 0. And we set

  • else and we set then

Example - decoding

Given

  • ; So ---->
  • ; So ---->
  • ; So ---->
  • ; So

Finally if we can set the remaining bits to 0, and is the exit condition.

Structures:

The compressed version of a bitvector includes 3 structures:

Class array

A array holding the elements of size

So:

We then use the taylor series approximation of To get:

Some Big O abuse allows us to ignore the ceil since it adds a factor. and to ignore the since it is strictly less that for any . Thus:

Offset array

Similar to the class array, we have elements, each of size .

That is the total size:

We know that removing the ceiling adds a factor, thus over the sum we get.

Lemma 1

We know that so we can remove the log from the sum by replacing the summation of with

Using the above lemma 2 we get that:

Lemma 2

Given some values the book states that

Proof (non-formal)

Observe that given a bitvector of size , any combination will also appear in a bitvector of size . If we treat as seperate vectors placed beside each other then naturally the left bits will have exactlt the same number of combinations and the right side can hold exactly the same number of combinations w.l.o.g of which side of the vector is used for and .

It follows then that the vector can represent , if we remove the restriction of having it be cut in the middle and instead be used as a continuoes block then the vector can represent MORE than the product of what each of it's divided sides can. Thus

Im prob not doing a good job of exmplaining it, but i get it, trust.

Now, let to be the total number of in (so its the ) and to total number of blocks. And using Lemma 2, we have it that:

FINALLY, recall from the entropy chapter that the zero-order (aka worst case) entropy of a universe is . so the size of the offset vector is bounded by ( is our universe in this case):

Lookup Table

We also need a lookup table for all the combinatoric values for which comes up to (I remember the C++ creator talking about triagular matrices, so a upper bound is not really tight but its big O so who cares ig)

Bringing the structures together

We have with a combines size of

Which with more Big O magic gives the bound: