Bit Vectors
We define 3 operations on bitvectors
- Access(B, i): returns the element at pos in
- Rank(B, i): returns the number of 1's in the range
- Select(B, x): returns the position in which the rank becomes x, or if .
The book covers both the zero-order and k-th order compression of
Zero-order Compression
Given a bit vector , we divide it into blocks of size , s.t . Every block is assigned a class in which the class refers to the number of in .
Observe that a class can represent blocks.
Example
Let then the class represents the block . While represents the blocks .
Notice that just the variable is not enough to descibe , thus an offset variable is used to identify which of the possible blocks in referes to .
Assigning Offsets
Given that we have starting combinations to encode offsets for, observe that:
-
There are permutations of blocks that start with 0.
-
And permutations of blocks that start with 1.
Using this, the book describes an algorithm to assign to block of class
Algorithm 1
Encoding offsets:
Set
For , if:
, then set ; That is do nothing and skip over the 0's
, then set and set .
Note when x<y=0
Example - encoding
= 1011 So starting with
; so ; and set
; so ; and set
; so ; and set
; so ; and set
So is represented by the pair
Algorithm 2
Decoding offsets:
The operations is the inverse of encoding, so given :
if then is a 0. And we set
else and we set then
Example - decoding
Given
- ; So ---->
- ; So ---->
- ; So ---->
- ; So
Finally if we can set the remaining bits to 0, and is the exit condition.
Structures:
The compressed version of a bitvector includes 3 structures:
Class array
A array holding the elements of size
So:
We then use the taylor series approximation of To get:
Some Big O abuse allows us to ignore the ceil since it adds a factor. and to ignore the since it is strictly less that for any . Thus:
Offset array
Similar to the class array, we have elements, each of size .
That is the total size:
We know that removing the ceiling adds a factor, thus over the sum we get.
Lemma 1
We know that so we can remove the log from the sum by replacing the summation of with
Using the above lemma 2 we get that:
Lemma 2
Given some values the book states that
Proof (non-formal)
Observe that given a bitvector of size , any combination will also appear in a bitvector of size . If we treat as seperate vectors placed beside each other then naturally the left bits will have exactlt the same number of combinations and the right side can hold exactly the same number of combinations w.l.o.g of which side of the vector is used for and .
It follows then that the vector can represent , if we remove the restriction of having it be cut in the middle and instead be used as a continuoes block then the vector can represent MORE than the product of what each of it's divided sides can. Thus
Im prob not doing a good job of exmplaining it, but i get it, trust.
Now, let to be the total number of in (so its the ) and to total number of blocks. And using Lemma 2, we have it that:
FINALLY, recall from the entropy chapter that the zero-order (aka worst case) entropy of a universe is . so the size of the offset vector is bounded by ( is our universe in this case):
Lookup Table
We also need a lookup table for all the combinatoric values for which comes up to (I remember the C++ creator talking about triagular matrices, so a upper bound is not really tight but its big O so who cares ig)
Bringing the structures together
We have with a combines size of
Which with more Big O magic gives the bound: