# Count-min sketch

## What it is

A count-min sketch is a probabilistic counter of hashable objects. Similar to bloom filters in structure, they exhibit false positives (may overcount), but no false negatives (never undercounts). Count-min sketches have two parameters: Length and width. Its accuracy is described by two numbers: ϵ describing its error and δ describing its probability of error. More rigorously, if N is the true count of an event, E is the estimate given by a sketch and T the total count of items in the sketch, E ≤ N + Tϵ with probability (1 - δ). The parameters of a sketch can be determined by `length = 2/ϵ`

and `depth = -log(δ)/log(2)`

.

## Adding (`add!`

)

Count-min sketches has infinite capacity, but adding elements steadily increases the probability of a miscounting other values. Add time is proportional to `depth`

.

## Querying (`getindex`

)

Querying time is proportional to `depth`

. It has a certain risk of reporting too high counts, but no risk of undercounting.

## Subtracting/deletion.

Count-min sketches do not support subtracting or deleting.

## Usage example

I don't know anything about astronomy, but indulge me: Say I want to count how many times different stars have been observed. Sightings are available in a public database. There's about 1.4 billion known stars, with about 5 billion sightings. I have 10 GB of available RAM to balance `ϵ`

and `δ`

.

First, I decide that a UInt8 (maximum count of 255) is fine - I don't care that Polaris have been observed 100,000 times.

Most stars have a low count, so I really care about miscounts not being too large, and I care less about them being frequent or infrequent, so I should minimize `ϵ`

rahter than `δ`

, which means maximizing length rather than depth. So if I pick width = 4 and length = 2.5e9. This means the probability of miscounting with `4*2/10e4 * 5e9 = 4`

with a probability of `exp(-4*log(2)) 6.25%`

:

```
sketch = CountMinSketch(2.5e9, 4)
for star in catalogue
push!(sketch, star) # same as adding one using add!
end
# How many times have a particular star observed?
println(sketch["HD 143183"])
```

## Interface

### Construction

At the moment, count-min sketches can *only* be constructed from the parameters `length`

and `width`

:

`sketch = CountMinSketch{UInt8}(10000, 5)`

Although the default eltype of count-min sketches are `UInt8`

, so you can avoid specifying that:

`sketch = CountMinSketch(10000, 5)`

### Central functions

`Base.getindex`

— Method`getindex(sketch::CountMinSketch, item)`

Get the estimated count of `item`

. This is never underestimated, but may be overestimated.

`Probably.add!`

— Method`add!(sketch::CountMinSketch, val, count)`

Add `count`

number of `val`

to the sketch. For increased speed, let `count`

be of the same type as `eltype(sketch)`

.

**Examples**

```
julia> sketch = CountMinSketch(1 << 24, 4);
julia> add!(sketch, "hello", eltype(sketch)(5));
julia> sketch["hello"]
5
```

`Base.push!`

— Method`push!(sketch::CountMinSketch, val)`

Add `val`

to the sketch once.

**Examples**

```
julia> sketch = CountMinSketch(1 << 24, 4);
julia> push!(sketch, "hello");
julia> sketch["hello"]
1
```

### Misc functions

Count-min sketches support the following operations, which have no specific docstring because they behave as stated in the documentation in Base:

```
Base.eltype
Base.copy!
Base.copy
Base.sizeof # This one includes the underlying array
```

`Base.:+`

— Method`+(x::CountMinSketch, y::CountMinSketch)`

Add two count-min sketches together. Will not work if `x`

and `y`

do not share parameters `T`

, `length`

and `width`

. The result will be a sketch with the summed counts of the two input sketches.

**Examples**

```
julia> x, y = CountMinSketch(1000, 4), CountMinSketch(1000, 4);
julia> add!(x, "hello", 4); add!(y, "hello", 19);
julia> z = x + y; Int(z["hello"])
23
```

`Probably.fprof`

— Method`fprof(sketch::CountMinSketch)`

Estimate the probability of miscounting an element in the sketch.

`Base.haskey`

— Method`haskey(sketch::CountMinSketch)`

Check if sketch[val] > 0.

`Base.empty!`

— Method`empty!(sketch::CountMinSketch)`

Reset counts of all items to zero, returning the sketch to initial state.

`Base.isempty`

— Method`isempty(sketch::CountMinSketch)`

Check if no items have been added to the sketch.