resq_dsa.count_min
Count-Min Sketch probabilistic data structure.

This module provides a Count-Min Sketch implementation for estimating the frequency of elements in a data stream. It is useful for top-k queries and heavy-hitter detection in sub-linear space.
CountMinSketch Objects
Attributes:
_w - Number of columns in the sketch (width).
_d - Number of rows in the sketch (depth).
_table - The hash table storage.
Example:
    >>> sketch = CountMinSketch(epsilon=0.1, delta=0.01)
    >>> sketch.increment("item1")
    >>> sketch.increment("item1")
    >>> sketch.increment("item2")
    >>> sketch.estimate("item1")  # Returns at least 2
    2
    >>> sketch.estimate("item2")  # Returns at least 1
    1
CountMinSketch.__init__
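Arguments:
epsilon - Error parameter. Must be in (0, 1). In the standard Count-Min Sketch analysis, an estimate exceeds the true count by at most epsilon times the total of all increments, with probability at least 1 - delta.
delta - Confidence parameter. Must be in (0, 1).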
Raises:
ValueError - If epsilon or delta is not in (0, 1).
Example:
    >>> sketch = CountMinSketch(epsilon=0.1, delta=0.01)
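To make the role of the two parameters concrete, here is a minimal sketch of the textbook sizing rule, width = ceil(e / epsilon) and depth = ceil(ln(1 / delta)). The function name `sketch_dimensions` is hypothetical, and whether `resq_dsa` derives `_w` and `_d` with exactly these formulas is an assumption, not something the documentation above confirms.

```python
import math


def sketch_dimensions(epsilon: float, delta: float) -> tuple[int, int]:
    """Return (width, depth) using the textbook Count-Min sizing.

    Assumption: this mirrors the common formulas, not necessarily
    the exact ones used inside resq_dsa.
    """
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("epsilon and delta must be in (0, 1)")
    width = math.ceil(math.e / epsilon)      # more columns -> smaller error
    depth = math.ceil(math.log(1 / delta))   # more rows -> higher confidence
    return width, depth


width, depth = sketch_dimensions(epsilon=0.1, delta=0.01)
print(width, depth)  # 28 5
```

Under this rule, epsilon=0.1 and delta=0.01 give 5 rows of 28 counters, so the table holds 140 counters no matter how many distinct keys the stream contains.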
CountMinSketch.increment
Arguments:
key - The key to increment.
count - Amount to increment by (default: 1).
Example:
    >>> sketch = CountMinSketch(epsilon=0.1, delta=0.01)
    >>> sketch.increment("error")
    >>> sketch.increment("error", 5)
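The update behind an increment can be sketched as follows: each row hashes the key to one column and adds `count` to that counter. The helper `column_for`, the salted BLAKE2b hashing, and the 5 x 28 table size are all assumptions made for illustration; the hash functions actually used by `CountMinSketch` are not described above.

```python
import hashlib


def column_for(key: str, row: int, width: int) -> int:
    """Hypothetical per-row hash: BLAKE2b salted with the row index."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=row.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % width


def increment(table: list[list[int]], key: str, count: int = 1) -> None:
    """Add `count` to exactly one counter in every row of the table."""
    width = len(table[0])
    for row, counters in enumerate(table):
        counters[column_for(key, row, width)] += count


# 5 rows by 28 columns, an assumed size for illustration.
table = [[0] * 28 for _ in range(5)]
increment(table, "error")
increment(table, "error", 5)

row_totals = [sum(counters) for counters in table]
print(row_totals)  # [6, 6, 6, 6, 6]: each row gained 1 + 5
```

Because the same key always maps to the same column within a row, repeated increments accumulate in a single counter per row.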
CountMinSketch.estimate
Arguments:
key - The key to estimate.
Example:
    >>> sketch = CountMinSketch(epsilon=0.1, delta=0.01)
    >>> sketch.increment("event")
    >>> sketch.estimate("event")
    1
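The following self-contained sketch illustrates why an estimate is "at least" the true count, and how that supports heavy-hitter detection. `TinySketch` is a hypothetical stand-in written for this example, not the `resq_dsa` class; its hashing scheme, the table size, and the threshold of 10 are assumptions.

```python
import hashlib
from collections import Counter


class TinySketch:
    """Illustrative stand-in, not the resq_dsa implementation."""

    def __init__(self, width: int, depth: int) -> None:
        self._w = width
        self._d = depth
        self._table = [[0] * width for _ in range(depth)]

    def _column(self, key: str, row: int) -> int:
        # Hypothetical hashing scheme: BLAKE2b salted with the row index.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self._w

    def increment(self, key: str, count: int = 1) -> None:
        for row in range(self._d):
            self._table[row][self._column(key, row)] += count

    def estimate(self, key: str) -> int:
        # Collisions only ever add to a counter, so the smallest counter
        # across rows is the tightest available upper bound on the count.
        return min(
            self._table[row][self._column(key, row)] for row in range(self._d)
        )


# Two frequent keys plus 50 keys that each appear once.
stream = ["a"] * 30 + ["b"] * 20 + [f"noise{i}" for i in range(50)]
truth = Counter(stream)

sketch = TinySketch(width=64, depth=4)
for item in stream:
    sketch.increment(item)

# The estimate never falls below the exact count.
never_under = all(sketch.estimate(key) >= count for key, count in truth.items())
print(never_under)  # True

# Heavy hitters: keys whose estimated count reaches the threshold.
heavy = {key for key in truth if sketch.estimate(key) >= 10}
print(sorted(heavy))
```

Since estimates only err upward, a threshold filter cannot miss a true heavy hitter, although it may occasionally admit a key whose counters were inflated by collisions.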