Binary Indexed Trees: A Faster Data Structure for Algorithms

An overview of binary indexed trees (BIT), a data structure used to make algorithms faster. BIT has a worst-case time complexity of O(m log n) for m queries, but it is easier to code and requires less memory space than other data structures such as RMQ. The article covers notation, the basic idea, isolating the last digit, reading cumulative frequency, changing frequency at some position and updating the tree, and reading the actual frequency at a position. It also includes a sample problem and a conclusion.

Typology: Study Guides, Projects, Research

2016/2017

Uploaded on 05/26/2017

12/24/2015 BinaryIndexedTrees–topcoder
https://www.topcoder.com/community/datascience/datasciencetutorials/binaryindexedtrees/ 1/17

Binary Indexed Trees

By boba5551, TopCoder Member

Contents: Introduction, Notation, Basic idea, Isolating the last digit, Read cumulative frequency, Change frequency at some position and update tree, Read the actual frequency at a position, Scaling the entire tree by a constant factor, Find index with given cumulative frequency, 2D BIT, Sample problem, Conclusion, References

Introduction

We often need some sort of data structure to make our algorithms faster. In this article we will discuss the Binary Indexed Tree structure. According to Peter M. Fenwick, this structure was first used for data compression. Now it is often used for storing frequencies and manipulating cumulative frequency tables.

Let’s define the following problem: We have n boxes. Possible queries are

  1. add marble to box i
  2. sum marbles from box k to box l

The naive solution has time complexity O(1) for query 1 and O(n) for query 2. Suppose we make m queries. The worst case (when all queries are of type 2) has time complexity O(n * m). Using some data structure (e.g., RMQ) we can solve this problem with a worst-case time complexity of O(m log n). Another approach is to use the Binary Indexed Tree data structure, also with a worst-case time complexity of O(m log n), but Binary Indexed Trees are much easier to code and require less memory space than RMQ.
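For reference, the naive solution described above can be sketched as follows. The struct name NaiveBoxes and its method names are hypothetical, chosen only for illustration; boxes are assumed to be numbered from 1.

```cpp
#include <cassert>
#include <vector>

// Naive baseline (illustrative names): one counter per box, 1-based.
struct NaiveBoxes {
    std::vector<long long> marbles;

    explicit NaiveBoxes(int n) : marbles(n + 1, 0) {}

    // Query 1: add one marble to box i in O(1).
    void add(int i) {
        assert(i >= 1 && i < static_cast<int>(marbles.size()));
        ++marbles[i];
    }

    // Query 2: sum marbles in boxes k..l (inclusive) in O(n).
    long long sum(int k, int l) const {
        assert(1 <= k && k <= l && l < static_cast<int>(marbles.size()));
        long long total = 0;
        for (int i = k; i <= l; ++i) total += marbles[i];
        return total;
    }
};
```

Every type 2 query walks the whole range, which is where the O(n * m) worst case comes from.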

Notation

Image 1.3 – tree of responsibility for indexes (bar shows range of frequencies accumulated in top element)

Suppose we are looking for cumulative frequency of index 13 (for the first 13 elements). In binary notation, 13 is equal to 1101. Accordingly, we will calculate c[1101] = tree[1101] + tree[1100] + tree[1000] (more about this later).
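The decomposition of index 13 can be reproduced with a few lines of code. This is only a sketch: the function name responsibleIndexes is hypothetical, and it relies on the expression idx & -idx, which picks out the lowest set bit of idx.

```cpp
#include <cassert>
#include <vector>

// Returns the tree indexes whose values are added up to obtain the
// cumulative frequency c[idx]: idx itself, then idx with its lowest
// set bit removed, and so on until the index reaches zero.
std::vector<int> responsibleIndexes(int idx) {
    assert(idx >= 0);
    std::vector<int> path;
    while (idx > 0) {
        path.push_back(idx);
        idx -= (idx & -idx);  // drop the lowest set bit
    }
    return path;
}
```

For idx = 13 the function returns 13, 12, 8, that is 1101, 1100 and 1000 in binary, matching the three terms of c[1101].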

Isolating the last digit

NOTE: Instead of “the last non-zero digit,” we will write only “the last digit.”

There are times when we need to get just the last digit from a binary number, so we need an efficient way to do that. Let num be the integer whose last digit we want to isolate. In binary notation num can be represented as a1b, where a represents binary digits before the last digit and b represents zeroes after the last digit.

The integer -num is the two's complement of num, that is, (a1b)¯ + 1 = a¯0b¯ + 1, where ¯ denotes the bitwise complement. Since b consists of all zeroes, b¯ consists of all ones. Finally we have

-num = (a1b)¯ + 1 = a¯0b¯ + 1 = a¯0(0…0)¯ + 1 = a¯0(1…1) + 1 = a¯1(0…0) = a¯1b.

Now we can easily isolate the last digit by applying the bitwise AND operator (& in C++ and Java) to num and -num: a1b & a¯1b = (0…0)1(0…0).
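The derivation translates into a one-line helper. The name lastDigit is hypothetical, and the sketch assumes a positive argument and two's complement integers.

```cpp
#include <cassert>

// Isolates the lowest set bit ("the last digit") of a positive integer.
// Relies on two's complement negation, as in the derivation above.
int lastDigit(int num) {
    assert(num > 0);
    return num & -num;
}
```

For example, 12 is 1100 in binary, so lastDigit(12) is 100 in binary, which is 4; for any odd number the result is 1.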

Image 1.4 – tree with tree frequencies

So, our result is 26. The number of iterations in this function is the number of 1 bits in idx, which is at most log MaxVal.

Time complexity: O(log MaxVal). Code length: Up to ten lines.
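The read function discussed here is not reproduced in this text. A minimal sketch of how such a function is commonly written is given below; it is not necessarily the author's exact code. The helper build is hypothetical and fills the tree by brute force straight from the definition of responsibility, and MaxVal = 16 is an assumption.

```cpp
#include <cassert>
#include <vector>

const int MaxVal = 16;                 // assumed size for this sketch
std::vector<int> tree(MaxVal + 1, 0);  // tree[0] is unused

// Hypothetical helper: fills tree[] from plain frequencies f[1..MaxVal].
// tree[i] covers the last (i & -i) frequencies ending at index i.
void build(const std::vector<int>& f) {
    assert(static_cast<int>(f.size()) == MaxVal + 1);
    for (int i = 1; i <= MaxVal; ++i) {
        tree[i] = 0;
        for (int j = i - (i & -i) + 1; j <= i; ++j) tree[i] += f[j];
    }
}

// Cumulative frequency of indexes 1..idx.
int read(int idx) {
    assert(idx >= 0 && idx <= MaxVal);
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);  // remove the last digit and go on
    }
    return sum;
}
```

As an illustration, assume the frequencies 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2 for indexes 1 to 16 (values chosen to agree with the numbers quoted in this article). Then read(13) adds tree[13] + tree[12] + tree[8] = 3 + 11 + 12 = 26.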

Change frequency at some position and update tree

The concept is to update the tree frequency at all indexes which are responsible for the frequency whose value we are changing. When reading the cumulative frequency at some index, we removed the last digit and went on. When changing some frequency by val, we increment the value at the current index by val (the starting index is always the one whose frequency is changed), add the last digit to the index, and go on while the index is less than or equal to MaxVal. Function in C++:

void update(int idx, int val){
    while (idx <= MaxVal){
        tree[idx] += val;
        idx += (idx & -idx);
    }
}

Image 1.5 – arrows show path from index to zero which we use to get sum (image shows example for index 13)

Let’s show an example for idx = 5:

iteration   idx            position of the last digit   idx & -idx
1           5 = 101        0                            1 (2^0)
2           6 = 110        1                            2 (2^1)
3           8 = 1000       3                            8 (2^3)
4           16 = 10000     4                            16 (2^4)
5           32 = 100000    -                            -
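Combining the update loop with a cumulative read gives one possible solution to the box-and-marble problem from the introduction. The class below is a sketch under assumptions: the names MarbleBoxes, add, sum and prefix are hypothetical, and boxes are numbered from 1.

```cpp
#include <cassert>
#include <vector>

// Illustrative wrapper around a BIT for the two marble queries.
class MarbleBoxes {
public:
    explicit MarbleBoxes(int n) : maxVal_(n), tree_(n + 1, 0) {}

    // Query 1: add `count` marbles to box i, walking i, i + (i & -i), ...
    void add(int i, long long count = 1) {
        assert(i >= 1 && i <= maxVal_);
        for (; i <= maxVal_; i += (i & -i)) tree_[i] += count;
    }

    // Query 2: marbles in boxes k..l, as a difference of two prefix sums.
    long long sum(int k, int l) const {
        assert(1 <= k && k <= l && l <= maxVal_);
        return prefix(l) - prefix(k - 1);
    }

private:
    // Cumulative number of marbles in boxes 1..idx.
    long long prefix(int idx) const {
        long long total = 0;
        for (; idx > 0; idx -= (idx & -idx)) total += tree_[idx];
        return total;
    }

    int maxVal_;
    std::vector<long long> tree_;
};
```

Both queries touch at most about log n tree entries, which gives the O(m log n) bound for m queries.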

return sum; }

Here’s an example for getting the actual frequency for index 12:

First, we will calculate z = 12 - (12 & -12) = 8, sum = 11

iteration   y            position of the last digit   y & -y     sum
1           11 = 1011    0                            1 (2^0)    9
2           10 = 1010    1                            2 (2^1)    2
3           8 = 1000     -                            -          -

Image 1.7 – read actual frequency at some index in BIT (image shows example for index 12)

Let’s compare the algorithm written above with the alternative of calling the function read twice. Note that for each odd number, the algorithm above works in constant time O(1), without any iteration. For almost every even number idx, it works in c * O(log idx), where c is strictly less than 1, compared to read(idx) - read(idx - 1), which works in c1 * O(log idx), where c1 is always greater than 1.

Time complexity: c * O(log MaxVal), where c is less than 1. Code length: Up to fifteen lines.
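Only the tail of the function for this section survives in this text. A sketch consistent with the worked example (z, y and sum as in the table) might look as follows; the body is a reconstruction under that assumption, not necessarily the author's exact code, and MaxVal = 16 is assumed.

```cpp
#include <cassert>
#include <vector>

const int MaxVal = 16;                 // assumed size for this sketch
std::vector<int> tree(MaxVal + 1, 0);  // tree[0] is unused

void update(int idx, int val) {
    assert(idx >= 1 && idx <= MaxVal);
    while (idx <= MaxVal) {
        tree[idx] += val;
        idx += (idx & -idx);
    }
}

// Actual (non-cumulative) frequency at position idx.
int readSingle(int idx) {
    assert(idx >= 1 && idx <= MaxVal);
    int sum = tree[idx];         // covers the range (z, idx]
    int z = idx - (idx & -idx);  // where the path of idx goes next
    int y = idx - 1;             // predecessor; its path also reaches z
    while (y != z) {
        sum -= tree[y];          // remove the part belonging to (z, idx - 1]
        y -= (y & -y);
    }
    return sum;
}
```

With the illustrative frequencies 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2 for indexes 1 to 16, readSingle(12) starts from tree[12] = 11, subtracts tree[11] = 2 and tree[10] = 7, and returns 2, as in the table above. For an odd idx, y equals z immediately and the loop body never runs.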

Scaling the entire tree by a constant factor

Sometimes we want to scale our tree by some factor. With the procedures described above it is very simple. If we want to scale the frequencies down by some factor c (that is, divide each by c), then each index idx should be updated by -(c - 1) * readSingle(idx) / c (because f[idx] - (c - 1) * f[idx] / c = f[idx] / c). Simple function in C++:

void scale(int c){
    for (int i = 1 ; i <= MaxVal ; i++)
        update(i , -(c - 1) * readSingle(i) / c); // arguments are (idx , val)
}

This can also be done more quickly. Scaling is a linear operation, and each tree frequency is a linear combination of some frequencies. If we scale each frequency by some factor, we also scale each tree frequency by the same factor. Instead of the procedure above, which has time complexity O(MaxVal * log MaxVal), we can achieve time complexity O(MaxVal):

void scale(int c){
    for (int i = 1 ; i <= MaxVal ; i++)
        tree[i] = tree[i] / c;
}

Time complexity: O(MaxVal). Code length: Just a few lines.

Find index with given cumulative frequency

The naive and simplest solution for finding an index with a given cumulative frequency is to iterate through all indexes, calculate the cumulative frequency of each, and check whether it is equal to the given value. In the case of negative frequencies it is the only solution. However, if we have only non-negative frequencies in our tree (which means that cumulative frequencies for greater indexes are not smaller), we can use a logarithmic algorithm that is a modification of binary search. We go through all bits (starting with the highest one), make the index, compare the cumulative frequency of the current index with the given value and, according to the outcome, take the lower or higher half of the interval (just like in binary search). Function in C++:

bitMask >>= 1; } if (cumFre != 0) return -1; else return idx; }

Example for cumulative frequency 21 and function find:

First iteration

tIdx is 16; tree[16] is greater than 21; halve bitMask and continue

Second iteration

tIdx is 8; tree[8] is less than 21, so we should include the first 8 indexes in the result and remember idx, because we know for sure it is part of the result; subtract tree[8] from cumFre (we do not want to look for the same cumulative frequency again; we are looking for another cumulative frequency in the rest of the tree); halve bitMask and continue

Third iteration

tIdx is 12; tree[12] is greater than 9 (there is no way to overlap interval 1-8, in this example, with some further intervals, because only interval 1-16 can overlap); halve bitMask and continue

Fourth iteration

tIdx is 10; tree[10] is less than 9, so we should update values; halve bitMask and continue

Fifth iteration

tIdx is 11; tree[11] is equal to 2; return index (tIdx)

Time complexity: O(log MaxVal). Code length: Up to twenty lines.
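Since only the end of the function is visible in this text, here is a self-contained sketch of the search that follows the iterations traced above. It is a reconstruction under assumptions: MaxVal = 16, bitMask is computed inside the function as the highest power of two not exceeding MaxVal, and a bounds check on tIdx is added for safety.

```cpp
#include <cassert>
#include <vector>

const int MaxVal = 16;                 // assumed size for this sketch
std::vector<int> tree(MaxVal + 1, 0);  // tree[0] is unused

void update(int idx, int val) {
    assert(idx >= 1 && idx <= MaxVal);
    while (idx <= MaxVal) {
        tree[idx] += val;
        idx += (idx & -idx);
    }
}

// Returns an index whose cumulative frequency equals cumFre, or -1 if
// there is none. Requires all frequencies to be non-negative.
int find(int cumFre) {
    int idx = 0;
    int bitMask = 1;
    while (bitMask * 2 <= MaxVal) bitMask *= 2;  // highest power of two
    while (bitMask != 0) {
        int tIdx = idx + bitMask;                // midpoint of the interval
        if (tIdx <= MaxVal) {
            if (cumFre == tree[tIdx]) return tIdx;
            if (cumFre > tree[tIdx]) {           // tree[tIdx] fits: take it
                idx = tIdx;
                cumFre -= tree[tIdx];
            }
        }
        bitMask >>= 1;                           // halve the interval
    }
    return cumFre != 0 ? -1 : idx;
}
```

With the illustrative frequencies 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2 for indexes 1 to 16, find(21) visits tIdx = 16, 8, 12, 10, 11 and returns 11, as in the trace. If several indexes share the same cumulative frequency, this sketch returns one of them, not necessarily the smallest.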

2D BIT

BIT can be used as a multi-dimensional data structure. Suppose you have a plane with dots (with non-negative coordinates). You make three queries:

  1. set dot at (x , y)
  2. remove dot from (x , y)
  3. count number of dots in rectangle (0 , 0), (x , y), where (0 , 0) is the bottom-left corner, (x , y) is the top-right corner and the sides are parallel to the x-axis and y-axis.

If m is the number of queries, max_x is the maximum x coordinate, and max_y is the maximum y coordinate, then the problem can be solved in O(m * log (max_x) * log (max_y)). In this case, each element of the tree will contain an array (tree[max_x][max_y]). Updating indexes of the x-coordinate is the same as before. For example, suppose we are setting/removing dot (a , b). We will call update(a , b , 1)/update(a , b , -1), where update is:

void update(int x , int y , int val){
    while (x <= max_x){
        updatey(x , y , val); // this function should update array tree[x]
        x += (x & -x);
    }
}

The function updatey is the “same” as function update:

void updatey(int x , int y , int val){
    while (y <= max_y){
        tree[x][y] += val;
        y += (y & -y);
    }
}

It can be written in one function/procedure:

void update(int x , int y , int val){
    int y1;
    while (x <= max_x){
        y1 = y;
        while (y1 <= max_y){
            tree[x][y1] += val;
            y1 += (y1 & -y1);
        }
        x += (x & -x);
    }
}
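The counting query (query 3) is not shown in this text. A sketch of a matching two-dimensional read is given below. It assumes coordinates have been shifted so that they start at 1, since an index of 0 would make the update loop run forever, and it assumes max_x = max_y = 8 purely for illustration.

```cpp
#include <cassert>
#include <vector>

const int max_x = 8;  // assumed bounds for this sketch
const int max_y = 8;
std::vector<std::vector<int>> tree(max_x + 1, std::vector<int>(max_y + 1, 0));

// Adds val at (x, y); coordinates are assumed to be 1-based.
void update(int x, int y, int val) {
    assert(x >= 1 && x <= max_x && y >= 1 && y <= max_y);
    for (int i = x; i <= max_x; i += (i & -i))
        for (int j = y; j <= max_y; j += (j & -j))
            tree[i][j] += val;
}

// Number of dots (x', y') with 1 <= x' <= x and 1 <= y' <= y.
int read(int x, int y) {
    assert(x >= 0 && x <= max_x && y >= 0 && y <= max_y);
    int sum = 0;
    for (int i = x; i > 0; i -= (i & -i))
        for (int j = y; j > 0; j -= (j & -j))
            sum += tree[i][j];
    return sum;
}
```

Each of the two nested loops runs a logarithmic number of times, which matches the O(log (max_x) * log (max_y)) cost per query stated above.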

  1. Q i (answer 0 if i-th card is face down else answer 1)

Solution:

This problem has a solution in which each query (both 1 and 2) has time complexity O(log n). In array f (of length n + 1) we will store each query T (i , j): we set f[i]++ and f[j + 1]--. For each card k between i and j (including i and j), the sum f[1] + f[2] + … + f[k] will be increased by 1, and for all other cards it will be the same as before (look at image 2.0 for clarification), so our solution is this sum (which is the same as the cumulative frequency) modulo 2.

Use BIT to store (increase/decrease) frequency and read cumulative frequency.
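The solution can be sketched as follows. The problem statement is not fully reproduced in this text, so the sketch rests on assumptions: T (i , j) flips every card from i to j, all cards start face down, and cards are numbered from 1 to n. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Illustrative solution: a BIT over the array f described above.
class Cards {
public:
    explicit Cards(int n) : n_(n), tree_(n + 2, 0) {}

    // T (i , j): f[i]++ and f[j + 1]--, both applied through the BIT.
    void flip(int i, int j) {
        assert(1 <= i && i <= j && j <= n_);
        update(i, 1);
        update(j + 1, -1);
    }

    // Q i: 1 if card i is face up, 0 if it is face down.
    int query(int i) const {
        assert(1 <= i && i <= n_);
        int sum = 0;  // f[1] + ... + f[i] = number of flips covering card i
        for (int idx = i; idx > 0; idx -= (idx & -idx)) sum += tree_[idx];
        return sum % 2;
    }

private:
    void update(int idx, int val) {
        for (; idx <= n_ + 1; idx += (idx & -idx)) tree_[idx] += val;
    }

    int n_;
    std::vector<int> tree_;  // indexes 1..n + 1 are used
};
```

The tree has one extra slot so that f[j + 1] is valid when j equals n.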

Conclusion

Binary Indexed Trees are very easy to code.

Each query on Binary Indexed Tree takes constant or logarithmic time.

Binary Indexed Trees require linear memory space.

You can use it as an n-dimensional data structure.

References

[1] RMQ
[2] Binary Search
[3] Peter M. Fenwick

Image 2.
