The following code implements an insertion sort. It is a simplified version of the example in the CTE documentation.
int i, j; int x; // the next item to be placed // outer loop goes through array one element at a time, it is O(n) for(int i = 1; i < n; ++i) { /* The inner loop goes from i down, shifting elements up until correct position for x is found. That means as i varies from 1 to n-1, in the worst case, where the loop must iterate all the way to zero, the loop will have to on the average do (n-1)/2 iterations). That means the inner loop will go down to zero almost n/2 times. */ for(x = A[i], j = i; j > 0 && x < A[j - 1]; --j) { A[j] = A[j-1];// shift up by one } A[j] = x; // insert x in correct slot. } }
Ignoring constant time for assignment and comparison, the outer loop is n times the inner loop is about n/2 times, therefore the algorithm is O(n2/2) or ignoring constants O(n2)
Ok, now what about that concept of logs. First, what is a log? Well, in a base x sytstem it is simply the exponent of x requried to get some n.
base 10 base 2 base 5 log(n) n log(n) n log(n) n 0 1 0 1 0 1 1 10 1 2 1 5 2 100 2 4 2 25 3 1000 3 8 3 625 4 10000 4 16 4 3125 5 100000 5 32 5 6250
By examination of the above table what conclusion can we draw? We can conclude that log(n) grows more slowly than n.
Further, since log(n) grows more slowly than n, n*log(n) will grow much more slowly than n*n.
Notice that log2(n) is binary which is the basis for computer systems which are based on the 1,0 state variation.
So, what can we conclude about the types of loops that exhibit n*log(n) behavior? We can conclude that if the outer loop executes n times and that the innerloop starts with n then divides its control variable each time it is encountered, we have n*log(n) behavior. For example:
n=42000; m = n; div = 6;// div can be any integer > 1 for(i= 1;i<=n;i++){ for(k=1;k<=m;k++){ } m=m/div; //this gives the algorithm log(n) behavior }
The above program snippet will exhibit n*log(n) runtime behavior. What would happen if div = 1 ? Then we would have n*n runtime behavior.
This discussion begs the question, what if the innerloop control variable is simply decremented each time it is encountered?
n=42000; m = n; diff = 6;// diff can be any integer > 0 for(i= 1;i<=n;i++){ for(k=1;k<=m;k++){ } if(m>diff) m=m - diff; //this is NOT log(n) behavior }
We can see that this is better than n2 behavior, but not much better. It is still of the order of O(n2). Why do we say it is still O(n2)? Lets plug in some values:
outer loop inner loop product 10 10/const 100/const 100 100/const 10,000/const 1,000 1,000/const 1,000,000/const 10,000 10,000/const 100,000,000/const 100,000 100,000/const 10,000,000,000/const
Clearly, as the choice of n has a much bigger impact than the choice of const, this type of behavior is of O(n2). Note: even if diff was large, at some point, the choice of n would dominate the runtime.
Remember that the order of is a "rule of thumb" estimate. It is not exact. It is meant to characterize the behavior of the algorithm for large inputs.
Lets see if our "rule of thumb" is valid by using the following short program. (you can cut and paste this program to use it yourself)
//---------------------------------------------------------------------- // // Time-stamp: <26/02/01 14:47 Kenneth L Moore> // // :File: times.cpp // :Purpose: count the number of times an outer/inner loop combination // iterates for the case where the inner loop is divided by // a constant and the case where the inner loop is decremented // by a constant. // :Usage: times outer_loop_count inner_loop_divisor inner_loop_decrement //---------------------------------------------------------------------- //---------------------------------------------------------------------- // includes #include#include int main(int argc, char *argv[]) { // see if user had command line arguements correct if (argc != 4){ cout<< "Usage " << argv[0] << " outer_loop_count "; cout<< "inner_loop_divisor inner_loop_decrement "; exit(0); } int outer = atoi(argv[1]), inner = outer; int div = atoi(argv[2]), diff = atoi(argv[3]), cdiv = 0, cdiff = 0; // do the inner loop divided case for(int i=1;i<=outer;i++){ for(int k=1;k<=inner;k++){ cdiv++; // count iterations for output } inner=inner/div; // implement inner div } // reset inner count and do the inner loop decremented case inner=outer; for(int i=1;i<=outer;i++){ for(int k=1;k<=inner;k++){ cdiff++; // count iterations for output } if(inner>diff) inner=inner-diff; // implement inner diff } // results cout << " outer = " << outer << endl; cout << " inner loop divisor = " << div << " iteration count = " << cdiv << endl; cout << " inner loop subtractor = " << diff << " iteration count = " << cdiff << endl; return 0; }
Here are some annotated outputs of the above program:
times 100 1 0 outer = 100 inner loop divisor = 1 iteration count = 10,000 inner loop subtractor = 0 iteration count = 10,000 (When the divisor = 1 and the decrementor = 0, we have n*n behavior) times 100 2 1 outer = 100 inner loop divisor = 2 iteration count = 197 inner loop subtractor = 1 iteration count = 5,050 (A small divisor makes a huge difference) times 100 2 10 outer = 100 inner loop divisor = 2 iteration count = 197 inner loop subtractor = 10 iteration count = 1,450 (increasing the subtractor makes a less signifigant difference) times 10000 2 1 outer = 10,000 inner loop divisor = 2 iteration count = 19,995 inner loop subtractor = 1 iteration count = 50,005,000 (as the outer loop gets bigger the difference between n*n and n*log(n) becomes more readily apparent) times 10000 2 1000 outer = 10,000 inner loop divisor = 2 iteration count = 19,995 inner loop subtractor = 1,000 iteration count = 10,045,000 (now even a large subtractor makes little relative difference) times 100000 2 1000 outer = 100,000 inner loop divisor = 2 iteration count = 199,994 inner loop subtractor = 1,000 iteration count = 104,950,000 (at this point our "rule of thumb" proves to be valid)
So in the above example, we can see that n*log(n) behavior is much more desirable than n*n behavior, even if the n*n behavior is scaled by some divisor. We are looking at orders of magnitude differnces. 200 thousand is much more desirable than 100 million.