Its All In the Timing

The following code implements an insertion sort. It is a simplified version of the example in the CTE documentation.

  int  i, j;
  int  x;  // the next item to be placed
  
  // outer loop goes through array one element at a time, it is  O(n)
  for(int  i = 1; i < n; ++i)  {
    /*
     The inner loop goes from i down, shifting elements up until
     correct position for x is found. That means as i varies from 1 to n-1, in
     the worst case, where the loop must iterate all the way to zero,
     the loop will have to on the average do (n-1)/2 iterations).  That means 
     the inner loop will go down to zero almost n/2 times.
    */
    for(x = A[i], j = i; j > 0 && x < A[j - 1]; --j)  {
      A[j] = A[j-1];// shift up by one
    }
    A[j] = x; // insert x in correct slot.
  }
}

Ignoring constant time for assignment and comparison, the outer loop is n times the inner loop is about n/2 times, therefore the algorithm is O(n²/2) or ignoring constants O(n²)

Ok, now what about that concept of logs. First, what is a log? Well, in a base x sytstem it is simply the exponent of x requried to get some n.

           base 10          base 2            base 5
        log(n)     n     log(n)      n     log(n)      n
           0       1        0        1        0        1
           1      10        1        2        1        5
           2     100        2        4        2       25
           3    1000        3        8        3      625   
           4   10000        4       16        4     3125
           5  100000        5       32        5     6250

By examination of the above table what conclusion can we draw? We can conclude that log(n) grows more slowly than n.

Further, since log(n) grows more slowly than n, n*log(n) will grow much more slowly than n*n.

Notice that log₂(n) is binary which is the basis for computer systems which are based on the 1,0 state variation.

So, what can we conclude about the types of loops that exhibit n*log(n) behavior? We can conclude that if the outer loop executes n times and that the innerloop starts with n then divides its control variable each time it is encountered, we have n*log(n) behavior. For example:

  n=42000;
  m = n;
  div = 6;// div can be any integer > 1
  for(i= 1;i<=n;i++){
    for(k=1;k<=m;k++){
    }
    m=m/div; //this gives the algorithm log(n) behavior
  }

The above program snippet will exhibit n*log(n) runtime behavior. What would happen if div = 1 ? Then we would have n*n runtime behavior.

This discussion begs the question, what if the innerloop control variable is simply decremented each time it is encountered?

  n=42000;
  m = n;
  diff = 6;// diff can be any integer > 0
  for(i= 1;i<=n;i++){
    for(k=1;k<=m;k++){
    }
    if(m>diff)
    m=m - diff; //this is NOT log(n) behavior
    }

We can see that this is better than n² behavior, but not much better. It is still of the order of O(n²). Why do we say it is still O(n²)? Lets plug in some values:

        outer loop         inner loop            product
            10               10/const              100/const
           100              100/const           10,000/const
         1,000            1,000/const        1,000,000/const
        10,000           10,000/const      100,000,000/const
       100,000          100,000/const   10,000,000,000/const

Clearly, as the choice of n has a much bigger impact than the choice of const, this type of behavior is of O(n²). Note: even if diff was large, at some point, the choice of n would dominate the runtime.

Remember that the order of is a "rule of thumb" estimate. It is not exact. It is meant to characterize the behavior of the algorithm for large inputs.

Lets see if our "rule of thumb" is valid by using the following short program. (you can cut and paste this program to use it yourself)

//----------------------------------------------------------------------
//
//  Time-stamp: <26/02/01 14:47 Kenneth L Moore>
//
// :File:      times.cpp
// :Purpose:   count the number of times an outer/inner loop combination
//             iterates for the case where the inner loop is divided by
//             a constant and the case where the inner loop is decremented
//             by a constant.
// :Usage:     times outer_loop_count  inner_loop_divisor inner_loop_decrement
//----------------------------------------------------------------------

//----------------------------------------------------------------------
// includes
#include 
#include 

int main(int  argc, char *argv[])
{
      // see if user had command line arguements correct
      if (argc != 4){
         cout<< "Usage " << argv[0] << " outer_loop_count  ";
         cout<< "inner_loop_divisor inner_loop_decrement  ";
         exit(0);
         }
      int outer = atoi(argv[1]), inner = outer;
      int div = atoi(argv[2]), diff = atoi(argv[3]), cdiv = 0, cdiff = 0;

      // do the inner loop divided case
      for(int i=1;i<=outer;i++){
        for(int k=1;k<=inner;k++){
          cdiv++;                         // count iterations for output
        }
        inner=inner/div;                  // implement inner div
      }

      // reset inner count and do the inner loop decremented case
      inner=outer;
      for(int i=1;i<=outer;i++){
        for(int k=1;k<=inner;k++){
          cdiff++;                         // count iterations for output
        }
        if(inner>diff)
          inner=inner-diff;                // implement inner diff
      }

      // results
      cout << " outer = " << outer << endl;
      cout << " inner loop divisor = " << div << " iteration count = " << cdiv << endl;
      cout << " inner loop subtractor = " << diff << " iteration count = " << cdiff << endl;

      return 0;
}

Here are some annotated outputs of the above program:


times 100 1 0
 outer = 100
 inner loop divisor = 1 iteration count = 10,000
 inner loop subtractor = 0 iteration count = 10,000

(When the divisor = 1 and the decrementor = 0, we have n*n behavior)

times 100 2 1
 outer = 100
 inner loop divisor = 2 iteration count = 197
 inner loop subtractor = 1 iteration count = 5,050

(A small divisor makes a huge difference)

times 100 2 10
 outer = 100
 inner loop divisor = 2 iteration count = 197
 inner loop subtractor = 10 iteration count = 1,450

(increasing the subtractor makes a less signifigant difference)

times 10000 2 1
 outer = 10,000
 inner loop divisor = 2 iteration count = 19,995
 inner loop subtractor = 1 iteration count = 50,005,000

(as the outer loop gets bigger the difference between n*n and n*log(n) 
becomes more readily apparent)

times 10000 2 1000
 outer = 10,000
 inner loop divisor = 2 iteration count = 19,995
 inner loop subtractor = 1,000 iteration count = 10,045,000

(now even a large subtractor makes little relative difference)

times 100000 2 1000
 outer = 100,000
 inner loop divisor = 2 iteration count = 199,994
 inner loop subtractor = 1,000 iteration count = 104,950,000

(at this point our "rule of thumb" proves to be valid)

So in the above example, we can see that n*log(n) behavior is much more desirable than n*n behavior, even if the n*n behavior is scaled by some divisor. We are looking at orders of magnitude differnces. 200 thousand is much more desirable than 100 million.