(Updated August 26, 2020)
Overview
As we write programs we gain knowledge. This is especially true in an educational environment. We learn when to select certain constructs in favor of others. We may also learn about scalability, execution time, memory consumption and I/O utilization. If we are truly blessed, our instructor even explains in detail why it is important to do something a certain way, or why seemingly obscure programming concepts like bitmaps and hashing are worth learning.
Nothing speaks louder than numbers. Whether we are talking about the price of a car, the interest rate of a loan, or how much that mortgage really costs after thirty years, the numbers speak volumes. The same is true of our programs: we can often determine how well a program is performing, and even find ways to improve it, based on its behavior or example data. Of course, there are times we cannot find the worst case, other times we are not given realistic data to test with, and sometimes the volume of data is poorly represented.
It is a great feeling when we can cut the memory consumption of a program to 1/16th of what it was. It is equally great to realize that we can cut the run time in half. Ultimately we need to look at our program and ask the question, “but what does it do?” Can we define a unit of work? Perhaps we can identify the main operation, or group of operations, that makes up the bulk of our program, or even just one algorithm within the program. We must also keep in mind that sometimes it is not the obvious operations we are looking for. Sometimes they are hidden and require digging deeper, and sometimes we simply cannot define the operations in terms simple enough for measurement to be meaningful.
Once we establish the operation(s) that we wish to investigate, we need to find out how many operations will be performed on a given set of data. The data can be small for investigative purposes, but ultimately we need to ramp up the amount of data and really discover how many operations are necessary to process a data set of that size. Given an amount of data represented by n we can then determine that the number of operations required to process that data is on the order of some value.
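As a concrete illustration (not taken from the original text), we can instrument a routine with a hypothetical counter and simply record how many of its core operations run for a given n. Here the routine and its data are invented for the sketch; only the counting idea matters:

```c
#include <stdio.h>

/* Hypothetical illustration: count how many comparisons a simple
   "find the largest value" routine performs for n data items. */
static long comparisons = 0;

int largest(const int *data, int n)
{
    int best = data[0];
    for (int i = 1; i < n; i++) {
        comparisons++;               /* one comparison per element */
        if (data[i] > best)
            best = data[i];
    }
    return best;
}

int main(void)
{
    int data[] = { 7, 3, 9, 1, 8, 2, 6, 4 };
    int n = 8;
    printf("largest = %d after %ld comparisons (n = %d)\n",
           largest(data, n), comparisons, n);
    return 0;
}
```

Running the same experiment with larger and larger data sets reveals how the operation count grows with n.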
Big-O Notation
Big-O notation is a shorthand method of expressing asymptotic analysis. To avoid the mathematical jargon of describing limiting behavior, we propose a method of observation that is easy to understand and represent. The idea is to determine the behavior of programs as the data set grows. Some may choose to view this as the worst case. That is fine, as long as you understand the notation was never meant to express the knowledge in those terms.
The idea of Big-O is to represent Time Complexity as well as Space Complexity. The work software performs over time may also require more memory to solve the problem. Therefore both must be considered.
n | Constant O(1) | Logarithmic O(log₂ n) | Linear O(n) | Log-linear O(n log₂ n) | Quadratic O(n²) | Cubic O(n³) |
---|---|---|---|---|---|---|
1 | 1 | 0 | 1 | 0 | 1 | 1 |
4 | 1 | 2 | 4 | 8 | 16 | 64 |
16 | 1 | 4 | 16 | 64 | 256 | 4,096 |
512 | 1 | 9 | 512 | 4,608 | 262,144 | 134,217,728 |
1,024 | 1 | 10 | 1,024 | 10,240 | 1,048,576 | ~10⁹ |
4,096 | 1 | 12 | 4,096 | 49,152 | 16,777,216 | ~68 × 10⁹ |
262,144 | 1 | 18 | 262,144 | 4,718,592 | ~68 × 10⁹ | ~1.8 × 10¹⁶ |
1,048,576 | 1 | 20 | 1,048,576 | 20,971,520 | ~10¹² | ~10¹⁸ |
Table 1: Number of operations performed by a particular form of algorithm for values of n.
The data being processed, the amount of storage it requires, and the number of operations required can all be noted in a variety of ways. Table 1 shows some basic values of n and the number of operations each Big-O class implies.
Note the use of O() notation with a value in parentheses. This is the notation for saying on the order of. So if a particular algorithm is O(n), we can say that the number of operations is on the order of n. Or, that for n data items we need to perform n operations.
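A single pass over an array makes this concrete. The hypothetical sum routine below performs one addition per element, so its operation count grows linearly with n:

```c
/* One pass over n values: the loop body runs exactly n times, so the
   number of operations grows linearly with n -- O(n). */
long sum(const int *data, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];
    return total;
}
```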
Well, that is fine for time complexity, but what of space complexity? We can use the same notation. If we need to store an array of integers, we can say that the space complexity is O(n). This is because we need to store n integers in the array. It does not matter that each int is 4 bytes – it is an int. If you want to know the physical space it will occupy, then you can do the math and determine that value. But that is beside the point.
If we wanted to sort the data values in the array, the time complexity can be vastly different from the space complexity. For example, a Bubble Sort uses the same space to perform the sort, so the space complexity is O(n). There is a little more space for the swapping of values, but that is considered negligible. However, Bubble Sort is a poor implementation of a sort and is O(n²) because it effectively scans n values n times.
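A minimal bubble sort sketch shows where the n² comes from: the outer loop makes roughly n passes, the inner loop compares up to n values on each pass, and the only extra storage is a single temporary used for swapping:

```c
/* Bubble sort sketch: the nested loops perform roughly n * n comparisons
   (O(n^2) time), while the sort happens in the same array with only one
   temporary for swapping (space stays O(n)). */
void bubble_sort(int *data, int n)
{
    for (int pass = 0; pass < n - 1; pass++) {
        for (int i = 0; i < n - 1 - pass; i++) {
            if (data[i] > data[i + 1]) {   /* the comparison is the operation */
                int tmp = data[i];         /* negligible extra space for swap */
                data[i] = data[i + 1];
                data[i + 1] = tmp;
            }
        }
    }
}
```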
What if the actual count of operations is something like n² − 3n and n is 1,024? That is 1,048,576 less 3,072. The 3n is hardly worth mentioning, as it does not detract from the staggering fact that the work necessary is three orders of magnitude larger than the number of data items!
What is an operation?
This article approaches Time Complexity as a count of operations, which can be converted into an overall runtime once we know the cost of a single operation. Now that we have discussed notation, data sets and operations, we should probably define an operation with some examples. Table 2 shows some typical forms of work considered an operation, with projected Big-O values. With Table 2 as a guide, we will now talk about some of the examples given.
Work related to an operation | Time Complexity | Space Complexity | Comment |
---|---|---|---|
Linear search (array/linked-list) | O(n) | O(n) | |
Binary search an array | O(log₂ n) | O(n) | Data must be in sorted order. |
Search a balanced tree | O(log₂ n) | O(n) | Height of a balanced tree is log₂ n. |
Search, add or remove operation on a hash table | O(1) | O(1) | |
Push/pop stack operation | O(1) | O(1) | |
Insert into front or middle of array | O(n) | O(1) | Array data must be adjusted. |
Insert at end of array or front of linked list | O(1) | O(1) | Assuming array is not full. |
Insert into an ordered linked list | O(n) | O(1) | Must find insert location which is linear. |
Bubble sort | O(n²) | O(n) | |
Quicksort/Heapsort | O(n log₂ n) | O(n) | Quicksort has a worst case of O(n²). |
Table 2: Example operations with their time and space complexity.
Be mindful of the constant, or O(1), case. If it is determined that for any value of n the program takes 32 operations to complete the work, we say that the algorithm is an O(1) algorithm because as n approaches infinity, we will still only perform 32 operations. Even if that number were somewhat larger, the algorithm would still be constant, so long as the count does not grow with n.
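A push onto an array-backed stack is a typical constant-time example: the same few steps run no matter how many items the stack already holds. This is a minimal sketch, assuming the stack is not full:

```c
/* O(1) example: pushing onto an array-backed stack takes the same few steps
   regardless of how many items are already stored (assumes the stack is not full). */
struct stack {
    int items[1024];
    int top;                        /* index of the next free slot */
};

void push(struct stack *s, int value)
{
    s->items[s->top] = value;       /* one store  */
    s->top++;                       /* one update */
}
```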
The linear search example should be pretty easy to see. Since the data is assumed to be an unordered array or linked list, searching the contents requires a comparison of each value in the array to find the key value being sought. The detail here is that the operation is the comparison. The best case is that we will find the key in the first position, but that is typically not the case. The average number of values to be compared will be n/2, but as we stated before, when the values of n are very large (millions), constant multipliers do not add anything meaningful; therefore the average case of linear search is the same order as the worst case: O(n).
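A short sketch of the array version makes the counted operation, the comparison, explicit; in the worst case the loop compares all n values:

```c
/* Linear search: compare each value to the key until it is found.
   Worst case: n comparisons -- O(n). Returns the index, or -1 if absent. */
int linear_search(const int *data, int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (data[i] == key)         /* the comparison is the operation */
            return i;
    }
    return -1;
}
```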
Additional operations on arrays and linked lists include inserting into the middle, or inserting into ordered data. Since this ultimately requires a search to find the insertion point, and we already know that search is linear, this algorithm is O(n).
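For an array, a sorted insert can be sketched (under the assumption that the array still has room) as a scan for the position followed by shifting the tail of the array; both parts are linear in n:

```c
/* Insert value into a sorted array of n items (assumes spare capacity exists).
   Finding the position and shifting the larger values right are both
   linear in n, so the insert is O(n). Returns the new item count. */
int sorted_insert(int *data, int n, int value)
{
    int pos = 0;
    while (pos < n && data[pos] < value)   /* linear scan for the spot */
        pos++;
    for (int i = n; i > pos; i--)          /* shift the tail one slot right */
        data[i] = data[i - 1];
    data[pos] = value;
    return n + 1;
}
```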
Binary searching an array of sorted values poses an interesting discussion. Since the data is in order, we can take the midpoint of the data, make a comparison, and move left or right. The comparison is still the operation. Since we keep dividing the field in half, we will swiftly exhaust the data in just a few operations. For example, with 1000 items the first comparison reduces the field to 500, the second to 250 and the third to 125. This means we are dealing with powers of two (hence the name binary search). Since we started with 1000, we will not need to make more than 10 comparisons. Why? 2¹⁰ is 1,024. The exponent of 2 is what we are interested in. The exponent represents the number of comparisons since the field keeps getting cut in half. Therefore the binary search is O(log₂ n), since log₂ n gets us the exponent – the power of two – that produces n. This is often simplified to O(log n), and the base 2 is implied since we are talking about binary searches.
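A standard binary search sketch shows the halving at work; each comparison discards half of the remaining field:

```c
/* Binary search over a sorted array: each comparison halves the field,
   so at most about log2(n) comparisons are needed -- O(log n). */
int binary_search(const int *data, int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (data[mid] == key)
            return mid;              /* found */
        else if (data[mid] < key)
            lo = mid + 1;            /* discard the left half */
        else
            hi = mid - 1;            /* discard the right half */
    }
    return -1;                       /* not found */
}
```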
Hash tables are designed with performance in mind. Inserting into a hash table is O(1). This is because the bucket is immediately determined and new items are typically put on the front of the chain (linked list). With regard to searching and deleting, even if there were 32 items in each bucket it is still O(1). What is interesting to ponder further is the fact that the act of hashing a string is O(n) in the length of the string, because we access each character in the mathematical calculation. Is this significant? Probably not. Hashing is cheap and typically deals with fewer than a dozen characters, but it is a good opportunity to look deeper into an algorithm, see past the O(1), and find an O(n) portion of the overall solution. Also consider that the key length does not grow with the number of items in the table, so as far as the table is concerned the hashing step is still constant.
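A simple string hash (a djb2-style formula, shown here only as an illustration, not as the article's particular hash function) makes that hidden linear portion visible: the loop touches every character of the key before the bucket index is computed.

```c
#include <stddef.h>

/* Simple string hash (djb2-style, shown only as an illustration).
   The loop visits every character, so hashing a key of length n is O(n),
   even though the table operations that follow are O(1). */
unsigned long hash_string(const char *key, size_t table_size)
{
    unsigned long h = 5381;
    for (const char *p = key; *p != '\0'; p++)
        h = h * 33 + (unsigned char)*p;    /* one step per character */
    return h % table_size;                 /* bucket index */
}
```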
Conclusion
When we consider using a built-in library or writing our own solution to a given problem, we must think about what the operation will be. We might measure how long an operation takes. We should also give some thought to what will happen to that library call or homemade solution when presented with very large data sets. And do not forget the detail of how much memory is needed to solve the problem. If our code holds up and does not show a dramatic slowdown, then we are well on our way to producing a well designed, scalable product. Otherwise we may need to consider some alterations or, at the very least, a disclaimer.