A co worker emailed out an interesting question, how would we optimise the following piece of code –

1 2 3 4 5 6 7 8 9 10 11 |
// Returns the average of k smallest values from the array arr of size n. // The original order of elements in arr is not guaranteed to be preserved. float average_of_k_smallest(float* arr, int n, int k) { assert(k > 0 && k <= n); std::sort(arr, arr + n); float s = 0.0f; for (int i = 0; i < k; i++) s += arr[i]; return s/k; } |

The code looks fast, light and up to the job right? Sure, it will work fine and it will be reasonably fast, but it can always be faster. The first obvious waste of time is with the sort; sorts are lengthy beasts and as much as possible should be done to minimise their impact. The aim of this function is to get the average of the k smallest values in the array, imagine if the array is ten thousand elements big and you only need the average of the two smallest? Sorting that entire array just to get two values is a waste of time.

So, how do we optimise this? My lecturer at university always used to tell us how good the STL is and that it was nearly always better and faster than anything we can write. The obvious solution is to move away from the primitive programming shown in the above function and see what the STL can do for us; luckily enough there are several great functions that do exactly what we want. The first great time saver is the partial sort which simply sorts the array in such a way that the range we specify contains the sorted elements but anything above that middle point is left unsorted. Simply using this method would save a large amount of time over the original method of doing a full sort.

The only requirement of these improvements is to change the function parameters to accept a vector rather than a float array, luckily there were no rules set in the original question!

In order to show that these changes would be faster I wrote some code to prove it. It runs the old function and the new function ten thousand times with the same data and calculates the time taken based on clock cycles. The quicker the better. Take note that the time also includes creating a copy of the input array. If we just pass in the array it will be sorted in place so in reality we are testing how quick it is on already sorted data.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |
#include <iostream> #include <ctime> #include <vector> #include <algorithm> #include <iterator> #include <numeric> #include <cassert> using namespace std; const int ARRAY_SIZE = 10000; const int FIND_SMALLEST = 300; const int RUN_N_TIMES = 10000; float average_of_k_smallest(float* arr, int n, int k); float average_of_k_smallest_partial_sort(vector<float> arr, int n); int main(int argc, char* argv[]) { clock_t begin, end; float res; srand(time(NULL)); float array[ARRAY_SIZE]; vector<float> array_v(ARRAY_SIZE); // load the array with random data for(int i = 0; i < ARRAY_SIZE; ++i) { array[i] = (float)rand()/(float)RAND_MAX; // random numbers from 0..1 array_v[i] = array[i]; } begin = clock(); for(int i = 0; i < RUN_N_TIMES; ++i) { // Make a copy of the array due to array being passed by reference. float array_copy[ARRAY_SIZE]; copy(&array[0], &array[ARRAY_SIZE], &array_copy[0]); res = average_of_k_smallest(array_copy, ARRAY_SIZE, FIND_SMALLEST); } end = clock(); cout << "Original function: (" << res << ") " << (float)(end - begin)/CLOCKS_PER_SEC << " sec" << endl; begin = clock(); for(int i = 0; i < RUN_N_TIMES; ++i) { // Make a copy of the array vector<float> array_copy(array_v); res =average_of_k_smallest_partial_sort(array_copy, FIND_SMALLEST); } end = clock(); cout << "STL function: (" << res << ") " << (float)(end - begin)/CLOCKS_PER_SEC << " sec" << endl; return 0; } // Returns the average of k smallest values from the array arr of size n. // The original order of elements in arr is not guaranteed to be preserved. float average_of_k_smallest(float *arr, int n, int k) { assert(k > 0 && k <= n); std::sort(arr, arr + n); float s = 0.0f; for (int i = 0; i < k; i++) s += arr[i]; return s/k; } float average_of_k_smallest_partial_sort(vector<float> arr_v, int k) { partial_sort(arr_v.begin(), arr_v.begin()+k, arr_v.end()); float sum = accumulate(arr_v.begin(), arr_v.begin()+k, 0.0); return sum/k; } |

And the results speak for themselves, across ten thousand iterations of each function the STL function is generally twice as fast.

Original function: (0.0157705) 20.56 sec

STL function: (0.0157705) 7.75 sec

## Questions? Comments?