Wednesday, November 8, 2017

find median from unsorted array of intergers

For example [3,4,5,2,1], median is 3.
[3,4,5,2,1,6] median is (3+4)/2=3.5

The easiest way is to sort them and then calculate the median. This is easy to read but containing some unnecessary loop as you don't need sort all values.

By array length, we can know the index of the median.
Let's say the array is 3,9,8,1,0,4,5,2,7,6
Index starts from 0, leftM is index 4,  rightM is index 5. (They will be the same if the array length is odd.)
Using the quicksort algorithm, after partition, there are two partitions. One is less than the pivot and the other one is great than the pivot.
Use the first number as pivot. After partition, the array is
1,0,2 |3| 9,8,4,5,7,6
the pivot index is 3.
Check the partition length to determine which partition contains leftM and rightM.
The right partition 9,8,4,5,7,6, contains the leftM and rightM
Calculate the new leftM and rightM on the partition.
As the pivot index is 3 in original array, in the new partition, we are looking for leftM = 4 -3 -1 = 0, rightM = 5-1-3 =1
Now we need the frist and second number after sort the right partition.
Do the same steps on the new partition, pivot is 9
8,4,5,7,6 | 9
use left partition and left partition size > 2, we know the number we are looking for are still in the left partition.
pivot is 8
4, 5, 7, 6 | 8
Still in left partition
pivot is 4
4 | 5, 7, 6, 8
ok, the left partition is 1, that is one median number we are looking for.
the other number now is the first number after sorting the right partition
repeat the same way, we can find it is 5.

It is using the quicksort algorithm to find the partition where the median exists and sort on that partition. The worst time complexity is O(n(log(n))). If sorting the whole array, the best time complexity is O(n(log(n))) and the worst is O(n^2).

========================================

Another implementation here:
We separate the numbers into two partitions, left and right.
The left number is always smaller than right.
The size diff between left and right shall <=1.
If the right size - left size > 1, move the smallest right to the left, which will become the biggest of the left.
If the left size - right size > 1, move the biggest left to the right, which will become the smallest of the left.
Eventually, when all numbers are added to the partitions. If two partitions size are the same, median = (biggest left + smallest right) /2
If right size > left size, median = smallest right
if left size > right, median = biggest left
Using the example above. We push the number one by one into the two partitions.
3,9,8,1,0,4,5,2,7,6
left [    ], right [    ]
the first number is 3, as both are empty, let's put it into right.
[   ] [3  ]
then 9 compare with the biggest left (null) and smallest right (3), it shall be in the right,
[   ] [3 9]
now the size of right  - the size of left > 1. we need move one to the smallest one to the left
[   3] [9   ]
Then 8, because it is between smallest right and biggest right, we put it in right
[   3] [8, 9   ]
Then 1
[1, 3] [8, 9   ]
Then 0
[1,0,3] [8, 9   ]
Then 4, between smallest right and biggest left, put it in right
[1,0,3] [4,8,9]
Then 5
[1,0,3] [4,5,8,9]
Then 2
[1,0,2,3] [4,5,8,9]
Then 7
[1,0,2,3] [4,7,5,8,9]
Then 6
[1,0,2,3] [4,6,7,5,8,9]
now the right size - left size > 1, move the smallest right to the left
We also need to find the min number in the rest of the right
[1,0,2,3,4] [5,6,7,8,9]
Now all numbers are in two partitions. because the sizes are equal, median = (4+5)/2
If left size < right, the biggest left is median.
If right size > left, the smallest right is median.

For time complexity, the worst case is every time when you add the number to one partition, it needs to pop one item and add it to the other partition, and the current partition need find the min/max number.
Iteration the array: n
+
find min/max, to find the max/min in 2 items, iterate 2 times
2 + 3 + 4 + ... + n/2 (as the left right size are almost same, big diff is 1, the max size of left/right is close to n/2)
So the time complexity is O(n + 2 + 3 +...+n/2) = O(n)

No comments:

Post a Comment