> import Data.List
Let's start with the definition of a simple data structure. Actually, it's a data
structure we already know: a list. However, we're going to consider it abstractly
now and permit just three operations on it:
- create an empty list
- insert an element into the list
- extract the element with the lowest value from the list
Using standard list operations, this is easily done:
> empty = []
> enqueue x u = x:u
> dequeue [] = Nothing
> dequeue u = dequeue1 (foldl min (head u) u) u
> dequeue1 x u = Just (x,delete x u)
Let's declare the types for this as well and limit ourselves to lists of integers
(just to keep the types simple):
> type PQ = [Int]
> empty :: PQ
> enqueue :: Int -> PQ -> PQ
> dequeue :: PQ -> Maybe (Int,PQ)
Now let's "bundle up" these operations in an interface. A datatype with this
interface is called a priority queue. So, we define a "priority queue module"
that combines all three operations into a single "thing".
This is similar to what the built-in "module" construct does, but you don't
actually need such a language construct. In fact, doing what we are doing here
is more flexible (but also more work).
So, a priority queue module consists of three things: an instance of an empty
queue, a function to add an integer to the queue, and a function that removes
the smallest element from the queue (or returns Nothing if the queue is empty).
> data PQModule a = PQMODULE a (Int -> a -> a) (a -> Maybe (Int,a))
Now we make an instance of this module by actually supplying the three operations
to the constructor:
> list_pq :: PQModule PQ
> list_pq = PQMODULE empty enqueue dequeue
In terms of this, we can now write a sort function. The idea is to first
put all the elements into the priority queue. Once the priority queue has
been built, we repeatedly extract the minimum element and put all the
elements that we get into a list. This list is then sorted.
For building the priority queue, we use the foldr function, and for
extracting the elements and building a list, we use the "opposite" of
the foldr function, the unfoldr function:
> pq_sort :: PQModule a -> [Int] -> [Int]
> pq_sort (PQMODULE empty enqueue dequeue) l =
> let pq = foldr enqueue empty l in
> unfoldr dequeue pq
Selection sort is now simply an instantiation of pq_sort with the
list priority queue:
> selection_sort = pq_sort list_pq
Of course, all of this would be an idle exercise if we didn't have a better
priority queue implementation available.
Here is an implementation of a priority queue as a heap. A heap is a
very common data structure, and we have talked about it already.
This implementation looks simple on the surface and implements the
heap property in a straightforward way, but it takes a bit of thought
to figure out why it is fast and how it is faster than the list
implementation.
A heap is just a marked binary tree, possibly empty, with the special
property that the mark at any node is bigger (or smaller, depending on
how we define it--here we need smaller) than the marks of all the
children.
> data Ord k => Heap k = E | N k (Heap k) (Heap k) deriving (Show,Eq)
To keep things simple, we just consider integer heaps.
> type IntHeap = Heap Int
Usually, we talk about inserting elements into trees, but it's sometimes easier
to consider combining two trees as the basic operation. These two views
are easily related by constructing a singleton tree and then combining it with
an existing tree, instead of inserting an element into a tree.
> singleton k = N k E E
Let's define a function to extract the key (this will give an error on the
empty tree).
> key (N k _ _) = k
This function constructs a heap in which the first node always remains above
the second node. If we wanted to be extra careful, we should check that
k < key x, k < key y, and k < key z.
> above (N k E x) y = (N k y x)
> above (N k x y) z = (N k y (combine x z))
This combines two trees. The first two cases are trivial. The last
case just checks whether the first argument or the second argument has
the lower key and then places the correct node above the other, preserving
the heap property. (Remember, here we need a min-heap, not a max-heap.)
> combine u E = u
> combine E v = v
> combine u v =
> if key u <= key v then u `above` v
> else v `above` u
That's all we need in terms of helper functions. We are now ready to
define the heap operations.
The empty heap is easy:
> heap_empty = E
To add an element to the heap, we combine the singleton heap with the
given heap.
> heap_enqueue k u = combine (singleton k) u
To remove an element from the heap, we return Nothing if the heap is
empty, otherwise, we return the mark at the root and combine the two
children.
> heap_dequeue E = Nothing
> heap_dequeue (N k u v) = Just (k,combine u v)
For completeness, let's add the type declarations.
> heap_empty :: IntHeap
> heap_enqueue :: Int -> IntHeap -> IntHeap
> heap_dequeue :: IntHeap -> Maybe (Int,IntHeap)
Now let us stick these three functions into a PQModule:
> heap_pq :: PQModule IntHeap
> heap_pq = PQMODULE heap_empty heap_enqueue heap_dequeue
And now, in order to get the heapsort algorithm, we just apply our
generic queue sort implementation to the heap priority queue.
> heap_sort = pq_sort heap_pq
Here is a simple example of all of that working.
> main = do
> print (selection_sort [9,1,5,4,2,3,7,8,6])
> print (heap_sort [9,1,5,4,2,3,7,8,6])
>
Now, remember: this implementation of a heap looks simple, but it is
subtle and it is difficult to estimate how much time it takes. The issue
is that we aren't maintaining the "almost complete" property, we just insert
nodes seemingly haphazardly.
That means that from this implementation, it is not obvious that
heap_sort is actually faster than selection_sort. We will later see
an implementation in an imperative programming language that is a
little more complicated, but whose behavior is much easier to
understand.