Author: Jonas Bilinkevicius
I cannot use the Sort method in TStringList as I would like to sort by Integer. My
TStringList is filled with numbers such as:
20, 12, 1, 23, 54, 32
Of course, they're converted to string before being added to TStringList. What is a
fast algorithm to achieve this sort? I normally have less than 50 items in my
TStringList, if that is a factor.
Answer:
You'd end up doing a lot of conversions using StrToInt, which is wasteful, so I
would recommend that you create a
type
PInteger = ^Integer type
store all of the StrToInt values in the TStringList.Objects array, and then when
you use the sort, do your comparisons based on
PInteger(SL.Objects[Idx])^
The quicksort that TStringList uses (see CLASSES.PAS) uses a very simple partition
function, which is completely unaware of the data it's sorting. It's using the
midpoint index to begin to decide where to start partitioning, which is just as
reliable as picking a random number when deciding how to sort. If, for example, you
had a big list of items that was already sorted in the reverse direction, and you
used this quicksort on it, and would call itself recursively once for every element
in the list! Now, when you take into account that you're pushing a few items on the
stack (the return address as well as the parameters as well as the registers you
are saving) it might not take too long for your 16K of stack space to get eaten up
(16,384 bytes divided by about maybe 32 bytes (and that's being pretty optimistic!)
is about 2048 items before you run the risk of killing the stack!). The MaxListSize
in CLASSES is 16380 (65520 div sizeof (Pointer)), so it's certainly possible to
cause this problem.
Remember that TStringList.Sort is declared as virtual, so if you wanted to override
it, you certainly could in a class derived from TStringList.
Also mind that the odds of anyone having to sort this much data (2000 items) seems
pretty remote (correct me, anyone, if you've ever sorted more than 2000 strings in
an application). The most reliable sort with the same running time as QuickSort is
a HeapSort. They both run in O(N lg N) time, whereas sorts like the InsertionSort
(which someone mentioned) and BubbleSort (which someone else mentioned) run in
O(N^2) time, on the average.
The biggest differences between HeapSort and QuickSort, in terms of their run time
and storage are:
HeapSort only calls itself recursively at most lg N times, where as QuickSort could
call itself recursively N times (big difference, like 10 vs 1024, or 32 vs 2^32);
The worst case upper bound time on HeapSort is only O(N lg N), whereas in the worst
case for QuickSort, the running time is O(N^2).
1 program H;
2
3 uses
4 WinCrt, SysUtils;
5
6 const
7 min = 10;
8 max = 13;
9 maxHeap = 1 shl max;
10
11 type
12 heap = array[1..maxHeap] of integer;
13 heapBase = ^heap;
14
15 var
16 currentSize, heapSize: integer;
17 A: heapBase;
18
19 procedure SwapInts(var a, b: integer);
20 var
21 t: integer;
22 begin
23 t := a;
24 a := b;
25 b := t
26 end;
27
28 procedure InitHeap(size: integer);
29 var
30 i: integer;
31 begin
32 heapSize := size;
33 currentSize := size;
34 Randomize;
35 for i := 1 to size do
36 A^[i] := Random(size) + 1;
37 end;
38
39 procedure Heapify(i: integer);
40 var
41 left, right, largest: integer;
42 begin
43 largest := i;
44 left := 2 * i;
45 right := left + 1;
46 if left <= heapSize then
47 if A^[left] > A^[i] then
48 largest := left;
49 if right <= heapSize then
50 if A^[right] > A^[largest] then
51 largest := right;
52 if largest <> i then
53 begin
54 SwapInts(A^[largest], A^[i]);
55 Heapify(largest)
56 end;
57 end;
58
59 procedure BuildHeap;
60 var
61 i: integer;
62 begin
63 for i := heapSize div 2 downto 1 do
64 Heapify(i)
65 end;
66
67 procedure HeapSort;
68 var
69 i: integer;
70 begin
71 BuildHeap;
72 for i := currentSize downto 2 do
73 begin
74 SwapInts(A^[i], A^[1]);
75 dec(heapSize);
76 Heapify(1)
77 end;
78 end;
79
80 type
81 TAvgTimes = array[min..max] of TDateTime;
82 var
83 sTime, eTime, tTime: TDateTime;
84 i, idx, size: integer;
85 avgTimes: TAvgTimes;
86 begin
87 tTime := 0;
88 i := min;
89 size := 1 shl min;
90 new(A);
91 while i <= max do
92 begin
93 for idx := 1 to 10 do
94 begin
95 InitHeap(size);
96 sTime := Time;
97 HeapSort;
98 eTime := Time;
99 tTime := tTime + (eTime  sTime)
100 end;
101 avgTimes[i] := tTime / 10.0;
102 inc(i);
103 size := size shl 1;
104 end;
105 end.
