How to get the number of records from a fixed-length ASCII file
Date: 27-Dec-02
Category: Files Operation
Language: Delphi 2.x
Publisher: DSP, Administrator
Reference URL: DKB
Author: Jonas Bilinkevicius

I work a lot with fixed-length ASCII files, and I need to know how many total lines 
there are in a file. Sure, I can open up the file in a text editor, but really 
large files take forever to load. Is there a better way?

Answer:

As Mr. Miyagi said to Daniel-san in Karate Kid, "Funny you should ask..." Yes, 
there is a better way. What I'm going to show you may not be the best way, but it's 
reasonably fast and exceptionally easy to use. It starts out with this premise: if 
you know the total number of bytes in the file and the length of each record 
(including its line terminator), then dividing the total bytes by the record length 
gives you the number of records in the file. Sounds reasonable, right? And it's 
exactly the way we do it.
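
To make the premise concrete, here is a small sketch of the arithmetic on its own. The helper name RecordsFromSize and the sample numbers are made up for illustration, and it assumes every record, including the last one, ends in a CR/LF pair:

```pascal
program RecordMath;
{$APPTYPE CONSOLE}

// Hypothetical helper, not part of the article's code.
// DataWidth is the number of data characters in each record,
// NOT counting the two bytes of the CR/LF pair.
function RecordsFromSize(FileBytes, DataWidth: Integer): Integer;
begin
  Result := FileBytes div (DataWidth + 2);
end;

begin
  // 1,000 records of 78 characters plus CR/LF occupy 80,000 bytes.
  WriteLn(RecordsFromSize(80000, 78)); // prints 1000
end.
```

The only thing the rest of this tip has to work out is the record length itself, since the file size comes for free.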

For this example, I used a TFileStream object to open up my text file. I like using 
this particular object because it has some convenient methods and properties that I 
can use to get the information that I need; in particular, the Size property and 
the Read and Seek methods. How do I use them? Let's go through some plain English 
to give you an idea:

Open up a file stream on a text file.
Get its total byte size.
Now, serially move through the file, reading each byte into a single-character 
buffer until you reach a return character (#13).
As you pass each byte, increment a counter variable that serves as both a file 
reference point and, later, the length of the record.
When you get to the return character, break out of the loop. The counter already 
includes the #13, so add 1 to it to account for the #10 of the #13#10 CR/LF pair.
Finally, return the result as the file size divided by the record length.

Here's the code that accomplishes the English above:

{======================================================================
 This function gives you the record count of a fixed-length text file.
 It uses a TFileStream and goes through it byte by byte until it
 encounters a #13. By then recLen already counts the #13, so it adds 1
 for the #10 of the CR/LF pair, then divides the byte size of the file
 by the true record length.

 Note that this will only work on text files whose records all have the
 same length. Requires the Classes and SysUtils units.
 ======================================================================}

function GetTextFileRecords(FileName: string): Integer;
var
  ts: TFileStream;
  fSize,
    recLen: Integer;
  buf: AnsiChar; //always one byte, whatever the size of Char
begin
  buf := #0;
  recLen := 0;
  //Open up a File Stream
  ts := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    //Get the File Size
    fSize := ts.Size;
    //Move through the file a byte at a time
    while (buf <> #13) do
    begin
      ts.Seek(recLen, soFromBeginning);
      if ts.Read(buf, 1) <> 1 then
        Break; //end of file reached without finding a #13
      Inc(recLen);
    end;
  finally
    ts.Free;
  end;
  if buf = #13 then
    Inc(recLen); //recLen includes the #13; add 1 for the #10
  if recLen = 0 then
    Result := 0 //empty file
  else
    Result := Round(fSize / recLen);
end;


As I mentioned above, this may not be the "best" way to do this, but it is a way to 
approach this problem. A faster way to do this would have been to open up the file 
as a regular file, then read a bunch of bytes into a large buffer, let's say an 
Array of Char 4K in size. Scanning an array in memory is much faster than moving 
through a file a byte at a time, but the disadvantage there is that you run the 
risk of having the buffer too small to hold a whole record. I've seen some 
fixed-length ASCII files with line sizes up to 8K.

In any case, the method I presented above may not be the most efficient, but it's 
safe, and it works. Besides, what's a few milliseconds worth to you? Have at it!

Wait a minute! 10:00 PM

Okay, I couldn't resist. I realized that I could've done better than my example 
above. Here's the method I described immediately above:
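
function GetTextFileRecords(FileName: string): Integer;
const
  BlockSize = 8192;
var
  F: file;
  fSize,
    amtXfer,
    crPos: Integer;
  buf: array[0..BlockSize] of AnsiChar; //one extra element for the #0
begin
  AssignFile(F, FileName); //Open up the text file as an untyped file
  Reset(F, 1); //record size of 1, so all counts are in bytes
  try
    fSize := FileSize(F); //Get the file size
    BlockRead(F, buf, BlockSize, amtXfer); //read in up to an 8K block
  finally
    CloseFile(F); //close the file, you're done
  end;
  buf[amtXfer] := #0; //terminate the data so StrPas stops where it ends
  crPos := Pos(#13, StrPas(buf));
  if crPos = 0 then
    Result := 0 //no #13 in the first block: record length unknown
  else
    Result := Round(fSize / (crPos + 1)); //+1 accounts for the #10
end;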


There are several things different about this function as opposed to the function 
above. First of all, it involves a lot less code. This is because I don't have to 
construct an object; I open up an untyped file, get its size, read a big block, 
then immediately close it. Notice too that I don't use a loop to find a #13. 
Instead, I use the StrPas function to convert the array of char into a string 
that's passed to the Pos function, which gives me the position of the return 
character. Since Pos is 1-based, that position is the length of the data plus the 
#13; adding one to this value accounts for the #10 portion of the CR/LF pair and 
gives the record length.
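
If you'd rather not depend on the buffer being null-terminated for StrPas, one alternative is to scan only the bytes that were actually read. This is just a sketch: FindCR is a hypothetical helper of my own naming, and the sample buffer stands in for whatever a block read would have delivered.

```pascal
program FindCRDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper, not part of the article's code.
// Returns the 1-based position of the first #13 within the first
// Count bytes of Buf, or 0 if there is none (same convention as Pos).
function FindCR(const Buf: array of AnsiChar; Count: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  if Count > High(Buf) + 1 then
    Count := High(Buf) + 1; // never look past the end of the buffer
  for i := 0 to Count - 1 do
    if Buf[i] = #13 then
    begin
      Result := i + 1;
      Exit;
    end;
end;

var
  Sample: array[0..7] of AnsiChar;
begin
  // 'ABC' + CR/LF + 'DEF'
  Sample[0] := 'A'; Sample[1] := 'B'; Sample[2] := 'C';
  Sample[3] := #13; Sample[4] := #10;
  Sample[5] := 'D'; Sample[6] := 'E'; Sample[7] := 'F';
  WriteLn(FindCR(Sample, 8)); // prints 4
  WriteLn(FindCR(Sample, 3)); // prints 0 (the #13 lies beyond Count)
end.
```

As with Pos, adding 1 to a non-zero result gives the full record length including the #10.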

Because I read one large block instead of seeking and reading a byte at a time (and 
don't have to construct an object), this method is a lot faster than the method 
above, and amazingly it's not very complicated. Where this type of operation can 
get tricky is with the BlockRead function. In order to use BlockRead successfully, 
you need to specify a record size in the call to Reset. That can be a bit 
confusing, so just remember this: for byte-by-byte serial reads through a file, 
always use a record size of 1. Also, notice that I included a variable called 
amtXfer. BlockRead fills this with the actual number of bytes read. If you don't 
supply it, BlockRead raises an I/O error whenever it reads fewer bytes than you 
asked for, which is what happens with any file smaller than the block size. That's 
not too much of a problem because all you need to do is create an exception 
handling block - but why bother? Just supply the variable, and you don't have to 
worry about the exception.
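
Here's a small sketch of that short-read behavior. The helper BytesInFirstBlock and the scratch file name are made up for the demonstration, and the program assumes it can create and delete a file in the current directory:

```pascal
program BlockReadDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  DemoName = 'blockread_demo.txt'; // hypothetical scratch file

// Hypothetical helper, not part of the article's code.
// Asks for up to 8K and returns how many bytes BlockRead delivered.
function BytesInFirstBlock(const FileName: string): Integer;
var
  F: file;
  Buf: array[0..8191] of Byte;
  amtXfer: Integer;
begin
  AssignFile(F, FileName);
  Reset(F, 1); // record size 1: all counts are in bytes
  try
    // Because amtXfer is supplied, a short read is reported through
    // it instead of being treated as an I/O error.
    BlockRead(F, Buf, SizeOf(Buf), amtXfer);
  finally
    CloseFile(F);
  end;
  Result := amtXfer;
end;

var
  T: TextFile;
begin
  // Two records of 3 characters plus CR/LF: 10 bytes in total.
  AssignFile(T, DemoName);
  Rewrite(T);
  Write(T, 'ABC'#13#10'DEF'#13#10);
  CloseFile(T);

  WriteLn(BytesInFirstBlock(DemoName)); // prints 10, not 8192
  DeleteFile(DemoName);
end.
```

The value in amtXfer is also what tells you how much of the buffer holds real data, which matters for any file shorter than the block.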

Okay, now it's time to close this out... Is this the best way to get the record 
count of a fixed-length text file? Admittedly, it's one of the faster ways, save 
using Assembler. But I'm wondering what a purely WinAPI call set would look 
like.... If you have any ideas, please make sure to let me know!

Here I Go Again! 11:05 PM

I guess my curiosity got the best of me tonight, because I just wasn't satisfied 
doing just the BlockRead method. I knew there had to be another way to do it with 
WinAPI calls. So I did just that. Look at the code below:

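function GetTextFileRecordsWinAPI(FileName: string): Integer;
const
  BlockSize = 8192;
var
  F: THandle;
  amtXfer,
    fSize: DWORD;
  crPos: Integer;
  buf: array[0..BlockSize] of AnsiChar; //one extra element for the #0
begin
  Result := 0;
  //Open up file. FILE_FLAG_NO_BUFFERING is left out because it requires
  //sector-aligned buffers and read sizes, which buf does not guarantee.
  F := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if F = INVALID_HANDLE_VALUE then
    Exit; //could not open the file
  try
    fSize := GetFileSize(F, nil); //Get the file's size (low 32 bits)
    amtXfer := 0;
    if not ReadFile(F, buf, BlockSize, amtXfer, nil) then
      Exit; //could not read a block from the file
  finally
    CloseHandle(F);
  end;
  buf[amtXfer] := #0; //terminate the data so StrPas stops where it ends
  crPos := Pos(#13, StrPas(buf));
  if crPos > 0 then
    Result := Round(fSize / (crPos + 1)); //+1 accounts for the #10
end;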


This method is almost exactly the same as the one immediately above, but instead 
uses WinAPI calls to accomplish the same task.

Now which method is better? I DON'T KNOW! Actually, for simplicity's sake, I prefer 
the elegance of the second method - there's just a lot less coding involved. With 
the WinAPI method, while it may require one less line of code, the CreateFile 
function is not the easiest thing to work with - I spent a bit of time Alt-Tabbing 
between the code editor and Windows help to get the syntax and constants right. 
Granted, it's easier now that I've done it, but it's not a method that I prefer.

So I'll leave it up to you to decide which method you like better.

			