What is this TParser thing and how do I use it?

By: mykle hoban

Abstract: This article explains the undocumented class TParser and implements a simple tokenizer using it.


I noticed this class called TParser in Classes.hpp. It's undocumented, but it looks intriguing. What does it do, and how do I use it?


TParser is a class used internally by the Delphi and C++Builder IDEs to parse DFM (form) files into a binary format. Because it expects correctly formatted, text-form DFM files, it is somewhat limited for everyday use, but it can still be quite helpful. TParser is perhaps a bit inappropriately named; something like TLexer would better describe its function. Unlike a true parser, TParser does not interpret the structure of the data it reads from a stream. It merely breaks that data into tokens, which is what a lexer does (hence the misnomer). Here is its definition from Classes.hpp:

class DELPHICLASS TParser;
class PASCALIMPLEMENTATION TParser : public System::TObject
{
  typedef System::TObject inherited;

private:
  TStream* FStream;
  int FOrigin;
  char *FBuffer;
  char *FBufPtr;
  char *FBufEnd;
  char *FSourcePtr;
  char *FSourceEnd;
  char *FTokenPtr;
  char *FStringPtr;
  int FSourceLine;
  char FSaveChar;
  char FToken;
  char FFloatType;
  WideString FWideStr;
  void __fastcall ReadBuffer(void);
  void __fastcall SkipBlanks(void);

public:
  __fastcall TParser(TStream* Stream);
  __fastcall virtual ~TParser(void);
  void __fastcall CheckToken(char T);
  void __fastcall CheckTokenSymbol(const AnsiString S);
  void __fastcall Error(const AnsiString Ident);
  void __fastcall ErrorFmt(const AnsiString Ident, const System::TVarRec * Args, const int Args_Size);
  void __fastcall ErrorStr(const AnsiString Message);
  void __fastcall HexToBinary(TStream* Stream);
  char __fastcall NextToken(void);
  int __fastcall SourcePos(void);
  AnsiString __fastcall TokenComponentIdent();
  Extended __fastcall TokenFloat(void);
  __int64 __fastcall TokenInt(void);
  AnsiString __fastcall TokenString();
  WideString __fastcall TokenWideString();
  bool __fastcall TokenSymbolIs(const AnsiString S);
  __property char FloatType = {read=FFloatType, nodefault};
  __property int SourceLine = {read=FSourceLine, nodefault};
  __property char Token = {read=FToken, nodefault};
};

In our example, we will use TParser to break a text file up into tokens (words). Because TParser is built around DFM syntax, this is not necessarily the best application, but it does illustrate the concepts.

Here are the methods and properties of TParser that we need to concern ourselves with:

char __fastcall NextToken(void);
  Advances to the next token and returns its type (toEOF at the end of the input).
int __fastcall SourcePos(void);
  Returns the position in the source stream of the current token.
AnsiString __fastcall TokenString();
  Returns the current token as a string.
  (Other functions, such as TokenInt and TokenFloat, return it as different types.)
__property int SourceLine = {read=FSourceLine, nodefault};
  The line of the source file that the parser is currently on.
__property char Token = {read=FToken, nodefault};
  The type of the current token: toSymbol, toString, toInteger or toFloat, or the character itself for anything else (such as '=').
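Since TParser is undocumented, it can help to see the kind of rules it applies. The following is a rough, standard-C++ sketch of a TParser-like tokenizer that reads a std::string instead of a TStream. The class MiniParser, its lower-case method names, and the numeric values of the to* constants are my own assumptions chosen to mirror the interface above; this is not Borland's implementation, and it leaves out things the real class does, such as buffered stream reading and error reporting (an unterminated string simply ends at the end of the input here).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Token codes named after TParser's constants; the numeric values are an
// assumption (chosen to mirror Classes.pas), not taken from Classes.hpp.
const char toEOF = 0, toSymbol = 1, toString = 2, toInteger = 3, toFloat = 4;

// MiniParser: hypothetical, simplified stand-in for TParser that reads an
// in-memory std::string instead of a TStream.
class MiniParser {
public:
    explicit MiniParser(const std::string& source) : src_(source) {}

    // Skip whitespace, read the next token, and return its type
    // (the analogue of TParser::NextToken).
    char nextToken() {
        skipBlanks();
        start_ = pos_;
        text_.clear();
        if (pos_ >= src_.size()) return token_ = toEOF;
        unsigned char c = static_cast<unsigned char>(src_[pos_]);
        if (std::isalpha(c) || c == '_') {
            // Symbol: a letter or '_' followed by letters, digits or '_'.
            while (pos_ < src_.size() && isSymbolChar(src_[pos_])) ++pos_;
            token_ = toSymbol;
            text_ = src_.substr(start_, pos_ - start_);
        } else if (c == '\'') {
            // String: 'single quoted', with '' standing for one quote.
            // The surrounding quotes are not part of the token text.
            ++pos_;
            while (pos_ < src_.size()) {
                if (src_[pos_] == '\'') {
                    if (pos_ + 1 < src_.size() && src_[pos_ + 1] == '\'') {
                        text_ += '\'';
                        pos_ += 2;
                        continue;
                    }
                    ++pos_;  // closing quote
                    break;
                }
                text_ += src_[pos_++];
            }
            token_ = toString;
        } else if (c == '-' || std::isdigit(c)) {
            // Number: digits make an integer; any following '.', 'e', 'E',
            // '+', '-' or digits turn it into a float.
            ++pos_;
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
            token_ = toInteger;
            while (pos_ < src_.size() && isFloatChar(src_[pos_])) {
                ++pos_;
                token_ = toFloat;
            }
            text_ = src_.substr(start_, pos_ - start_);
        } else {
            // Anything else comes back as a one-character token.
            ++pos_;
            token_ = static_cast<char>(c);
            text_ = std::string(1, token_);
        }
        return token_;
    }

    char token() const { return token_; }                     // like Token
    const std::string& tokenString() const { return text_; }  // like TokenString()
    int sourceLine() const { return line_; }                   // like SourceLine (1-based)
    std::size_t sourcePos() const { return start_; }           // offset of the current token

private:
    // Treat spaces and control characters as blanks, counting newlines.
    void skipBlanks() {
        while (pos_ < src_.size() && static_cast<unsigned char>(src_[pos_]) <= ' ') {
            if (src_[pos_] == '\n') ++line_;
            ++pos_;
        }
    }
    static bool isSymbolChar(char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    }
    static bool isFloatChar(char ch) {
        return std::isdigit(static_cast<unsigned char>(ch)) ||
               std::string(".eE+-").find(ch) != std::string::npos;
    }

    std::string src_;
    std::size_t pos_ = 0, start_ = 0;
    int line_ = 1;
    char token_ = toEOF;
    std::string text_;
};
```

Driving it with while (p.nextToken() != toEOF) and switching on p.token() follows the same pattern as the Button1Click handler that comes next; the real TParser additionally reads its TStream in buffered chunks (hence FBuffer and ReadBuffer in the declaration).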

Here is the code for the tokenizer. It should be pretty straightforward: it reads an input stream (file), tokenizes it, and dumps the results into a Memo.

void __fastcall TForm1::Button1Click(TObject *Sender)
{
  TFileStream *fs = new TFileStream(Edit1->Text, fmOpenRead);
  TParser *theParser = NULL;
  try
  {
    //dump the original file first if you like (e.g. Memo1->Lines->LoadFromStream(fs);),
    //then rewind so that the parser starts at the beginning of the file
    fs->Position = 0;
    theParser = new TParser(fs);
    while (theParser->NextToken() != toEOF) //while we're in the file
    {
      //Get Token
      AnsiString str = theParser->TokenString();
      //Get the position in the stream
      int Pos = theParser->SourcePos();
      //Get the line number
      int Line = theParser->SourceLine;
      //Get token type
      switch (theParser->Token)
      {
        case toSymbol:
          Memo1->Lines->Add(str + " is a symbol at line : " + IntToStr(Line) + " position : " + IntToStr(Pos));
          break;
        case toInteger:
          Memo1->Lines->Add(str + " is an integer at line : " + IntToStr(Line) + " position : " + IntToStr(Pos));
          break;
        case toFloat:
          Memo1->Lines->Add(str + " is a float at line : " + IntToStr(Line) + " position : " + IntToStr(Pos));
          break;
        case toString:
          //note: TParser is designed for DFMs, so toString only works with
          //'single quoted' strings
          Memo1->Lines->Add(str + " is a string at line : " + IntToStr(Line) + " position : " + IntToStr(Pos));
          break;
      }
    }
  }
  __finally
  {
    //free the parser before the stream: its destructor still uses the stream
    delete theParser;
    delete fs;
  }
}
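The switch in the handler above has no default case, so any token that TParser returns as a plain character (such as the '=' in Caption = 'OK') is silently skipped. The sketch below is a hypothetical, standard-C++ helper that builds the same report line and adds a default branch for those tokens; describeToken is my own name, and the numeric values of the to* constants are an assumption. In the C++Builder handler, the same idea amounts to adding a default: case to the switch.

```cpp
#include <cassert>
#include <string>

// Token codes named after TParser's constants; the numeric values are an
// assumption (chosen to mirror Classes.pas).
const char toEOF = 0, toSymbol = 1, toString = 2, toInteger = 3, toFloat = 4;

// Hypothetical helper: builds the same report line as Button1Click, plus a
// default case for tokens whose type is the character itself.
std::string describeToken(char token, const std::string& text, int line, long pos)
{
    // The tokenizing loop stops at toEOF, so it should never be passed here.
    assert(token != toEOF);
    std::string kind;
    switch (token) {
        case toSymbol:  kind = "a symbol";   break;
        case toInteger: kind = "an integer"; break;
        case toFloat:   kind = "a float";    break;
        case toString:  kind = "a string";   break;
        default:        kind = "a single-character token"; break;
    }
    return text + " is " + kind + " at line : " + std::to_string(line) +
           " position : " + std::to_string(pos);
}
```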
