 Delphi Internet Solutions: File Formats (UUCode)
See Also: Dr.Bob's Delphi Internet Solutions on-line book

Internet file formats can be divided into a few groups. First, we have the file transfer (or communication) file formats, for which the uuencode/uudecode scheme was invented a long time ago, followed by xxencode/xxdecode. This later evolved into the base64 encoding and MIME messaging scheme that a lot of mailers use today. A second type of internet file format is the Hyper Text Markup Language (HTML), which, with all its versions and (often browser-specific) enhancements, is a true group in itself. The third group of internet file formats is more an interface or protocol of communication again: the Common Gateway Interface (CGI), of which we can identify standard (or console) CGI and Windows CGI or WinCGI.

Internet File Transfer
Delphi is extremely well suited to writing new components, and to illustrate uuencode/uudecode, xxencode/xxdecode and base64 encoding, we'll write a component that implements these algorithms. The new component will implement the uuencode and uudecode file conversion algorithms that can be used to transfer files on the internet (previously used in unix-to-unix file transfers).
For a more sophisticated way of transferring files from one point to another, see the chapter about WININET and the FTP (File Transfer Protocol) component. The file transfer encoding algorithms presented here are mainly used in e-mail and newsgroup environments.

UUEncode and UUDecode
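The objective of uuencoding is to encode a file which may contain any "binary" characters into another file that uses only a standard "readable" (or printable) set of 64 characters, namely [`!"#$%&'()*+,-./0123456789:;<=>?@ABC...XYZ[\]^_], so that the encoded file can be reliably sent over diverse networks and e-mail gates. The 64 printable uuencode characters can be presented in a table as follows (index 0 is the back-quote, which stands in for the space character):

   0..15   ` ! " # $ % & ' ( ) * + , - . /
  16..31   0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  32..47   @ A B C D E F G H I J K L M N O
  48..63   P Q R S T U V W X Y Z [ \ ] ^ _
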
Files output by the uuencode algorithm consist of a header line, followed by a number of encoded body lines, and a trailer line. Any lines preceding the header line or following the trailer line will be ignored by the uudecode algorithm (as long as they don't contain the special "begin" or "end" keywords that distinguish a header and trailer line respectively).
The header line starts with the word "begin", followed by the filemode (an octal four digit number) and the name of the file which is encoded, separated by a whitespace. The trailer line consists of a single "end" on an otherwise empty line. The encoded body lines between the header and the trailer lines are at most 61 characters long: one character to specify the size, and at most 60 characters that hold the encoded data. The first character of each body line is one from the uuencode character set, and we get the actual number of data characters of the body line when the value of the character space ($20 or 32) is subtracted from the ASCII value of this character.
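The header and trailer rules above can be sketched with a few small helpers. This is only an illustration and not part of the component: the names IsUUHeader, IsUUTrailer and UUHeaderFileName are made up here, and the sketch assumes that the header fields are separated by single spaces.

```pascal
{ Delphi syntax (Result); with Free Pascal, compile in Delphi mode. }

{ True when Line looks like a header line, e.g. 'begin 0644 cat.txt'. }
function IsUUHeader(const Line: string): Boolean;
begin
  Result := Copy(Line, 1, 6) = 'begin ';
end;

{ True when Line is the trailer: a single 'end' on an otherwise empty line. }
function IsUUTrailer(const Line: string): Boolean;
begin
  Result := Line = 'end';
end;

{ File name from a header line: everything after the filemode field.
  Returns an empty string when Line is not a header line. }
function UUHeaderFileName(const Line: string): string;
var
  Rest: string;
  P: Integer;
begin
  Result := '';
  if not IsUUHeader(Line) then
    Exit;
  Rest := Copy(Line, 7, Length(Line));   { 'mode name' }
  P := Pos(' ', Rest);
  if P > 0 then
    Result := Copy(Rest, P + 1, Length(Rest));
end;
```

For example, UUHeaderFileName('begin 0644 cat.txt') yields 'cat.txt', while an ordinary body line is rejected by IsUUHeader.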
Body lines generally contain 60 data characters, which means 45 bytes encoded, so the first character is an 'M' (the character at index 45 in the uuencode character set, see above). The actual data characters are best grouped by four characters. Three characters of the original input file (representing 3 * 8 = 24 bits) are encoded to four characters that each hold only 6 bits, i.e. a value in the range 0..63. Each encoded character is then the character at that 6-bit index in the uuencode character set. Since the uuencode character set is just the plain ASCII character set starting at position 32 (the space) and running to 63 + 32 = 95, we can just add the ASCII value of the space character to each 6-bit index value and get the corresponding uuencoded character.
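The arithmetic behind the length character and the line length can be checked with a short sketch. The function names below are hypothetical helpers, not routines from the component.

```pascal
const
  SP = ' ';  { the space character, #32 }

{ Length character for a body line that holds ByteCount (1..45) input bytes. }
function UULengthChar(ByteCount: Integer): Char;
begin
  Result := Chr(Ord(SP) + ByteCount);
end;

{ Number of input bytes announced by the first character of a body line. }
function UULineBytes(LengthChar: Char): Integer;
begin
  { AND $3F maps the back-quote (96) to 0, just like a space }
  Result := (Ord(LengthChar) - Ord(SP)) and $3F;
end;

{ Number of encoded data characters needed for ByteCount input bytes:
  every started group of three bytes becomes four characters. }
function UUDataChars(ByteCount: Integer): Integer;
begin
  Result := ((ByteCount + 2) div 3) * 4;
end;
```

A full line of 45 bytes gives the length character 'M' and 60 data characters, which is where the maximum of 61 characters per body line comes from.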
The algorithm to convert a Triplet of binary characters (Array[0..2] of Byte) into a Kwartet of encoded (printable) characters (Array[0..3] of Byte) can be implemented in Object Pascal as follows:
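  { Assumes TTriplet = Array[0..2] of Byte, TKwartet = Array[0..3] of Byte
    and SP = ' ' (the space character) }
  procedure Triplet2Kwartet(const Triplet: TTriplet;
                             var Kwartet: TKwartet);
  var
    i: Integer;
  begin
    { spread the 3 * 8 input bits over 4 * 6 output bits }
    Kwartet[0] := (Triplet[0] SHR 2);
    Kwartet[1] := ((Triplet[0] SHL 4) AND $30) +
                  ((Triplet[1] SHR 4) AND $0F);
    Kwartet[2] := ((Triplet[1] SHL 2) AND $3C) +
                  ((Triplet[2] SHR 6) AND $03);
    Kwartet[3] := (Triplet[2] AND $3F);
    { make each 6-bit value printable; 0 becomes a back-quote, not a space }
    for i:=0 to 3 do
      if Kwartet[i] = 0 then
        Kwartet[i] := $40 + Ord(SP)
      else Inc(Kwartet[i], Ord(SP))
  end {Triplet2Kwartet};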
This routine consists of two parts: in the first part, the 24 bits (3 * 8) from the Triplet are spread out over the 24 bits (4 * 6) of the Kwartet. In the second part of the algorithm, we add the ASCII value of the space character to each element of the Kwartet. The ASCII space character is coded as Ord(SP), where SP is defined as the space character or #32. Note that in case a Kwartet element has a value of zero, we don't just add the space character to it. That would mean that the encoded character is a whitespace, and many mailers have trouble sending multiple whitespaces or trailing whitespaces on body lines. Hence, in those cases the value 64 (or $40) is also added, resulting not in a whitespace but in the back-quote ` character. The value $40 will be neutralized by the uudecode algorithm, so it doesn't really matter to a decent uudecoder whether we've used a whitespace or a back-quote.
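To see the bit spreading at work, the sketch below computes the same mapping in a different way: it first packs the three bytes into one 24-bit value and then cuts that value into four 6-bit groups. The names SpreadBits and UUEncodeTriplet are illustrative only, and the type and constant declarations are repeated so that the fragment stands on its own.

```pascal
const
  SP = ' ';

type
  TTriplet = array[0..2] of Byte;
  TKwartet = array[0..3] of Byte;

{ Same result as Triplet2Kwartet, computed via one 24-bit value. }
procedure SpreadBits(const Triplet: TTriplet; var Kwartet: TKwartet);
var
  Bits: LongInt;
  i: Integer;
begin
  Bits := (LongInt(Triplet[0]) shl 16) or
          (LongInt(Triplet[1]) shl 8) or Triplet[2];
  for i := 0 to 3 do
  begin
    { take 6 bits at a time, most significant group first }
    Kwartet[i] := (Bits shr (18 - 6 * i)) and $3F;
    if Kwartet[i] = 0 then
      Kwartet[i] := $40 + Ord(SP)      { back-quote instead of space }
    else
      Inc(Kwartet[i], Ord(SP));
  end;
end;

{ Convenience wrapper: S must hold at least three characters. }
function UUEncodeTriplet(const S: string): string;
var
  T: TTriplet;
  K: TKwartet;
  i: Integer;
begin
  for i := 0 to 2 do
    T[i] := Ord(S[i + 1]);
  SpreadBits(T, K);
  Result := '';
  for i := 0 to 3 do
    Result := Result + Chr(K[i]);
end;
```

The three characters 'Cat' (67, 97, 116) give the 6-bit values 16, 54, 5 and 52, so UUEncodeTriplet('Cat') returns '0V%T'. Three zero bytes come out as four back-quotes.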
Speaking of uudecode, the Object Pascal implementation of the algorithm to convert a Kwartet of encoded characters back to a Triplet of original binary characters is as follows:
  procedure Kwartet2Triplet(const Kwartet: TKwartet;
                            var Triplet: TTriplet);
  var
    i: Integer;
  begin
    { AND $3F neutralizes the $40 of a back-quote before shifting }
    Triplet[0] := (((Kwartet[0] - Ord(SP)) AND $3F) SHL 2) +
                  (((Kwartet[1] - Ord(SP)) AND $30) SHR 4);
    Triplet[1] := (((Kwartet[1] - Ord(SP)) AND $0F) SHL 4) +
                  (((Kwartet[2] - Ord(SP)) AND $3C) SHR 2);
    Triplet[2] := (((Kwartet[2] - Ord(SP)) AND $03) SHL 6) +
                   ((Kwartet[3] - Ord(SP)) AND $3F)
  end {Kwartet2Triplet};
If the last Triplet in the file to encode (or Kwartet in the file to decode) is not exactly 3 bytes long (or 4 bytes for the Kwartet), then zeros are added as extra characters to fill the structure to encode or decode.
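The zero filling and the role of the length character can be sketched at the level of a whole body line. This is a simplified illustration that works on strings rather than files; UUEncodeLine and UUDecodeLine are hypothetical names, and the caller is assumed to pass at most 45 characters per line.

```pascal
const
  SP = ' ';

{ Encode up to 45 characters of Data as one uuencoded body line. }
function UUEncodeLine(const Data: string): string;
var
  i, j: Integer;
  B: array[0..2] of Byte;
  Q: array[0..3] of Byte;
begin
  Result := Chr(Ord(SP) + Length(Data));        { length character }
  i := 1;
  while i <= Length(Data) do
  begin
    for j := 0 to 2 do                           { zero-fill a short Triplet }
      if i + j <= Length(Data) then
        B[j] := Ord(Data[i + j])
      else
        B[j] := 0;
    Q[0] := B[0] shr 2;
    Q[1] := ((B[0] shl 4) and $30) + ((B[1] shr 4) and $0F);
    Q[2] := ((B[1] shl 2) and $3C) + ((B[2] shr 6) and $03);
    Q[3] := B[2] and $3F;
    for j := 0 to 3 do
      if Q[j] = 0 then
        Result := Result + '`'
      else
        Result := Result + Chr(Q[j] + Ord(SP));
    Inc(i, 3);
  end;
end;

{ Decode one body line; the length character tells how many bytes are real. }
function UUDecodeLine(const Line: string): string;
var
  i, j, Count: Integer;
  Q: array[0..3] of Byte;
begin
  Result := '';
  if Line = '' then
    Exit;
  Count := (Ord(Line[1]) - Ord(SP)) and $3F;
  i := 2;
  while i + 3 <= Length(Line) do
  begin
    for j := 0 to 3 do                           { AND $3F neutralizes $40 }
      Q[j] := (Ord(Line[i + j]) - Ord(SP)) and $3F;
    Result := Result + Chr((Q[0] shl 2) or (Q[1] shr 4));
    Result := Result + Chr(((Q[1] and $0F) shl 4) or (Q[2] shr 2));
    Result := Result + Chr(((Q[2] and $03) shl 6) or Q[3]);
    Inc(i, 4);
  end;
  Result := Copy(Result, 1, Count);              { drop the zero filling }
end;
```

Encoding the two characters 'Ca' fills the third byte with a zero and gives the line "0V$` preceded by the length character " (34 = 32 + 2). Decoding that line produces three bytes, of which only the first two are kept.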

XXEncode and XXDecode
Uuencoding has long been the most popular way to encode binary data with a set of 64 printable characters. A limitation of uuencoding is that its character set does not translate well between ASCII and EBCDIC (IBM mainframe). Xxencoding is very similar to uuencoding, but uses a different character set so that character set translations will work better across multiple types of systems, i.e. between the above-mentioned EBCDIC and ASCII.

Note that while the uuencoding character set was a subrange of the ASCII character set (from 32 to 96), this is not true for the xxencoding character set. In order to modify the Triplet2Kwartet and Kwartet2Triplet routines to support the xxencoding character set as well, we need to add an array of 0..63 characters to represent the xxencoding character set. We also need to modify the Triplet2Kwartet and Kwartet2Triplet routines themselves, which results in the following code:
  const
    XX: Array[0..63] of Char =
       '+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

  procedure Triplet2Kwartet(const Triplet: TTriplet;
                            var Kwartet: TKwartet);
  var
    i: Integer;
  begin
    Kwartet[0] := (Triplet[0] SHR 2);
    Kwartet[1] := ((Triplet[0] SHL 4) AND $30) +
                  ((Triplet[1] SHR 4) AND $0F);
    Kwartet[2] := ((Triplet[1] SHL 2) AND $3C) +
                  ((Triplet[2] SHR 6) AND $03);
    Kwartet[3] := (Triplet[2] AND $3F);
    for i:=0 to 3 do
      if Kwartet[i] = 0 then
        Kwartet[i] := $40 + Ord(SP)
      else Inc(Kwartet[i],Ord(SP));
    if XXCode then
      for i:=0 to 3 do
        Kwartet[i] := Ord(XX[(Kwartet[i] - Ord(SP)) mod $40])
  end {Triplet2Kwartet};
The last few lines of the Triplet2Kwartet routine are new, and use the XX character set to return the right encoded character. Remember that the uuencoding algorithm returns the index of the encoded character, after which we add the value of a whitespace, so if the xxencode algorithm is performed after the general uuencode algorithm, we must subtract the value of whitespace again and use the remainder as index in the XX character array.
The same change is true for the Kwartet2Triplet routine, where we must compensate for the xxencoded characters before the uudecoding algorithm can take place (note that we can no longer pass Kwartet as a const argument):
  procedure Kwartet2Triplet(Kwartet: TKwartet;
                            var Triplet: TTriplet);
  var
    i: Integer;
  begin
    if XXCode then
    begin
      for i:=0 to 3 do
      begin
        case Chr(Kwartet[i]) of
              '+': Kwartet[i] := 0 + Ord(SP);
              '-': Kwartet[i] := 1 + Ord(SP);
         '0'..'9': Kwartet[i] := 2 + Kwartet[i]
                                   - Ord('0') + Ord(SP);
         'A'..'Z': Kwartet[i] := 12 + Kwartet[i]
                                    - Ord('A') + Ord(SP);
         'a'..'z': Kwartet[i] := 38 + Kwartet[i]
                                    - Ord('a') + Ord(SP)
        end
      end
    end;
    Triplet[0] := (((Kwartet[0] - Ord(SP)) AND $3F) SHL 2) +
                  (((Kwartet[1] - Ord(SP)) AND $30) SHR 4);
    Triplet[1] := (((Kwartet[1] - Ord(SP)) AND $0F) SHL 4) +
                  (((Kwartet[2] - Ord(SP)) AND $3C) SHR 2);
    Triplet[2] := (((Kwartet[2] - Ord(SP)) AND $03) SHL 6) +
                   ((Kwartet[3] - Ord(SP)) AND $3F)
  end {Kwartet2Triplet};
Note that in the new versions of the above two routines a global boolean variable "XXCode" is used to determine whether or not we're performing the xxencoding/decoding or the plain uuencoding/decoding algorithm.
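The table lookup for xxencoding can also be shown in isolation. The sketch below maps a 6-bit index to its xxencode character and back. It uses a plain linear search for the reverse direction instead of the case statement above, which is a deliberately simpler alternative, and the names XXChar, XXIndex and XXEncodeTriplet are hypothetical.

```pascal
const
  XX: array[0..63] of Char =
    '+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

{ 6-bit index (0..63) to xxencoded character. }
function XXChar(Index: Integer): Char;
begin
  Result := XX[Index and $3F];
end;

{ xxencoded character back to its 6-bit index, or -1 if not in the set. }
function XXIndex(Ch: Char): Integer;
var
  i: Integer;
begin
  Result := -1;
  for i := 0 to 63 do
    if XX[i] = Ch then
    begin
      Result := i;
      Exit;
    end;
end;

{ Three input characters to four xxencoded characters.
  S must hold at least three characters. }
function XXEncodeTriplet(const S: string): string;
var
  B0, B1, B2: Byte;
begin
  B0 := Ord(S[1]);
  B1 := Ord(S[2]);
  B2 := Ord(S[3]);
  Result := '';
  Result := Result + XXChar(B0 shr 2);
  Result := Result + XXChar(((B0 shl 4) and $30) or (B1 shr 4));
  Result := Result + XXChar(((B1 shl 2) and $3C) or (B2 shr 6));
  Result := Result + XXChar(B2 and $3F);
end;
```

The 6-bit values for 'Cat' are again 16, 54, 5 and 52, which the XX table turns into 'Eq3o'.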

Base64
The base64 encoding algorithm is different from the uuencode and xxencode algorithms, in that no first "count" character is used on the body lines. It is similar to the uuencode and xxencode algorithms in that it converts Triplets into Kwartets using a 64 printable character conversion table.

Like the xxencoding character set, and unlike the uuencoding character set, the base64 character set is not a contiguous subrange of the ASCII character set. This means that we have to include an array of 0..63 characters to represent the base64 character set and must also modify the Triplet2Kwartet and Kwartet2Triplet routines again to support the base64 encoding algorithm:
  const
    B64: Array[0..63] of Char =
       'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

  procedure Triplet2Kwartet(Const Triplet: TTriplet;
                            var Kwartet: TKwartet);
  var
    i: Integer;
  begin
    Kwartet[0] := (Triplet[0] SHR 2);
    Kwartet[1] := ((Triplet[0] SHL 4) AND $30) +
                  ((Triplet[1] SHR 4) AND $0F);
    Kwartet[2] := ((Triplet[1] SHL 2) AND $3C) +
                  ((Triplet[2] SHR 6) AND $03);
    Kwartet[3] := (Triplet[2] AND $3F);
    for i:=0 to 3 do
      if Kwartet[i] = 0 then
        Kwartet[i] := $40 + Ord(SP)
      else Inc(Kwartet[i],Ord(SP));
    if Base64 then
      for i:=0 to 3 do
        Kwartet[i] := Ord(B64[(Kwartet[i] - Ord(SP)) mod $40])
    else
      if XXCode then
        for i:=0 to 3 do
          Kwartet[i] := Ord(XX[(Kwartet[i] - Ord(SP)) mod $40])
  end {Triplet2Kwartet};

  procedure Kwartet2Triplet(Kwartet: TKwartet;
                            var Triplet: TTriplet);
  var
    i: Integer;
  begin
    if Base64 then
    begin
      for i:=0 to 3 do
      begin
        case Chr(Kwartet[i]) of
         'A'..'Z': Kwartet[i] := 0 + Kwartet[i]
                                   - Ord('A') + Ord(SP);
         'a'..'z': Kwartet[i] := 26+ Kwartet[i]
                                   - Ord('a') + Ord(SP);
         '0'..'9': Kwartet[i] := 52+ Kwartet[i]
                                   - Ord('0') + Ord(SP);
              '+': Kwartet[i] := 62+ Ord(SP);
              '/': Kwartet[i] := 63+ Ord(SP);
        end
      end
    end
    else
    if XXCode then
    begin
      for i:=0 to 3 do
      begin
        case Chr(Kwartet[i]) of
              '+': Kwartet[i] := 0 + Ord(SP);
              '-': Kwartet[i] := 1 + Ord(SP);
         '0'..'9': Kwartet[i] := 2 + Kwartet[i]
                                   - Ord('0') + Ord(SP);
         'A'..'Z': Kwartet[i] := 12 + Kwartet[i]
                                    - Ord('A') + Ord(SP);
         'a'..'z': Kwartet[i] := 38 + Kwartet[i]
                                    - Ord('a') + Ord(SP)
        end
      end
    end;
    Triplet[0] := (((Kwartet[0] - Ord(SP)) AND $3F) SHL 2) +
                  (((Kwartet[1] - Ord(SP)) AND $30) SHR 4);
    Triplet[1] := (((Kwartet[1] - Ord(SP)) AND $0F) SHL 4) +
                  (((Kwartet[2] - Ord(SP)) AND $3C) SHR 2);
    Triplet[2] := (((Kwartet[2] - Ord(SP)) AND $03) SHL 6) +
                   ((Kwartet[3] - Ord(SP)) AND $3F)
  end {Kwartet2Triplet};
Note that in the new versions of the above two routines a new global boolean variable "Base64" is used to determine whether or not we're performing the base64encoding/decoding or the uu/xx-encoding/decoding algorithm.
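As a cross-check of the base64 table, here is a small string-based sketch. It differs from the routines above in one respect that should be read as an assumption taken from the MIME definition of base64 and not from the component: an incomplete final Triplet is marked with '=' padding characters instead of being silently zero-filled. The name Base64EncodeStr is hypothetical.

```pascal
const
  B64: array[0..63] of Char =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

{ Encode Data as base64, with '=' padding for a short final group. }
function Base64EncodeStr(const Data: string): string;
var
  i, j, Rest: Integer;
  B: array[0..2] of Byte;
begin
  Result := '';
  i := 1;
  while i <= Length(Data) do
  begin
    Rest := Length(Data) - i + 1;          { characters left, at least 1 }
    for j := 0 to 2 do
      if j < Rest then
        B[j] := Ord(Data[i + j])
      else
        B[j] := 0;                         { zero-fill a short Triplet }
    Result := Result + B64[B[0] shr 2];
    Result := Result + B64[((B[0] shl 4) and $30) or (B[1] shr 4)];
    if Rest > 1 then
      Result := Result + B64[((B[1] shl 2) and $3C) or (B[2] shr 6)]
    else
      Result := Result + '=';
    if Rest > 2 then
      Result := Result + B64[B[2] and $3F]
    else
      Result := Result + '=';
    Inc(i, 3);
  end;
end;
```

With this sketch 'Cat' becomes 'Q2F0', 'Ca' becomes 'Q2E=' and a single 'C' becomes 'Qw=='. The bit manipulation is identical to Triplet2Kwartet; only the lookup table and the padding differ.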

MIME
MIME stands for Multipurpose Internet Mail Extensions, an internet standard for mail messages that includes the base64 encoding. It was designed to handle multiple language support and character translations across multiple types of systems (such as IBM mainframes, UNIX systems, and Macintosh and IBM PCs). The base64 encoding is described as part of MIME in RFC 1341 (since superseded by RFC 2045). Like uuencode, its purpose is to encode binary files into ASCII so that they may be passed through e-mail gates; MIME uses the base64 algorithm for that, plus a set of additional keywords and options which can be used to specify more detailed information about the contents of the encoded document.
MIME will be covered in some more detail in the MAIL chapter by John Kaster.

TBUUCode Component
The interface definition of the entire TUUCode component (declared as class TBUUCode in the listing), based on the previously defined and underlying Triplet2Kwartet and Kwartet2Triplet routines, is as follows (note that this code compiles with all versions of Delphi and C++Builder):

  unit UUCode;
  interface
  uses
  {$IFDEF WIN32}
    Windows,
  {$ELSE}
    WinTypes, WinProcs,
  {$ENDIF}
    SysUtils, Messages, Classes, Graphics, Controls, Forms;

  {$IFNDEF WIN32}
  type
    ShortString = String;
  {$ENDIF}

  type
    EUUCode = class(Exception);

    TAlgorithm = (filecopy, uuencode, uudecode,
                            xxencode, xxdecode,
                            base64encode, base64decode);
    TUnixCRLF = (CRLF, LF);

    TProgressEvent = procedure(Percent:Word) of Object;

    TBUUCode = class(TComponent)
    public
    { Public class declarations (override) }
      constructor Create(AOwner: TComponent); override;

    private
    { Private field declarations }
      FAbout: ShortString;
      FActive: Boolean;
      FAlgorithm: TAlgorithm;
      FFileMode: Word;
      FHeaders: Boolean;
      FInputFileName: TFileName;
      FOutputFileName: TFileName;
      FOnProgress: TProgressEvent;
      FUnixCRLF: TUnixCRLF;
    { Dummy method to get read-only About property }
      procedure Dummy(Ignore: ShortString);

    protected
    { Protected Activate method }
      procedure Activate(GoActive: Boolean);

    public
    { Public UUCode interface declaration }
      procedure UUCode;

    published
    { Published design declarations }
      property About: ShortString read FAbout write Dummy;
      property Active: Boolean read FActive write Activate;
      property Algorithm: TAlgorithm read FAlgorithm
                                     write FAlgorithm;
      property FileMode: Word read FFileMode write FFileMode;
      property Headers: Boolean read FHeaders write FHeaders;
      property InputFile: TFileName read FInputFileName
                                    write FInputFileName;
      property OutputFile: TFileName read FOutputFileName
                                     write FOutputFileName;
      property UnixCRLF: TUnixCRLF read FUnixCRLF write FUnixCRLF;

    published
    { Published Event property }
      property OnProgress: TProgressEvent read FOnProgress
                                          write FOnProgress;
    end {TBUUCode};

Properties
The TUUCode component has eight published properties (we skip the event property for now):
The About property contains the copyright and version information.
The Active property can be used to call the UUCode conversion method at design time, similar to the Active property of TTables and TQuery components.
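The Algorithm property contains the specific algorithm to be executed by the UUCode conversion method. The following algorithms are implemented and supported by the TUUCode component (these are the values of the TAlgorithm type): filecopy, uuencode, uudecode, xxencode, xxdecode, base64encode and base64decode.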
The FileMode property contains the octal filemode that is written in the header line (usually 0644 or 0755). Note that the property value itself is specified using decimal digits.
The Headers property can be used to specify whether or not begin-end headers should be generated (by the encoding algorithm) or expected (by the decoding algorithm). By default this property is set to True.
The InputFile property contains the name of the input file to be encoded or decoded.
The OutputFile property contains the name of the output file which is the result of an encoding algorithm. Note that the OutputFile property is ignored when decoding an InputFile that has headers (which specify the name of the file to decode).
The UnixCRLF property is used to specify Unix-style LineFeed-only usage or DOS/Windows-style Carriage Return/Line Feed pairs. Default is CRLF, but at least we're able to generate encoded files for Unix systems and decode files which originate from Unix systems.

Methods
The TUUCode component has three methods: one public constructor, one protected method and one public method:
The public constructor Create is used to create the component and initialize the default property values of Active, FileMode, Headers and About.
The protected method Activate is used to call the public method UUCode at design-time when we set the Active property to True. It is never necessary to call this method itself directly, since it's much easier to call the public method UUCode.
The public method UUCode is where the encoding and decoding algorithms are actually performed on the InputFile, based on the values of the other properties of the TUUCode component.

Events
The TUUCode component has one event property:
The OnProgress event can be used as a callback function to let the TUUCode component report the current percentage of the InputFile which is encoded or decoded. Using this information, we can use a 16-bit TGauge or 32-bit TProgressBar component, for example, to show the progress of the encoding or decoding process from 0 to 100%.

Encoding or decoding large documents may take some time, even if you have a fast machine and hard disk. Therefore, it could come in handy to have the ability to show the progress of the encoding and decoding process. To implement this, we need to create a new event, OnProgress, an event signaler and a corresponding event handler.
Events consist of two parts: an event signaler and the event handler. The signaler must make sure that the component somehow gets a message of some sort to indicate that some condition has become true, and that the event is now born. The event handler, on the other hand, starts to work only after the event itself is generated, and responds to it by doing some processing of its own.
Event signalers are typically based on virtual (or dynamic) methods of the class itself (like the general Click method) or Windows messages, such as notification or callback messages. Event handlers are typically placed in event properties, such as the OnClick, OnChange or OnProgress event handler property. If event handlers are published, then the user of the component can enter some event handling code that is to be executed when the event is fired.

Event Handlers
Event handlers are methods, and their types are declared "of object". This means that they can be assigned to methods of an object instance, and not to ordinary procedures or functions (a method carries a hidden Self parameter in addition to the declared ones). Consider the type TNotifyEvent for the most general of event handlers:

  TNotifyEvent = procedure(Sender: TObject) of object;
The TNotifyEvent type is the type for events that have only the sender as parameter. These events simply notify the component that a specific event occurred at a specific TObject (the sender). For example, OnClick, which is of type TNotifyEvent, notifies the control that a click event occurred on the control Sender. If the parameter Sender were omitted as well, then we'd only know that a specific event had occurred, but not on which control. Generally, we do want to know for which control the event just occurred, so we can act on the control (or on data in the control).
As mentioned before, Event Handlers are placed in event properties, and they appear on a separate page in the Object Inspector (to distinguish them from the 'normal' properties). The basis on which the Object Inspector decides to split these two kinds of properties is the "procedure/function of Object" part. The "of Object" part is needed, since we get the error message "cannot publish property" if we (try to) omit it.
The TUUCode component needs a TProgressEvent. This event does not really need the sender (this can always be added later on), but does need the percentage of the completed process so far, which yields the following prototype:
  TProgressEvent = procedure(Percent: Word) of object;

Event Signalers
Event signalers are needed to signal to an event handler that a certain event has occurred, so the event handler can perform its action. Event signalers are typically based on virtual (or dynamic) methods of the class itself (like the general Click method) or Windows messages, such as callback or notification messages.
In case of the TUUCode component, the event signaler is integrated within the UUCode method itself. After each line of encoded characters, the OnProgress event handler is called, if one is assigned. In code, this is implemented as follows:

  if Assigned(FOnProgress) then
    FOnProgress(trunc((100.0 * Size) / OutputBufSize))
Here, Size is the current size (or position) of the output buffer that is already encoded or decoded, and OutputBufSize is the expected total file size of the output file. Size will grow from zero to OutputBufSize, which means that the FOnProgress event handler is called with an argument between 0 and 100.
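The interplay between signaler and handler can be tried out without the component itself. The sketch below is a stripped-down stand-in: TWorker and TListener are hypothetical classes invented for this illustration, and Run only simulates the per-line loop of the UUCode method by advancing Size in steps of LineSize. The method assignment uses Delphi syntax; Free Pascal accepts it in Delphi mode.

```pascal
type
  TProgressEvent = procedure(Percent: Word) of object;

  { Hypothetical stand-in for the component: only the event plumbing. }
  TWorker = class
  private
    FOnProgress: TProgressEvent;
  public
    procedure Run(OutputBufSize, LineSize: LongInt);
    property OnProgress: TProgressEvent read FOnProgress write FOnProgress;
  end;

  { Hypothetical listener whose method serves as the event handler. }
  TListener = class
  public
    Log: string;
    procedure ShowProgress(Percent: Word);
  end;

{ Event signaler: fires after every simulated line. }
procedure TWorker.Run(OutputBufSize, LineSize: LongInt);
var
  Size: LongInt;
begin
  Size := 0;
  while Size < OutputBufSize do
  begin
    Inc(Size, LineSize);
    if Size > OutputBufSize then
      Size := OutputBufSize;
    if Assigned(FOnProgress) then
      FOnProgress(Trunc((100.0 * Size) / OutputBufSize));
  end;
end;

{ Event handler: collects the reported percentages in a string. }
procedure TListener.ShowProgress(Percent: Word);
var
  S: string;
begin
  Str(Percent, S);
  if Log <> '' then
    Log := Log + ' ';
  Log := Log + S;
end;

{ Wires the handler to the event and runs four simulated lines. }
function ProgressDemo: string;
var
  W: TWorker;
  L: TListener;
begin
  W := TWorker.Create;
  L := TListener.Create;
  try
    W.OnProgress := L.ShowProgress;
    W.Run(180, 45);
    Result := L.Log;
  finally
    L.Free;
    W.Free;
  end;
end;
```

With an output size of 180 and lines of 45, the handler is called four times and ProgressDemo returns '25 50 75 100'. In a real form, the handler would set the position of a TGauge or TProgressBar instead of building a string.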

Registration
When registering the TUUCode component, it helps to add a design-time property editor for the FileName (of the InputFile) to add a little bit more support for the end-user. This property editor is implemented in the same UUReg unit that registers the TUUCode component in the Delphi Component Palette:

  unit UUReg;
  interface
  {$IFDEF WIN32}
    {$R UUCODE.D32}
  {$ELSE}
    {$R UUCODE.D16}
  {$ENDIF}
  uses
    DsgnIntf;

  type
    TFileNameProperty = class(TStringProperty)
    public
      function GetAttributes: TPropertyAttributes; override;
      procedure Edit; override;
    end;

  procedure Register;

  implementation
  uses
    UUCode, { TUUCode }
    Classes, Dialogs, Forms, SysUtils;

    function TFileNameProperty.GetAttributes: TPropertyAttributes;
    begin
      Result := [paDialog]
    end {GetAttributes};

    procedure TFileNameProperty.Edit;
    begin
      with TOpenDialog.Create(Application) do
      try
        Title := GetName; { name of property as OpenDialog caption }
        Filename := GetValue;
        Filter := 'All Files (*.*)|*.*';
        HelpContext := 0;
        Options := Options +
                  [ofShowHelp, ofPathMustExist, ofFileMustExist];
        if Execute then SetValue(Filename);
      finally
        Free
      end
    end {Edit};

    procedure Register;
    begin
      { component }
      RegisterComponents('DrBob42', [TBUUCode]);
      { property editor }
      RegisterPropertyEditor(TypeInfo(TFilename), nil,
                            'InputFile', TFilenameProperty);
    end {Register};
  end.
If we want to use Delphi packages to "package" the TUUCode component (pun intended), then we should put the unit UUCode in a runtime package, and the unit UUReg in a design-time package (that requires the runtime package). In fact, using packages we can even put the UUCode Wizard from the next section in the design-time package and have it available in the Delphi IDE to all users!

UUCode Example Wizard
The 16-bit example program uses a TGauge component to show the progress of the conversion algorithm, while the 32-bit version uses a Windows 95 Progress Control.


Figure 1: 16-bit Version of the UUCode Example Program


Figure 2: 32-bit Version of the UUCode Example Program

There are two possible exceptions that can be raised by the example programs: if the input filename is empty, and, when encoding, if the output filename is empty. The 16-bit version can raise a third exception if the input file or output file is bigger than 65000 characters: the 16-bit version of this component can only handle input and output files up to 64 KB in size. In practice, this means that the input file cannot be bigger than approximately 48 KB. The 32-bit version has no such limitation, of course.

Summary
In this chapter, we've explored the uuencode/uudecode, xxencode/xxdecode, and base64 encode/decode algorithms. We developed a single VCL component that supports all these algorithms in addition to the simple filecopy. Properties, methods and events make this component a valuable tool in building internet applications that require these specific file conversions.
The TBUUCode component is now part of the DrBob42 component package for Delphi and C++Builder.


This webpage © 1997-2017 by Bob Swart (aka Dr.Bob - www.drbob42.com). All Rights Reserved.