Content Sucking using FTP - A Neat Odd Job

By: Randall Nagy

Abstract: We all have those neat odd jobs lying about our projects: things that we would like to develop but simply do not have the time for. This article reviews how to use an open toolset dedicated to rapidly creating platform-independent tools.

Neat Odd Job #4526: Content Sucking using FTP

The need to create single-minded utilities and tools has spawned just as many new scripting technologies as object orientation has spawned new language technologies. For this reason, developers writing major applications in compiled languages (like Java, C++, or Object Pascal) often rely upon scripting, batch, or tool languages to do chores like downloading a huge number of files, automating file distribution, or cleaning up data.

While technologies like Perl, Python, shell, and batch scripting will always be important, I have found that most developers, even those moderately proficient in these technologies, would prefer to write tools in their favorite compiled language. Because some estimate C++ developers to be the largest group of developers in the world today, I put together a library of tool classes for C++ developers. (Warning: this was a 3F exercise. See the Lone Wolf Tao series on this site for more information.)

Known as the "Neat Odd Job" (NOJ) Project, the tool support library has developer-domain business objects that encapsulate such common chores as file searching and recursion, program identification, portable path operations, template searching and sorting, an embedded database, and many more. In addition to an open-ended tool set, the NOJ library also offers a philosophy that should be familiar to tool developers everywhere.

In Search of Content

Having been a UNIX enthusiast since the days of CP/M, I quite naturally like to keep up with open-source projects. For this reason, I am always looking for open-source POSIX tools to run on the platform I sell most of my products on (Windows). Such tools, like the NOJ Library, let me tweak the source code rather than wait for someone else to fix bugs.

The list of POSIX tools and utilities is impressive. So impressive, in fact, that it can be virtually impossible to keep up with by hand. What follows is a NOJ that I wrote to help me keep up with my favorite projects. In the spirit of the tool developer, I offer it up in the hope that you might find it useful for keeping up with yours.

Building your NOJ

Most of the NOJ library resides in the `stdnoj` namespace. For this reason, most of the code that you will want to include in your project resides in the noj folder under the stdnoj directory. (Because a core technology is likely to be reused a lot, it is very important that the structure of the code base be easy to understand.) Once it is included, all you need to do to begin experimenting with the NOJ library is add the using directive (`using namespace stdnoj;`) and define a platform identifier (WIN32, UNIX, or DOS).
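
The NOJ headers are not reproduced in this article, so the following is only a sketch of the general technique and assumes nothing about NOJ's internals. Exactly one platform identifier is defined, for example with `-DUNIX` on the compiler command line, and that identifier selects platform-specific code at compile time. The names `kPathSep` and `JoinPath` are hypothetical and are not part of the NOJ API.

```cpp
#include <string>

// Sketch only, not NOJ code: one platform identifier (WIN32, UNIX, or DOS)
// picks the path separator at compile time.
#if !defined(WIN32) && !defined(UNIX) && !defined(DOS)
#define UNIX  // assumption for this sketch: fall back to UNIX if none is set
#endif

#if defined(WIN32) || defined(DOS)
const char kPathSep = '\\';
#else
const char kPathSep = '/';
#endif

// Hypothetical helper: joins a folder and a node name with the separator
// chosen above, without doubling a trailing separator.
std::string JoinPath(const std::string& dir, const std::string& node) {
    if (dir.empty() || dir.back() == kPathSep) return dir + node;
    return dir + kPathSep + node;
}
```

With UNIX defined, or with nothing defined because of the sketch's fallback, `JoinPath("gnuwin32", "readme.txt")` yields `gnuwin32/readme.txt`. A WIN32 or DOS build would use a backslash instead.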

 

Writing to the LCD

Because time spent writing tools and utilities is time spent away from more profitable activities, we need to develop them fast.

In the case of developing a mass content downloader on Microsoft Windows, FTP is the protocol of choice for the moment. FTP could be implemented relatively easily over the NOJ communications infrastructure (located in the Server folder) at some point. For now, the quickest way to develop the interface is to reuse the robust and fully debugged capabilities of Microsoft's WinINET support libraries. So the Least Common Denominator for my NOJ was a tool that would download a huge number of files using FTP on Windows.

Because we might implement a platform-independent FTP protocol at some point, our venture into vendor-lock city begins by designing a class that encapsulates the intricacies of using WinINET. Indeed, if the model proves successful, we might even tell others about the capability, allowing them to complete it for the platform(s) where they need to run our NOJ. (When it comes to writing tools and utilities in C/C++, only the coolest survive.)
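
One way to keep that vendor lock contained is to make callers depend on an abstract interface, with WinINET as just one backend. The sketch below is hypothetical. `FtpClient`, `FakeFtp`, and `MakeFtpClient` are illustrative names, not the article's classes. The sketch also uses `std::string` and `std::vector` in place of NOJ's `StdString` and `Array` so that it compiles on its own.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstraction: callers see only this interface, not WinINET.
class FtpClient {
public:
    virtual ~FtpClient() = default;
    virtual bool OpenHost(const std::string& host, const std::string& user,
                          const std::string& pass) = 0;
    virtual bool ChangeDir(const std::string& dir) = 0;
    virtual std::vector<std::string> Files() = 0;
    virtual void CloseHost() = 0;
};

// In-memory stand-in backend: maps folder names to the files they hold,
// so driver logic can be exercised without a network.
class FakeFtp : public FtpClient {
public:
    explicit FakeFtp(std::map<std::string, std::vector<std::string>> tree)
        : tree_(std::move(tree)) {}
    bool OpenHost(const std::string& host, const std::string&,
                  const std::string&) override {
        open_ = !host.empty();
        return open_;
    }
    bool ChangeDir(const std::string& dir) override {
        if (!open_ || tree_.count(dir) == 0) return false;  // no session or no such folder
        cwd_ = dir;
        return true;
    }
    std::vector<std::string> Files() override {
        auto it = tree_.find(cwd_);
        return it == tree_.end() ? std::vector<std::string>{} : it->second;
    }
    void CloseHost() override { open_ = false; cwd_.clear(); }
private:
    std::map<std::string, std::vector<std::string>> tree_;
    std::string cwd_;
    bool open_ = false;
};

// The single place where a backend is chosen.
std::unique_ptr<FtpClient> MakeFtpClient() {
    return std::make_unique<FakeFtp>(std::map<std::string, std::vector<std::string>>{
        {"/pub", {"a.zip", "b.zip"}}});
}
```

A Windows build could have `MakeFtpClient()` return a WinINET-based implementation instead. A portable, socket-based backend could be added later without touching any caller.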

 

#include <stdnoj.hpp> 

using namespace stdnoj;

// Wraps the WinINET FTP calls used by this NOJ (Windows only, for now).
class InetFTP
{
protected:
   void *hInternet;   // WinINET session handle (HINTERNET is a void*)
   void *hConnect;    // FTP connection handle for the open host

   // Shared listing helper behind Files() and Folders().
   bool _Nodes(Array& aFiles, bool bDirsOnly);

public:
   InetFTP(void);
   ~InetFTP(void);

   bool OpenHost(const StdString& sUrl, const StdString& sUser, const StdString& sPass);
   void CloseHost(void);

   bool ChangeDir(const StdString& sDir);

   bool Files(Array<File>& aFiles);             // Files (only)
   bool Folders(Array<Directory>& aFolders);    // Folders (only)

   // Download a host node; bText requests a text (ASCII) rather than binary transfer.
   bool GetFile(const StdString& sHostNode, Directory& LocalDir, bool bText = false);
   bool GetFile(const StdString& sHostNode, File& LocalFile, bool bText = false);

   bool PutFile(File& fileLocal, const StdString& sHostNode, bool bText = false);
   bool PutFile(File& fileLocal, bool bText = false);   // File, by node name, to default host folder.
};

While the implementation of this interface is important, it is not of any great concern here. Feel free to download the source if you want to check it out.
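
One plausible approach, not taken from the downloadable implementation, is for a single protected helper like `_Nodes` to serve both `Files()` and `Folders()` by filtering one directory listing on a flag. On Windows, such a listing would typically come from WinINET's `FtpFindFirstFile` and `InternetFindNextFile`, whose `WIN32_FIND_DATA` results mark folders with `FILE_ATTRIBUTE_DIRECTORY`. This sketch replaces that listing with a hypothetical `ListingEntry` struct so it compiles anywhere.

```cpp
#include <string>
#include <vector>

// Hypothetical portable stand-in for one entry of an FTP directory listing.
struct ListingEntry {
    std::string name;
    bool isDirectory;
};

// One helper serves both public calls: bDirsOnly == true keeps folders,
// false keeps plain files, matching the "(only)" comments on Files()/Folders().
std::vector<std::string> Nodes(const std::vector<ListingEntry>& listing, bool bDirsOnly) {
    std::vector<std::string> out;
    for (const ListingEntry& e : listing) {
        if (e.isDirectory != bDirsOnly) continue;           // wrong kind of node
        if (e.name == "." || e.name == "..") continue;      // skip self/parent links if reported
        out.push_back(e.name);
    }
    return out;
}
```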

What matters more is the driver that we use to accomplish the task at hand. Because this portion of the code is what we will use to get others interested in our tool, it needs to be clean and to demonstrate the tool's basic utility.

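#include <iostream>   // std::cout, std::cerr

// Driver: fetch the first batch, then keep fetching batches (switching
// mirrors before each one) until FtpCollect reports that it is done.
int main(int argc, char *argv[])
   {
   bool bDone = false;
   FtpSite acct;

   FtpCollect site;
   StdLog log;
   site.SwitchAccount(acct);            // pick the first mirror
   if(site.GetFirstBatch(acct, bDone, log) == false)
      {
      std::cerr << "Directory not found" << std::endl;
      return 1;
      }

   while(!bDone)
      {
      site.SwitchAccount(acct);         // move the next batch to the next mirror
      if(site.GetNextBatch(acct, bDone, log) == false)
         {
         std::cerr << "Directory not found" << std::endl;
         return 1;
         }
      }
   std::cout << "Normal end of job!" << std::endl;
   return 0;
   }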
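
The article does not show how `FtpCollect` splits the work into batches. As an illustration only, the hypothetical `BatchCursor` below models the contract the driver relies on. Each call hands back at most `batchSize` file names and sets `bDone` once the list is exhausted, so the `while(!bDone)` loop ends after the last partial batch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of FtpCollect's batching (not the article's code).
class BatchCursor {
public:
    BatchCursor(std::vector<std::string> files, std::size_t batchSize)
        : files_(std::move(files)), batchSize_(batchSize ? batchSize : 1) {}

    // Returns up to batchSize_ names; sets bDone when nothing is left.
    std::vector<std::string> NextBatch(bool& bDone) {
        std::vector<std::string> batch;
        while (pos_ < files_.size() && batch.size() < batchSize_)
            batch.push_back(files_[pos_++]);
        bDone = (pos_ >= files_.size());
        return batch;
    }

private:
    std::vector<std::string> files_;
    std::size_t batchSize_;
    std::size_t pos_ = 0;
};
```

With five names and a batch size of two, successive calls return batches of 2, 2, and 1 names, and `bDone` becomes true on the third call. Small batches are what give the driver a chance to switch mirrors between them.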

 

Modifying the Design

When I first wrote this project, I was pretty sure that a host would not allow me to download several thousand files at one time. While this type of mass content downloading ("sucking") was fine in years past, today's host loads simply cannot tolerate too many mass download ("MGET *") attempts. Although I originally designed the code to download from a single server, I soon discovered that by switching hosts several times I could distribute the file demand across multiple FTP sessions and servers.

// 'virtual' belongs only on the declaration inside the FtpCollect class;
// it is not allowed on an out-of-class definition like this one.
void FtpCollect::SwitchAccount(FtpSite& acct)
   {
   static int cookie = 0;   // next mirror to use: 0, 1, 2, then wraps to 0
   // Hosts limit mass downloads, so we rotate between mirror servers.
   // Defaults, shared by every mirror unless overridden below:
   acct.sSite = "unc.dl.sourceforge.net";
   acct.sUser = "anonymous";
   acct.sPassword = "yourid@yourhost.com";
   acct.sFolder = "/pub/sourceforge/gnuwin32";
   switch(cookie)
      {
      case 0:
         acct.sSite = "unc.dl.sourceforge.net";
         cookie++;
      break;
      case 1:
         acct.sSite = "heanet.dl.sourceforge.net";
         acct.sFolder = "gnuwin32";    // this mirror uses a different folder path
         cookie++;
      break;
      case 2:
      default:
         acct.sSite = "umn.dl.sourceforge.net";
         cookie = 0;
      break;
      }
   }
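
The `switch` above works, but every new mirror means another `case`. The table-driven variant below assumes that only the site and folder differ between mirrors. It keeps the same three servers and folders in an array and rotates through them. `Mirror`, `kMirrors`, and the two-argument `SwitchAccount` are hypothetical. The `FtpSite` struct is a stand-in that uses the field names from the article.

```cpp
#include <cstddef>
#include <string>

// Stand-in with the article's field names, so the sketch compiles alone.
struct FtpSite {
    std::string sSite, sUser, sPassword, sFolder;
};

struct Mirror {
    const char* site;
    const char* folder;
};

// Same mirrors and folders as SwitchAccount above; adding one is one line.
static const Mirror kMirrors[] = {
    {"unc.dl.sourceforge.net",    "/pub/sourceforge/gnuwin32"},
    {"heanet.dl.sourceforge.net", "gnuwin32"},
    {"umn.dl.sourceforge.net",    "/pub/sourceforge/gnuwin32"},
};

// Fills acct from the next mirror in round-robin order and advances cursor.
void SwitchAccount(FtpSite& acct, std::size_t& cursor) {
    const std::size_t count = sizeof(kMirrors) / sizeof(kMirrors[0]);
    const Mirror& m = kMirrors[cursor % count];
    acct.sSite = m.site;
    acct.sFolder = m.folder;
    acct.sUser = "anonymous";
    acct.sPassword = "yourid@yourhost.com";
    cursor = (cursor + 1) % count;
}
```

The caller passes the cursor in rather than relying on a function-level `static` such as `cookie`. As a result, two collectors (or a test) do not share rotation state.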

 

Conclusion

So now we arrive at the conclusion of this neat odd job. Not only were we able to deliver the utility in record time with very few lines of code, but we also demonstrated how to design a project to maximize code reuse. We also showed how to speed up delivery by availing ourselves of less portable platform conventions. That is the very essence of NOJ: an easy-to-understand technology designed to help us grow neat tools to share with others, wherever they need the performance benefits of deploying native code.

About the Author

Randall Nagy has been serving the computing industry since 1978. An experienced principal, author, trainer, and consulting geek for hire, he has worked with such firms as Borland, IBM, AT&T, Informix and UTL, The State of Connecticut, Imperial Oil of Canada, Southern California Gas, and more.

Either he or his Lone Wolf associates stand ready to accept your consulting assignments. His web site is Soft9000.com.

