Short intro note from Peter Coad
Hi There,
I wanted to send you a sample issue, for your consideration. Here it is: the
first issue from the all-new site (see below).
Oh what fun we are going to have! I hope you'll
join me in this great adventure.
Very best,
Peter Coad
Editor-in-Chief, The Coad Letter
Chairman, Founder, and Chief Strategy Officer, TogetherSoft Corporation
Lists Considered Harmful
Issue 88, March 2002
Dear Friend,
Welcome to the first issue of the Modeling and Design Edition of the newly
reorganized The Coad Letter. Over the coming months, I'll be sharing
with you insights, strategies, patterns and techniques for creating better software.
We kick off this month with a look at the impact of lists on the usability
and scalability of software systems. How many times have you had to wait forever
for a system to display a long list of things that you then had to scroll through
looking for the particular item you want? We look at three strategies for better
handling of large lists.
Have fun
Steve
Stephen Palmer (stephen@thecoadletter.com)
PS #1. Yes, you may have heard: I have returned to independent consulting
... specializing in object-oriented analysis, design, programming and process.
Contact me via http://www.step-10.com for
short or longer term assignments. Peter Coad says I am one of the best designers
he has ever worked with. Give me the chance to show you why. Let me help your
team build better object-oriented software. More details at http://www.step-10.com.
PS #2. Together ControlCenter 6.0 is officially released! UI builder,
10 new refactorings (total 12), testing, and more. Download today from
http://www.togethersoft.com/downloads/
Lists Considered Harmful
User-extensible lists or large fixed-size lists can seriously affect the performance
of a software system. Loading the entries of a large list from disk or over
a network can be a time-consuming operation. Keeping the contents of large lists
in memory can increase the amount of memory required to run an application or
cause the operating system to page memory to disk (again slowing performance).
If the contents of a list are objects more complex than simple character
strings, the problem is worse.
You might think that such performance problems would be identified early in
a system's development and appropriate solutions found. However, most developers
use small sets of data for unit and integration testing. This means that a performance
problem with a large list might not be noticed until formal system testing.
By then project deadlines may mean there is not enough time to correct the problem.
When a list's contents can be extended by users of the system, a performance
problem may not be noticed until the system has been installed and running for
months or until a project of large enough size is attempted.
Therefore, when it comes to working with lists, it definitely pays to think
first before coding the simplest solution that comes to mind. In this
situation the simplest solution may actually be too simple and affect the ability
of the software to scale to truly large sets of users and large projects. For
software product vendors, problems like this can seriously affect the ability
to sell a software product into large companies.
The following strategies can be used to provide better designs for manipulating
larger and growing lists of things. None of the strategies are new or
earth-shattering; the point is to consider them when first designing the manipulation
of a large list and pick one or a combination.
Strategy 2001-03-01: Categorize list entries and present as a tree structure
Place the entries of a large list into a set of mutually exclusive categories.
Instead of presenting the user with a single long list of items, ask the user
to select a category first and then display only those entries within that category.
Notes:
- Most graphical user interface (GUI) toolkits provide a tree control that
can be used to do this concisely; categories are parent nodes in the tree
and the items are leaf nodes.
- Most GUI tree controls provide a means of showing and hiding the children
of a node. Only those entries under nodes that a user 'expands' need be loaded,
often reducing the amount of memory used significantly.
- Enabling the user to redefine the categories and move items from one category
to another allows a user to organize the items to better meet their own specific
needs.
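
The second note, loading entries only when a user expands a node, can be sketched
roughly as follows. This is a minimal illustration rather than any particular
toolkit's tree control: the CategoryNode class, the load_category callback and
the FAKE_STORE data are all hypothetical names invented for the example, and the
dictionary merely stands in for a disk or network fetch.

```python
from typing import Callable, Dict, List, Optional


class CategoryNode:
    """A category (parent node) whose items (leaf nodes) load only on expand."""

    def __init__(self, name: str, loader: Callable[[str], List[str]]) -> None:
        self.name = name
        self._loader = loader  # hypothetical callback that fetches one category
        self._items: Optional[List[str]] = None  # None means "not loaded yet"

    @property
    def is_loaded(self) -> bool:
        return self._items is not None

    def expand(self) -> List[str]:
        """Load this category's items on first expansion, then reuse them."""
        if self._items is None:
            self._items = self._loader(self.name)
        return self._items

    def collapse(self) -> None:
        """Release the items again so collapsed categories cost no memory."""
        self._items = None


# Assumed stand-in for a database or network call (illustrative data only).
FAKE_STORE: Dict[str, List[str]] = {
    "Engineering": ["Linus", "Ada", "Grace"],
    "Sales": ["Zig", "Dale"],
}
fetched: List[str] = []  # records which categories were actually fetched


def load_category(name: str) -> List[str]:
    fetched.append(name)
    return sorted(FAKE_STORE[name])


tree = [CategoryNode(name, load_category) for name in sorted(FAKE_STORE)]
engineering = tree[0]
items = engineering.expand()  # first expansion triggers the fetch
engineering.expand()          # second expansion reuses the loaded items
```

Only the category the user opened is ever fetched; the other nodes hold nothing
but their names until they are expanded.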
Examples:
- The filesystems on many computers (e.g. Unix, Mac and Windows) use directories
or folders to present large numbers of individual files within a tree structure.
- Many email clients use folders to organize a large number of messages as
a tree.
- TogetherSoft's Together ControlCenter presents its list of automated design
patterns as a tree control categorized by the pattern's applicability or origin
(e.g. J2EE patterns, Gang of Four Design patterns, user interface patterns,
Coad class archetypes, etc.).
Advantages:
- This strategy usually offers a reasonably straightforward refactoring by
replacing a list control with a tree control. However, it is definitely more
efficient to decide to use a tree first than to code a list and then replace
it later.
- Provides an easy way for a user to locate a particular item if the category
of the item is known.
Disadvantages:
- Makes it much harder to find a particular item if the category is not known.
- As more and more items are added, the categories need to be reorganized
and extra levels added.
- A tree structure only provides a single categorization scheme; an item belongs
to one and only one category. Extensions such as links and shortcuts to other
categories can be used to relieve this constraint but at the cost of significant
additional complexity.
This strategy does not really solve the underlying problem of a growing list;
it only helps ease the pain for a while. Imagine a system that listed all employees.
In a rapidly growing company a simple list might be sufficient for the first
couple of years. When the company has a few hundred employees a tree control
might provide a good enough mechanism for locating a particular employee. However,
for a multi-national company neither a simple list nor a tree is going to suffice.
Strategy 2001-03-02: Replace a simple list with a search operation
Instead of presenting all the items in one long, alphabetically sorted list,
provide the user with the ability to search for a small subset of items using
a set of criteria.
Notes:
- In business systems it is important to select a useful set of search criteria;
talk to the user representatives and, if possible, watch how they currently
locate these items in their daily work.
- If search criteria can be saved, a user can build up a set of useful searches
so that criteria do not have to be remembered and re-entered each time.
Examples:
- Filesystems on many computers (e.g. Unix, Mac and Windows) provide a file
search capability in addition to the hierarchical categorization of directories
or folders.
- Many business systems provide a search facility for locating customer details.
- Many e-commerce sites provide a search facility for quickly locating a particular
product.
Advantages:
- Users can use multiple criteria to try to locate a particular item.
- When backed by indexing techniques a search can be a much faster way to
locate a particular item.
- Searching is one of the most established branches of computer science so
an efficient algorithm is usually readily available.
Disadvantages:
- Poor selection of search criteria can still result in a large list of items
being presented to the user or the particular item not being found.
- Poor implementation can make searching a very expensive and time-consuming
operation.
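
As a rough sketch of this strategy, the code below backs a multi-criteria search
with a very small inverted index, caps the number of results returned, and keeps
a saved search by name. Everything here is an assumption made for illustration:
the CustomerIndex class, its field names and the sample records are hypothetical,
and a real system would more likely rely on database indexes than on an
in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class CustomerIndex:
    """Tiny inverted index: each (field, value) pair maps to matching record ids."""

    def __init__(self) -> None:
        self._records: Dict[int, Dict[str, str]] = {}
        self._index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, record_id: int, fields: Dict[str, str]) -> None:
        self._records[record_id] = fields
        for field, value in fields.items():
            self._index[(field, value.lower())].add(record_id)

    def search(self, criteria: Dict[str, str], limit: int = 50) -> List[Dict[str, str]]:
        """Return records matching ALL criteria, capped at `limit` results."""
        if not criteria:
            return []  # refuse to fall back to "list everything"
        id_sets = [
            self._index.get((field, value.lower()), set())
            for field, value in criteria.items()
        ]
        matches = set.intersection(*id_sets)
        return [self._records[i] for i in sorted(matches)[:limit]]


# Illustrative data only.
index = CustomerIndex()
index.add(1, {"surname": "Smith", "city": "Leeds"})
index.add(2, {"surname": "Jones", "city": "York"})
index.add(3, {"surname": "Smith", "city": "York"})

# Saved criteria, so a user need not remember and re-enter them each time.
saved_searches = {"York Smiths": {"surname": "smith", "city": "York"}}
results = index.search(saved_searches["York Smiths"])
```

The result cap and the refusal to run an empty search are one way to guard
against the first disadvantage above, where poorly chosen criteria bring back
the whole list.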
Strategy 2001-03-03: Enable a user to define, name, save and load subsets
Enable a user to specify a subset of the whole list, give that subset a name,
save it and then load it as and when desired.
Notes:
- In a distributed system, designers need to choose between storing the subsets
on the client or on the server. Storing on the server usually requires more
work but subsets can be shared between users.
- Storing on the client often means faster retrieval but useful subsets cannot
be shared between users as easily.
Examples:
- The list of possible stereotypes in UML is growing daily. For a tool vendor
to present in a list all the possible stereotypes an element can take is rapidly
becoming impractical. UML Profiles provide a mechanism where a subset of stereotypes
can be named, saved and loaded so that a user only loads the subset of stereotypes
relevant to the task on which they are working.
- A bank manager might save a subset of his most important account holders.
- A planner might save a subset of possible tasks representing a particular
process.
Advantages:
- Users only load the items they need to work with, reducing the number of
items that need to be loaded from disk or across a network into memory.
- Unlike saving a set of search criteria, there is no potentially expensive
operation to be performed before the set of items is presented to the user.
- If there are many lists to which this strategy can be applied, then these
sets can themselves be collected together into named themes or profiles or
project templates.
Disadvantages:
- A newly added item may not be noticed by someone working with statically
defined subsets.
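
A minimal sketch of defining, naming, saving and loading subsets might look like
the following. It assumes client-side storage in a single JSON file and that
each item has a string identifier; the SubsetStore class, the file name and the
subset names are hypothetical and chosen only for the example.

```python
import json
import os
import tempfile
from typing import Dict, List


class SubsetStore:
    """Keeps named subsets as lists of item ids in one JSON file."""

    def __init__(self, path: str) -> None:
        self._path = path

    def _read_all(self) -> Dict[str, List[str]]:
        if not os.path.exists(self._path):
            return {}
        with open(self._path, "r", encoding="utf-8") as handle:
            return json.load(handle)

    def save(self, name: str, item_ids: List[str]) -> None:
        """Store (or overwrite) the subset called `name`."""
        subsets = self._read_all()
        subsets[name] = sorted(set(item_ids))  # drop duplicates, sort for stability
        with open(self._path, "w", encoding="utf-8") as handle:
            json.dump(subsets, handle, indent=2)

    def load(self, name: str) -> List[str]:
        """Return only the ids in the named subset, not the whole list."""
        return self._read_all()[name]

    def names(self) -> List[str]:
        return sorted(self._read_all())


# Illustrative usage with a temporary folder so nothing is left on disk.
with tempfile.TemporaryDirectory() as folder:
    store = SubsetStore(os.path.join(folder, "subsets.json"))
    store.save("ejb-profile", ["entity", "session", "entity"])
    store.save("analysis", ["boundary", "control"])
    loaded = store.load("ejb-profile")
    available = store.names()
```

Storing the same structure on a server instead would allow subsets to be shared
between users, at the cost of the extra work described in the notes.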
Combining Strategies
We have already mentioned that computer filesystems tend to combine a tree
structure and a search facility to help users locate files. Other combinations
can be very useful too. A search could return its results in a tree form helping
the user learn the categories used. The results of a search could be used to
form a named subset of items. Larger named subsets could be presented as a small
tree structure. And so on.
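
One of these combinations, returning search results in tree form, could be
sketched as below. The group_by_category function and the (category, item)
shape of a search hit are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_category(results: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn flat (category, item) search hits into {category: [items]}."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for category, item in results:
        grouped[category].append(item)
    # Sort categories and items so the tree is presented in a stable order.
    return {category: sorted(items) for category, items in sorted(grouped.items())}


# Illustrative hits, as a pattern search might return them.
hits = [("GoF", "Observer"), ("J2EE", "Session Facade"), ("GoF", "Adapter")]
result_tree = group_by_category(hits)
```

Each key becomes a parent node and each value its leaf nodes, so the user sees
which category every hit belongs to.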
Summary
Working with a list? Find out how large it is or could become. If it could
grow to hundreds or thousands of entries, consider each of the above strategies
(preferably with a user representative). Pick one or a combination and provide
your system with the ability to scale well to use by large organizations.