Unified search theory

By: John Kaster

Abstract: Use our new search engine to find everything on the CodeGear websites

    The unified Unicode search engine

On Wednesday, September 17, 2008, we launched a new search engine based on an open source search technology called Lucene.

Our new search engine supports searching for results on all of our major web sites. Because it is based on Lucene, there are now advanced free text search options available.

You can see the new search engine right now by typing some text into the little search box on this browser page and clicking the Search link.

    New search features

Our search functionality has several major improvements:

  • Search across all major web properties (CodeCentral, QualityCentral, Blogs, all GetPublished sites).
  • Faster search (although we always have more performance tweaks to make).
  • Search in any supported human language.
  • Phrase search, like "this is my phrase".
  • Full indexing of all words and numbers (number indexing was turned off in our old search engine).
  • Search all source code in CodeCentral and appropriately marked documents. (QualityCentral attachments will also be indexed soon.)
  • Consolidated display of content mapped to multiple locations on our web sites.
  • SearchInsight™ provides tips based on the content of our repository for strings that match the text you are typing.

    SearchInsight™

(For those who enjoy puns like I do, feel free to call this SearchInSite instead!)

As you type text in the small search box at the top of one of our web pages, an AJAX call is made to our search hints web service, matching the first "word" of what you type against the keywords in our search index. The following screenshot shows SearchInsight™ for datasnap:

Figure 1: SearchInsight™ for "datasnap"

Our search is case-insensitive, so you can type DataSnap, datasnap, or any other combination of character casing and still match the same search results.

The Approx hits value reports the approximate number of unique entries matching what you have currently typed. The actual results you get back can vary based on the visibility rules of the content matched, and your access rights.

The search hints retrieval logic is conceptually simple (but somewhat complex to implement). It returns the first 3 keyword combinations that match the pattern of the "first word" you type.

For example, with datasnap as the value, we request the first three (3) matches for each of the following patterns, providing up to 9 potential "quick search" values:

Search Pattern     Search Index Matches

datasnap*          +datasnap
                   +datasnap.application
                   +datasnap.jkaster

datasnap* *        +datasnap +technology
                   +datasnap +licensing
                   +datasnap +can

datasnap* * *      +datasnap +server +using
                   +datasnap +2009 +overview
                   +datasnap +area +has
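
The pattern expansion in the table can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual implementation: the names hint_patterns, pattern_matches, and search_hints are hypothetical, and SAMPLE_INDEX is a small in-memory stand-in for the keyword combinations held in the search index.

```python
# Hypothetical stand-in for the keyword combinations stored in the search index.
SAMPLE_INDEX = [
    "datasnap", "datasnap.application", "datasnap.jkaster", "datasnaps",
    "datasnap technology", "datasnap licensing", "datasnap can",
    "datasnap server", "datasnap server using", "datasnap 2009 overview",
    "datasnap area has", "dataset provider",
]


def hint_patterns(first_word, max_words=3):
    """Build one wildcard pattern per keyword count: 'w*', 'w* *', 'w* * *'."""
    prefix = first_word.lower() + "*"  # the search is case-insensitive
    return [" ".join([prefix] + ["*"] * extra) for extra in range(max_words)]


def pattern_matches(pattern, entry):
    """True if entry has as many keywords as pattern and starts with its prefix."""
    pattern_words = pattern.split()
    entry_words = entry.lower().split()
    if len(pattern_words) != len(entry_words):
        return False
    return entry_words[0].startswith(pattern_words[0].rstrip("*"))


def search_hints(first_word, index, per_pattern=3):
    """Return up to per_pattern index entries for each generated pattern."""
    return {
        pattern: [e for e in index if pattern_matches(pattern, e)][:per_pattern]
        for pattern in hint_patterns(first_word)
    }


if __name__ == "__main__":
    for pattern, entries in search_hints("DataSnap", SAMPLE_INDEX).items():
        print(pattern, "->", entries)
```

Three patterns with at most three matches each give the upper bound of nine quick search values.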

After you begin typing in the next "word" of your search criteria, we provide only the estimate of matching items in the index, as this screenshot shows:

    Advanced Search Options

If the standard search doesn't provide enough control over the results, you can use the Advanced search dialog to fine-tune your search criteria.

You can search for a specific author by name. If you want to limit your keyword searching to a specific part of the indexed content, you can search only titles, abstracts (short descriptions or summaries), or the body (full description).

    Source code searching

When searching source code, you can choose to search for source code in any of our supported languages or all of them, and also specify the sections of the source code you want to search.

The following options are available for searching source code:

  • Source code: search all source code, comments, and string constants
  • Code only: search only source code and string constants, omitting comments
  • Comments: search only the text of source code comments
  • Strings: search only the text of string constants in the source code
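
To make the four options concrete, here is a rough sketch of how source text can be split into code, comments, and string constants before matching. It is not how our indexer works: it uses Python's standard tokenize module on a Python snippet purely because that tokenizer is readily available, and the names split_sections, SEARCH_OPTIONS, and source_matches are hypothetical.

```python
import io
import tokenize

# Which sections each search option looks at (mirrors the list above).
SEARCH_OPTIONS = {
    "source code": ("code", "comments", "strings"),
    "code only": ("code", "strings"),
    "comments": ("comments",),
    "strings": ("strings",),
}

# Layout-only tokens that carry no searchable text.
_LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER}


def split_sections(source):
    """Split Python source text into code, comment, and string-constant tokens."""
    sections = {"code": [], "comments": [], "strings": []}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _LAYOUT:
            continue
        if tok.type == tokenize.COMMENT:
            sections["comments"].append(tok.string)
        elif tok.type == tokenize.STRING:
            sections["strings"].append(tok.string)
        else:
            sections["code"].append(tok.string)
    return sections


def source_matches(source, keyword, option="source code"):
    """Case-insensitive keyword match restricted to the chosen sections."""
    sections = split_sections(source)
    keyword = keyword.lower()
    return any(keyword in text.lower()
               for name in SEARCH_OPTIONS[option]
               for text in sections[name])


SAMPLE = 'total = 0  # running count\nlabel = "number of items"\n'

if __name__ == "__main__":
    for option in SEARCH_OPTIONS:
        print(option, "->", source_matches(SAMPLE, "running", option))
```

With this sample, the word "running" appears only in a comment, so it matches under "source code" and "comments" but not under "code only" or "strings".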

Source code search matches found in CodeCentral are displayed with the name of the source file as the title. Clicking the link takes you directly to the submission. For example, searching for C++ code that calls TClientDataSet currently returns 59 matches. Each of these matches shows the file containing the TClientDataSet reference via the CodeCentral archive explorer, which uses YAPP to syntax highlight the source code. You can click on the link above to try it for yourself.

When a source code match is found in an article, the link provided takes you directly to the article. We hope to produce a handy JavaScript call in the near future that will automatically highlight the search matches in the source files and documents, but we decided that could wait until after the initial search engine replacement had launched.

The "code only", "comments", and "strings" options are only available if you specify a source code language for your search.

    Restricting searches by spoken language

If you filter by language, only content in the selected language will be shown. If you select "My Preferred Language", content will be filtered using the following rules:

  • Articles are automatically filtered by your preferred language on CDN. Search results include articles in your preferred language, or, if no translation in that language exists, in English.
  • Content on sites that do not support translations, like CodeCentral and QualityCentral, is not filtered by language.
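
The two rules above can be read as a small filtering function. The sketch below is one interpretation under stated assumptions, not our production code: the Item fields, the TRANSLATED_SITES set, and the language codes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    content_id: int
    site: str
    language: str
    title: str


# Assumption for this sketch: only the "cdn" site carries translations.
TRANSLATED_SITES = {"cdn"}


def filter_by_preferred_language(items, preferred, fallback="en"):
    """Apply the 'My Preferred Language' rules to a list of search hits."""
    result = []
    translations = {}  # content_id -> {language: Item}
    for item in items:
        if item.site not in TRANSLATED_SITES:
            result.append(item)  # sites without translations are not filtered
        else:
            translations.setdefault(item.content_id, {})[item.language] = item
    for versions in translations.values():
        # Prefer the user's language, otherwise fall back to English.
        chosen = versions.get(preferred) or versions.get(fallback)
        if chosen is not None:
            result.append(chosen)
    return result


if __name__ == "__main__":
    hits = [
        Item(1, "cdn", "en", "Intro"),
        Item(1, "cdn", "ja", "Intro (ja)"),
        Item(2, "cdn", "en", "Guide"),
        Item(3, "codecentral", "en", "Sample"),
    ]
    for hit in filter_by_preferred_language(hits, "ja"):
        print(hit.site, hit.language, hit.title)
```

In this reading, an article that exists only in a third language (neither the preferred language nor English) does not appear in the results.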

    Restricting searches to a community

The "community" drop down at the bottom of the screenshot will only contain communities if you're searching the current site, a single selected site, or all sites when only a single site exists. Otherwise, it will always be set to "Any Community".
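
One way to read this rule is that all three cases reduce to "the search covers exactly one site". A minimal sketch of that reading follows; the function name community_choices, the site keys, and the communities_by_site mapping are hypothetical.

```python
ANY_COMMUNITY = "Any Community"


def community_choices(searched_sites, communities_by_site):
    """Return the drop-down entries for the sites a search covers.

    Communities are offered only when exactly one site is searched:
    the current site, a single selected site, or "all sites" when
    only one site exists.
    """
    if len(searched_sites) == 1:
        return [ANY_COMMUNITY] + list(communities_by_site.get(searched_sites[0], []))
    return [ANY_COMMUNITY]


if __name__ == "__main__":
    communities = {"dn": ["Delphi", "C++", "InterBase"]}
    print(community_choices(["dn"], communities))
    print(community_choices(["dn", "codecentral"], communities))
```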

    Displaying links for content mapped to multiple locations

We sometimes have content in GetPublished that is mapped to multiple locations.

In the previous search interface, users were often confused by the same content being displayed in separate links. Now only one list item per mapped site is shown for the link, as this screenshot shows:

Item number 2 in the list above is a "welcome" article, which is landing page content for an area of a site. It is mapped to the Delphi, C++, Java, InterBase, BlackfishSQL, and PHP communities, so each one is displayed in a separate link below the title. Only welcome articles have multiple links below the title, because they appear in multiple URLs on the site.

We have some content that could be mapped to appear on our support, www, and developer network sites. Only one list item per site will be displayed with our new search results list.
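
The consolidation can be thought of as grouping raw hits by content and site, then collecting the distinct links for each group. The following sketch shows that grouping under assumptions: the tuple layout, the function name consolidate, and the example.com URLs are made up for illustration and do not reflect our real data model.

```python
def consolidate(hits):
    """Collapse raw hits into one list item per (content, site) pair.

    hits: iterable of (content_id, site, title, url) tuples.
    Each returned item keeps every distinct URL the content is mapped to
    on that site, so a welcome article can still list several links.
    """
    items = {}
    for content_id, site, title, url in hits:
        item = items.setdefault(
            (content_id, site), {"title": title, "site": site, "links": []}
        )
        if url not in item["links"]:
            item["links"].append(url)
    return list(items.values())


if __name__ == "__main__":
    raw_hits = [
        (42, "dn", "Welcome", "http://example.com/delphi/42"),
        (42, "dn", "Welcome", "http://example.com/cpp/42"),
        (42, "support", "Welcome", "http://example.com/support/42"),
        (7, "dn", "Other article", "http://example.com/7"),
    ]
    for item in consolidate(raw_hits):
        print(item["site"], item["title"], item["links"])
```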

    Subscribing to a search feed

One very handy feature I'd like to mention was available in our previous search engine as well, but I think most people didn't notice it. On the search results page, we have RSS and Atom icons that allow you to subscribe to the search criteria you entered. This way, you can have completely customized automatic notifications of content updates on all our web sites.
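
If you prefer to process a search feed yourself rather than use a feed reader, a sketch like the following shows the general idea. It assumes a standard RSS 2.0 layout and uses a made-up sample document with example.com links; the actual feed URL and its exact fields are not shown here, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a subscribed search feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Search results: datasnap</title>
    <item>
      <title>DataSnap overview</title>
      <link>http://example.com/article/1</link>
    </item>
    <item>
      <title>DataSnap licensing</title>
      <link>http://example.com/article/2</link>
    </item>
  </channel>
</rss>"""


def feed_entries(rss_text):
    """Return (title, link) pairs for every item in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.findall("./channel/item")]


def new_entries(rss_text, seen_links):
    """Return only the entries whose link has not been seen before."""
    return [(title, link) for title, link in feed_entries(rss_text)
            if link not in seen_links]


if __name__ == "__main__":
    already_seen = {"http://example.com/article/1"}
    for title, link in new_entries(SAMPLE_FEED, already_seen):
        print(title, link)
```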

    Next steps

There are many other features we want to add to the search engine, like saving user search preferences, adding result highlighting in matching content links, and so on. Please use the search area in QualityCentral to report bugs, and request any features you'd like to see for the new search.

I really hope you find what you're looking for!

John Kaster

Internet Services Architect

Embarcadero
