Noor Husna :: Information Sources and Services: October 2009

:: Federated Search ::

Definition

Federated searching is the simultaneous search of multiple online databases or web resources, and is an emerging feature of automated, web-based library and information retrieval systems. It lets users search across multiple resources (subscription databases, library catalogs, and web sites) from a single interface, and it helps them identify the databases best suited to the subjects they are researching. It is also known as metasearch, parallel search, or broadcast searching.

There is a large amount of content that is not available to crawl-type search engines like Google. Federated search engines, in particular ones that perform deep web searches, are required to access this additional content, such as scientific, technical, and business databases.

Federated searching consists of:
(1) Transforming a query and broadcasting it to a group of disparate databases or other web resources, with the appropriate syntax
(2) Merging the results collected from the databases
(3) Presenting them in a succinct and unified format with minimal duplication
(4) Providing a means, performed either automatically or by the portal user, to sort the merged result set.
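
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "databases" are in-memory lists, and the names Record, search_catalog, search_journals and the two query syntaxes ("title=..." and "TI ...") are made up for the example, not taken from any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    source: str


# Hypothetical stand-ins for remote databases.
CATALOG = [Record("Deep Web Search", 2008, "catalog"),
           Record("Library Portals", 2005, "catalog")]
JOURNALS = [Record("deep web search", 2008, "journals"),
            Record("Metasearch in Practice", 2007, "journals")]


def search_catalog(query):
    # This made-up source expects queries of the form "title=<term>".
    term = query.split("=", 1)[1].lower()
    return [r for r in CATALOG if term in r.title.lower()]


def search_journals(query):
    # This made-up source expects queries of the form "TI <term>".
    term = query.split(" ", 1)[1].lower()
    return [r for r in JOURNALS if term in r.title.lower()]


# One (translator, search function) pair per source.
SOURCES = [
    (lambda term: f"title={term}", search_catalog),
    (lambda term: f"TI {term}", search_journals),
]


def federated_search(term, sort_key=lambda r: r.year):
    merged = []
    for translate, search in SOURCES:
        # Step 1: transform the query into native syntax and send it.
        # Step 2: merge whatever each source returns.
        merged.extend(search(translate(term)))
    # Step 3: present one list with minimal duplication.
    seen, unique = set(), []
    for record in merged:
        key = (record.title.lower(), record.year)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    # Step 4: sort the merged set (here by year; the caller may pass another key).
    return sorted(unique, key=sort_key)


results = federated_search("search")
```

A real system would send the queries in parallel over the network; the loop here is sequential only to keep the sketch short.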

Technologies

Federated search software uses standardized protocols to access databases. The most common protocol used is Z39.50. Some target databases that do not comply with the Z39.50 standard can still be searched using "translator" programs that convert the query format of the federated system into the format of the native system. However, many information resources do not make their query protocols public, and thus they cannot be searched using a federated search engine. The search results that are retrieved from various targets may be deduplicated to reduce extraneous results. Some systems also rank results by relevancy or permit some other type of sorting.

User authentication is another necessary technology for federated search systems. This stems from the licensing agreements that libraries sign with vendors. These agreements typically limit access to certain groups or numbers of users affiliated with an institution or consortium.

How federated search works

Federated search engines use software "connectors" to access information sources. The federated search engine takes the user's search query, transforms the search terms to match each content source's requirements, and submits the query to each of the sources simultaneously. When the search results come back from each of the sources, the federated search engine merges them and reformats each source's results so that they share a single look and feel.

A connector is a piece of software that is written to access a content source. A connector must know the URL of the source, how to send search commands, what the search syntax is, and how to process the search results that are returned from a source. Connectors can be challenging to write if access to a source requires handling multiple steps, URL redirection, cookies, sessions, or authentication methods.
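
To make the list of things a connector "must know" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the Connector class, the example.org URL, the "ti:" title prefix and the JSON response shape are hypothetical and do not describe any real source. It makes no network call; it only builds the request URL and parses a canned response.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Connector:
    """What a connector knows about one source (all values hypothetical)."""
    base_url: str      # the URL of the source
    query_param: str   # how to send search commands
    field_prefix: str  # the source's syntax for a title search
    results_key: str   # where the hits sit in the JSON response

    def build_url(self, term):
        # Translate the user's term into the source's syntax, then URL-encode it.
        native_query = f"{self.field_prefix}{term}"
        return f"{self.base_url}?{urlencode({self.query_param: native_query})}"

    def parse(self, raw_response):
        # Turn the source's response into a common shape: a list of titles.
        payload = json.loads(raw_response)
        return [hit["title"] for hit in payload.get(self.results_key, [])]


connector = Connector(
    base_url="https://catalog.example.org/search",
    query_param="q",
    field_prefix="ti:",
    results_key="hits",
)
url = connector.build_url("deep web")
titles = connector.parse('{"hits": [{"title": "Deep Web Search"}]}')
```

The harder cases mentioned above (multiple steps, redirection, cookies, sessions, authentication) are exactly what this sketch leaves out, which is why real connectors take more work to write and maintain.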

A web patron seeking science information comes to a gateway site, like Science.gov, and enters a query. The query is transmitted to the gateway server and then it is fanned out to a suite of databases across the entire world. At each database, the query launches a search and brings back a hit list. The list is then transmitted back to the gateway server, where the hits are relevancy ranked and presented to the web patron. So, in the span of about 20 seconds, the query is transmitted to numerous databases, searches are executed at numerous databases, and the results are brought back and ranked for the patron.
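
The fan-out described above can be sketched with Python's standard concurrent.futures module. This is an assumption-laden toy, not how Science.gov is implemented: the sources are local functions that sleep to imitate network delay, the names make_source and gateway_search are invented, the time limit is shortened from seconds to fractions of a second, and "relevancy" is simply how often the term appears in the title.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_source(name, titles, delay):
    # Hypothetical source: waits to imitate latency, then matches titles.
    def search(term):
        time.sleep(delay)
        return [(name, t) for t in titles if term.lower() in t.lower()]
    return search


SOURCES = {
    "energy": make_source("energy", ["Solar energy storage",
                                     "Energy policy and energy markets"], 0.05),
    "health": make_source("health", ["Energy intake and health"], 0.10),
    "slow": make_source("slow", ["Energy research"], 1.0),
}


def gateway_search(term, timeout=0.5):
    # No "with" block here: leaving one would wait for the slow source too.
    pool = ThreadPoolExecutor(max_workers=len(SOURCES))
    # Fan the query out to every source at once.
    futures = {pool.submit(search, term): name
               for name, search in SOURCES.items()}
    # Collect the hit lists that arrive before the time limit.
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)
    hits = [hit for future in done for hit in future.result()]
    skipped = sorted(futures[future] for future in not_done)
    # Rank: more occurrences of the term first, ties broken alphabetically.
    needle = term.lower()
    hits.sort(key=lambda hit: (-hit[1].lower().count(needle), hit[1]))
    return hits, skipped


ranked, skipped = gateway_search("energy")
```

Sources that miss the time limit are simply left out of this result list, so one slow database does not hold up the whole search.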

Federated search drills down to the deep web where scientific databases reside. Unlike the popular search engines, federated search places no burden on the database owners.

Organizations

Science Accelerator is a gateway to science, including R&D results, project descriptions, accomplishments, and more, via resources made available by the Office of Scientific and Technical Information (OSTI), U.S. Department of Energy. Science Accelerator was developed and is made available by OSTI as a free public service.

Science.gov is a gateway to government science information and research results. Currently in its fifth generation, Science.gov provides a search of over 40 scientific databases and 200 million pages of science information with just one query, and is a gateway to 1,950+ scientific websites.

WorldWideScience.org is a global science gateway connecting users to national and international scientific databases and portals. WorldWideScience.org accelerates scientific discovery and progress by providing one-stop searching of global science sources. The WorldWideScience Alliance, a multilateral partnership, consists of participating member countries and provides the governance structure for WorldWideScience.org.

Deep Web Technologies hosts a federated search application in its data center and deploys it in a hosted environment. It can also maintain a customer's application, including monitoring and updating connectors as needed. The company also provides needs assessment, consulting services, deployment and maintenance training, custom software development, and look-and-feel design services. Its flagship product is the Explorit Research Accelerator federated search application, which can be customized for specialized customer needs, both in look-and-feel and in added functionality. Deep Web Technologies develops connectors for a wide range of content databases and will create custom connectors to meet a customer's needs.

Strengths

1. One search interface for multiple resources from different database providers eliminates the need for the user to learn how to use the different search interfaces of all the individual databases.
2. Increasing the size of the collection searched may help improve the number of articles retrieved.
3. The searcher may be exposed to relevant content from resources that s/he may not have been familiar with.

Limitations

1. The lack of a uniform authentication standard means that some databases are inaccessible to federated search engines.
2. True, full deduplication is impossible because databases return results in small sets and metadata standards vary by resource.
3. Relevancy ranking is limited by the quality of the metadata, which usually does not include abstracts or full-text information.
4. Although federated search systems are fundamentally software, they must be implemented and managed as a service, which takes a great deal of resources.
5. Federated search engines cannot improve on the native interface in terms of search accuracy and precision.
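
Limitation 2 is easy to see in a small sketch. The records and the normalize and deduplicate helpers below are invented for illustration; the matching rule (normalized title plus year) is one plausible choice, not a standard.

```python
import re


def normalize(title):
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def deduplicate(records):
    seen, unique = set(), []
    for record in records:
        key = (normalize(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Four descriptions of the same article as different sources might return it.
records = [
    {"title": "Federated Search: An Overview", "year": 2008},
    {"title": "federated search - an overview", "year": 2008},    # caught
    {"title": "Federated Search: An Overview", "year": None},     # missed: no year
    {"title": "Federated searching, an overview", "year": 2008},  # missed: wording
]
unique = deduplicate(records)
```

Only one of the three duplicates is removed. Looser matching would catch more of them but would also start merging records that are genuinely different, which is why deduplication in practice reduces duplicates rather than eliminating them.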

Sources

1. http://www.altsearchengines.com/2009/01/11/federated-search-finds-content-that-google-cant-reach-part-i-of-iii/
2. http://www.libraryjournal.com/article/CA6571320.html
3. http://www.infotoday.com/IT/oct03/hane1.shtml
4. http://lu.com/odlis/odlis_f.cfm
5. http://www.ala.org/ala/mgrps/divs/alcts/resources/org/cat/research/fed_search.cfm
6. http://www.ereleases.com/pr/deep-web-technologies-developing-multilingual-translator-federated-search-25166
7. http://www.osti.gov/fedsearch#federated
8. http://guides.mysapl.org/esources
9. http://en.wikipedia.org/wiki/Federated_search

Monday, October 19, 2009

Activity 4 : Question 2
