Saturday, April 29, 2006

SEMINAR ON CRANFIELD TESTS



CRANFIELD TESTS

Introduction:

Evaluation experiments are conducted to test the performance of information retrieval systems (IRS). Many retrieval studies have been conducted in recent decades. Lancaster mentions that probably the first evaluation study in information retrieval was conducted in 1953. The first significant studies were the Cranfield tests, which brought a new dimension to research in retrieval system evaluation.

CRANFIELD 1:

The first major evaluation of an IRS was undertaken at Cranfield, U.K., under the guidance of C.W. Cleverdon in 1957 and is known as Cranfield test 1.

Objectives:

The project was designed to compare the effectiveness of four indexing systems:
1. an alphabetical subject catalog based on a subject heading list.
2. a UDC classified catalog with alphabetical chain index.
3. a catalog based on a faceted classification and an alphabetical index.
4. a catalog with uniterm coordinate index.

System parameters:

The study involved 100 documents, half of them research reports and half periodical articles, chosen equally from the general field of aeronautics and the specialised field of aerodynamics.
Three indexers were chosen: one with subject knowledge, one with indexing experience, and one straight from library school with neither. Each indexer was asked to index each source document five times, spending 2, 4, 6, 12, and 16 minutes per document. The 100 documents thus generated 6000 indexed items (100 docs * 3 indexers * 4 systems * 5 times). Each of these 6000 items was tested in 3 phases, so the system worked on 18000 indexed items altogether. The test was carried out to find out whether the level of performance increased with the increasing experience of the system personnel.
The project used manufactured queries, formulated before the actual search by people outside the project. They studied each document and generated 400 queries relevant to the documents.
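
The item counts quoted above follow from simple multiplication. A minimal sketch in Python, using the figures stated in the text; the variable names are illustrative and not taken from the Cranfield reports:

```python
# Figures as stated in the text; variable names are illustrative only.
documents = 100
indexers = 3
indexing_systems = 4
indexing_times = 5   # each document was indexed at five time allowances
test_phases = 3

indexed_items = documents * indexers * indexing_systems * indexing_times
items_over_all_phases = indexed_items * test_phases

print(indexed_items)          # 6000
print(items_over_all_phases)  # 18000
```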

Results:

1. All four systems operated with an overall average recall ratio of 80%.
NOTE: Recall ratio = (Hits / Number of relevant documents in the system) * 100%, where hits are the relevant documents retrieved.

2. The recall ratios of the individual systems were as follows:
uniterm: 82%
alphabetical index: 81.5%
UDC: 76%
faceted classification: 74%

3. Increased indexing time increased recall.

4. No significant difference was found in the performance of the three different indexers.
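
The recall ratio behind these results can be sketched in Python, assuming the usual definition (relevant documents retrieved as a percentage of the relevant documents held by the system). The function name and the counts in the first example are hypothetical; only the four per-system percentages come from the figures reported above.

```python
def recall_ratio(relevant_retrieved, relevant_in_system):
    """Percentage of the relevant documents in the system that a search retrieved."""
    if relevant_in_system <= 0:
        raise ValueError("relevant_in_system must be positive")
    return 100.0 * relevant_retrieved / relevant_in_system

# Hypothetical counts: 8 of the 10 relevant documents were retrieved.
print(recall_ratio(8, 10))  # 80.0

# Reported recall ratios (%) for the four systems, ranked best first.
reported_recall = {
    "uniterm": 82.0,
    "alphabetical index": 81.5,
    "UDC": 76.0,
    "faceted classification": 74.0,
}
ranking = sorted(reported_recall, key=reported_recall.get, reverse=True)
print(ranking)
```

The ranking shows how narrow the spread is: eight percentage points separate the best and the worst system.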

Failures:

Question failures: 17%
Indexing failures: 60%
Searching failures: 17%
System failures: 6%

Significance:

The test showed that the performance of a system does not depend on the indexing experience or the subject background of the indexer. It also developed, for the first time, methodologies that could successfully be applied to evaluating an IRS.

CRANFIELD 2:

The second Cranfield test was a controlled experiment that attempted to assess the effects of index languages on retrieval system performance. The study varied one factor at a time while keeping the others constant. The various index languages were formed by dividing concepts into phrases; showing hierarchy through superordinate, subordinate, and collateral classes; adding broader and narrower terms to the original terms; combining synonyms and quasi-synonyms with the original indexing terms; and so on.

Results:

1. Where concepts were used for indexing, system performance worsened when superordinate, subordinate, and collateral classes were introduced alongside the original concepts.

2. Where single terms were used, the inclusion of quasi-synonyms worsened performance.

3. When broader and narrower terms were included along with the controlled language of the thesaurus, performance worsened.

4. Index languages formed from titles performed better than those formed from abstracts.

Conclusion:

The Cranfield tests have their own significance in retrieval evaluation studies. The major factors that determine performance, viz. recall, precision, and fallout, along with the inverse relationship between recall and precision, were established through these tests.
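
As a rough illustration of these measures, the sketch below computes recall, precision, and fallout from a 2x2 retrieval table using their standard definitions. The function name and all counts are hypothetical, chosen only to show how a broader search can raise recall while lowering precision; they are not Cranfield data.

```python
def retrieval_measures(hits, noise, misses, rejected):
    """Return (recall, precision, fallout) as fractions.

    hits:     relevant documents retrieved
    noise:    non-relevant documents retrieved
    misses:   relevant documents not retrieved
    rejected: non-relevant documents not retrieved
    """
    recall = hits / (hits + misses)
    precision = hits / (hits + noise)
    fallout = noise / (noise + rejected)
    return recall, precision, fallout

# Hypothetical collection: 1000 documents, 20 of them relevant to the query.
narrow = retrieval_measures(hits=10, noise=5, misses=10, rejected=975)
broad = retrieval_measures(hits=18, noise=72, misses=2, rejected=908)

for name, (r, p, f) in (("narrow", narrow), ("broad", broad)):
    print(f"{name}: recall={r:.2f} precision={p:.2f} fallout={f:.3f}")
```

In this made-up case the broad search finds 18 of the 20 relevant documents instead of 10, but only one in five of the documents it returns is relevant, and fallout rises with it.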