Partial text representation & identification (With 'passage retrieval' and 'zoning')

A partial text representation is found in an information system that contains not the full text of a document but only a fraction of it. Partial text representations are derived from the documents they represent (as opposed to assigned indexing).

 

Databases that contain abstracts and/or references (i.e., bibliographical databases) are formally a kind of partial text database, but they are usually not recognized as such. Partial text representations exist at every level between 0% and 100% of the text.

 

Newspaper clippings are one kind of partial text representation (see Retti & Stehno, 2004). Another kind is SAP-Indexing.

Informationsordbogen (1991, p. 54) mentions "klip-referat" [no English equivalent; a form of cuttings], developed by the Danish documentalist J. Thurah Nielsen. This is a kind of partial text representation made by cutting out, from the original materials or from copies, the title, author, introduction, synopsis and conclusion, as well as important sections and other pieces.

 

Partial text representation systems are, of course, challenged today by systems offering full-text representation. A document is designed to be used as a unit, which is why users generally see anything less than full-text representation as a disadvantage. Users may need enriched documents (metadata and meta-documents), but not reduced documents.

 

In full-text representation it may be fruitful to identify "passages" or "zones". Passage retrieval was defined by Salton; Allan & Buckley (1993) as "retrieval strategies designed to retrieve text excerpts of varying size in response to statements of user interest". (See further Kaszkiel; Zobel & Sacks-Davis (1999), Melucci (1998), Shepherd (1981, 1983) and O'Connor (1978, 1980).)
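
To make the idea of passage retrieval concrete, the following is a minimal sketch and not the method of any of the cited authors. It assumes that passages are overlapping fixed-size word windows and that a passage is scored by counting occurrences of the query terms; the function names and the window and step sizes are hypothetical choices made only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_passages(text, window=30, step=15):
    """Cut a text into overlapping word windows (an assumed passage model)."""
    words = text.split()
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + window]))
        # Stop once a window has reached the end of the text.
        if start + window >= len(words):
            break
    return passages


def score(query, passage):
    """Count how often the distinct query terms occur in the passage."""
    counts = Counter(tokenize(passage))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve_passages(query, text, window=30, step=15, top_k=3):
    """Return up to top_k (score, passage) pairs, best first, dropping zero scores."""
    scored = [(score(query, p), p) for p in split_into_passages(text, window, step)]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [pair for pair in ranked if pair[0] > 0][:top_k]
```

Calling `retrieve_passages(query, text)` returns excerpts rather than whole documents, which is the point of the definition quoted above. Real systems typically use more refined passage boundaries (sentences, paragraphs, sections) and weighting schemes.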

 

Kowalski & Maybury (2000) suggest, in their work on automatic indexing, the concept of "zoning", that is, the identification of sections of documents to be used for indexing (cf. Luther & Bøtker Schmidt, 2006).
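
As an illustration of zoning, the sketch below builds an index from selected zones only. It is a simplified example under stated assumptions, not the procedure described by Kowalski & Maybury: documents are assumed to be already divided into named zones, and the zone names ("title", "abstract", "conclusion") and the function name are hypothetical.

```python
import re
from collections import defaultdict

# Assumed set of zones considered worth indexing (hypothetical choice).
INDEXED_ZONES = frozenset({"title", "abstract", "conclusion"})


def build_zone_index(documents, zones=INDEXED_ZONES):
    """Map each term to the (document id, zone name) pairs where it occurs.

    `documents` maps a document id to a dict of zone name -> zone text.
    Text in zones that are not listed in `zones` is left unindexed.
    """
    index = defaultdict(set)
    for doc_id, doc_zones in documents.items():
        for zone_name, text in doc_zones.items():
            if zone_name not in zones:
                continue
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                index[term].add((doc_id, zone_name))
    return dict(index)
```

Because only the chosen zones contribute terms, the result is a partial text representation derived from the document itself, and a search can also be restricted to a particular zone.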
 


Literature:

 

Informationsordbogen. (1991). Ordbog for informationshåndtering, bog og bibliotek. 2. udg. Udarbejdet af J. B. Friis-Hansen, Torben Høst, Poul Steen Larsen & Henning Spang-Hanssen. [Hellerup]: Dansk Standardiseringsråd. (DS/INF 27).

 

Kaszkiel, M.; Zobel, J. & Sacks-Davis, R. (1999). Efficient passage ranking for document databases. ACM Transactions on Information Systems, 17(4), 406-439.

 

Kowalski, G. J. & Maybury, M. T. (2000). Information Storage and Retrieval Systems: Theory and Implementation. Boston, MA: Kluwer.

 

Luther, A. & Bøtker Schmidt, M. (2006). Zoning. Et alternativ til fuldtekstindeksering? Dansk Biblioteksforskning, 2(1), 41-51. http://www2.db.dk/dbf/2006/nr1/luther.pdf

 

Melucci, M. (1998). Passage retrieval: A probabilistic technique. Information Processing & Management, 34(1), 43-68.

 

O'Connor, J. (1978). Passage retrieval for cancer questions. Proceedings of the American Society for Information Science, 15, 256-259.


O'Connor, J. (1980). Answer: Passage retrieval by text searching. Journal of the American Society for Information Science, 31(4), 227-239.


Poulsen, C. (1987). Begrundelse for at anvende deltekstrepræsentation af metalitteratur til emnesøgning. København: Danmarks pædagogiske Bibliotek. (Skriftserie fra Danmarks pædagogiske Bibliotek nr. 6).

 

Retti, G. & Stehno, B. (2004). The Laurin thesaurus - A large, multilingual, electronic thesaurus for newspaper clipping archives. Journal of Documentation, 60(3), 289-301.
 

Salton, G.; Allan, J. & Buckley, C. (1993). Approaches to Passage Retrieval in Full Text Information Systems. Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY: ACM Press.

 

Shepherd, M. A. (1981). Text passage retrieval based on Colon Classification: Retrieval performance. Journal of Documentation, 37(1), 25-35.

Shepherd, M. A. (1983). Text passage retrieval based on Colon Classification: Failure analysis. Canadian Journal of Information Science—Revue Canadienne des Sciences de l'Information, 8(June), 75-82.

 


See also: Full text searching; SAP-Indexing

 

 

Birger Hjørland

Last edited: 01-06-2006
