Partial text representation & identification (with 'passage retrieval' and 'zoning')
An information system that contains not the full text of the documents it covers, but only a fraction of that text. Partial text representations are derived from the represented documents (as opposed to assigned indexing).
Databases that contain abstracts and/or references (i.e., bibliographical databases) are formally a kind of partial text database, but are usually not recognized as such. Partial text representations exist at every level between 0% and 100% of the text.
Newspaper clippings are a kind of partial text representation (see Retti & Stehno, 2004). SAP-Indexing is another kind of partial text representation.
Informationsordbogen (1991, p. 54) mentions "klip-referat" [no English equivalent; a form of cutting], developed by the Danish documentalist J. Thurah Nielsen. This is a kind of partial text representation built by cutting the title, author, introduction, synopsis, conclusion, and other important sections or passages out of the original materials or out of copies of them.
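The klip-referat procedure is essentially a selection of verbatim parts of a document. The Python sketch below assumes a hypothetical document model in which each part (title, author, introduction and so on) is already available under a name; `Document`, `clip_representation`, `DEFAULT_CUTS` and `coverage` are illustrative names, not part of any described system. The `coverage` helper expresses where a given representation falls on the scale from 0% to 100% of the text, measured (as an assumption) in words.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document model: named parts (title, author, ...) in reading order."""
    parts: dict = field(default_factory=dict)


# Parts a cutting keeps by default, following the parts named for klip-referat.
# The exact selection is an illustrative assumption.
DEFAULT_CUTS = ("title", "author", "introduction", "synopsis", "conclusion")


def clip_representation(doc, cuts=DEFAULT_CUTS, extra_sections=()):
    """Return a partial text representation made of verbatim cut-out parts.

    Parts named in `cuts` or `extra_sections` are copied unchanged, in that
    order; names the document does not contain are skipped.
    """
    return {name: doc.parts[name]
            for name in (*cuts, *extra_sections)
            if name in doc.parts}


def coverage(doc, representation):
    """Share of the document's words kept in the representation (0.0 to 1.0)."""
    total = sum(len(text.split()) for text in doc.parts.values())
    kept = sum(len(text.split()) for text in representation.values())
    return kept / total if total else 0.0
```

Passing extra section names (for example a "methods" part judged important) moves the representation towards 100% coverage; leaving them out keeps it a reduced document.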
Today, partial text representation systems are, of course, challenged by systems offering full-text representations. Because a document is designed to be used as a unit, users generally see anything less than a full-text representation as a disadvantage. Users may need enriched documents (metadata and meta-documents), but not reduced documents.
In full-text representation it may be fruitful to identify "passages" or "zones". Salton, Allan & Buckley (1993) defined passage retrieval as "retrieval strategies designed to retrieve text excerpts of varying size in response to statements of user interest" (see further Kaszkiel, Zobel & Sacks-Davis, 1999; Melucci, 1998; Shepherd, 1981, 1983; O'Connor, 1978, 1980).
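As a minimal sketch of one common family of passage retrieval strategies, the Python code below splits a document into overlapping fixed-size word windows and ranks them by how often query terms occur in each. The window size, step, tokenizer and raw term-count score are illustrative assumptions; this is not the specific method of Salton, Allan & Buckley or of the other works cited.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def passages(tokens, size=50, step=25):
    """Yield (start, window) pairs of overlapping fixed-size word windows.

    The last window may be shorter so that the end of the text is covered.
    """
    for start in range(0, len(tokens), step):
        yield start, tokens[start:start + size]
        if start + size >= len(tokens):
            break


def retrieve_passages(text, query, size=50, step=25, k=3):
    """Return up to k (score, start, passage_text) tuples, best first.

    The score is the number of occurrences of query terms in the window;
    windows without any query term are not returned.
    """
    tokens = tokenize(text)
    query_terms = set(tokenize(query))
    scored = []
    for start, window in passages(tokens, size, step):
        counts = Counter(window)
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, start, " ".join(window)))
    # Highest score first; ties go to the earlier passage.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]
```

Overlapping windows (by default the step is half the window size) reduce the chance that a relevant stretch of text is split across two passages; a real system would normally replace the raw counts with a weighted score such as tf-idf.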
In their work on automatic indexing, Kowalski & Maybury (2000) suggest the concept of "zoning": the identification of the sections of documents to be used for indexing (cf. Luther & Bøtker Schmidt, 2006).
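To illustrate zoning, the Python sketch below assumes plain-text documents in which each section starts with a line containing only a known heading (for example "Abstract" or "Conclusion"). It splits a document into zones and builds an inverted index from a chosen subset of zones only. The heading list, the `front` zone for text before the first heading, and the default choice of zones to index are hypothetical; real zoning may rely on markup or trained classifiers instead.

```python
import re
from collections import defaultdict

# Hypothetical set of section headings recognized as zone boundaries.
ZONE_HEADINGS = {"abstract", "introduction", "methods", "results",
                 "discussion", "conclusion", "references"}


def split_zones(text):
    """Split plain text into {zone_name: text}.

    A line consisting only of a known heading (optionally followed by ':')
    starts a new zone; text before the first heading goes to 'front'.
    """
    zones = defaultdict(list)
    current = "front"  # e.g. title and authors
    for line in text.splitlines():
        label = line.strip().rstrip(":").lower()
        if label in ZONE_HEADINGS:
            current = label
        else:
            zones[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in zones.items()}


def index_zones(docs, include=("front", "abstract", "conclusion")):
    """Build an inverted index term -> {(doc_id, zone)} from selected zones only."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for zone, body in split_zones(text).items():
            if zone not in include:
                continue  # zones outside the selection are not indexed
            for term in re.findall(r"\w+", body.lower()):
                index[term].add((doc_id, zone))
    return dict(index)
```

Recording the zone alongside the document identifier lets a search system weight or filter hits by section, while terms occurring only in unselected zones never enter the index.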
Literature:
Informationsordbogen (1991). Ordbog for informationshåndtering, bog og bibliotek [Dictionary of information handling, books and libraries]. 2nd ed. Prepared by J. B. Friis-Hansen, Torben Høst, Poul Steen Larsen & Henning Spang-Hanssen. [Hellerup]: Dansk Standardiseringsråd. (DS/INF 27).
Kaszkiel, M., Zobel, J. & Sacks-Davis, R. (1999). Efficient passage ranking for document databases. ACM Transactions on Information Systems, 17(4), 406-439.
Kowalski, G. J. & Maybury, M. T. (2000). Information Storage and Retrieval Systems: Theory and Implementation. Boston, MA: Kluwer.
Luther, A. & Bøtker Schmidt, M. (2006). Zoning. Et alternativ til fuldtekstindeksering? [Zoning: An alternative to full-text indexing?]. Dansk Biblioteksforskning, 2(1), 41-51. http://www2.db.dk/dbf/2006/nr1/luther.pdf
Melucci, M. (1998). Passage retrieval: A probabilistic technique. Information Processing & Management, 34(1), 43-68.
O'Connor, J. (1978). Passage retrieval for cancer questions. Proceedings of the American Society for Information Science, 15, 256-259.
O'Connor, J. (1980). Answer: Passage retrieval by text searching. Journal of the American Society for Information Science, 31(4), 227-239.
Poulsen, C. (1987). Begrundelse for at anvende deltekstrepræsentation af metalitteratur til emnesøgning [Rationale for using partial text representation of meta-literature for subject searching]. Copenhagen: Danmarks pædagogiske Bibliotek. (Skriftserie fra Danmarks pædagogiske Bibliotek, no. 6).
Retti, G. & Stehno, B. (2004). The Laurin thesaurus: A large, multilingual, electronic thesaurus for newspaper clipping archives. Journal of Documentation, 60(3), 289-301.
Salton, G., Allan, J. & Buckley, C. (1993). Approaches to Passage Retrieval in Full Text Information Systems. Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY: ACM Press.
Shepherd, M. A. (1981). Text passage retrieval based on Colon Classification: Retrieval performance. Journal of Documentation, 37(1), 25-35.
Shepherd, M. A. (1983). Text passage retrieval based on Colon Classification: Failure analysis. Canadian Journal of Information Science / Revue Canadienne des Sciences de l'Information, 8(June), 75-82.
See also: Full text searching; SAP-Indexing
Birger Hjørland
Last edited: 01-06-2006
Lifeboat for Knowledge Organization