Relevance feedback (RF)
"A form of query-free retrieval where documents are retrieved according to a measure of equivalence to a given document. In essence, a user indicates to the retrieval system that it should retrieve "more documents like this one."" (Glossary of Sensemaking Terms, 1999).
"RF has proved to be a useful and pragmatic solution to the
uncertainty of describing an information need. It has further, in test collection evaluations, been shown to be a relatively stable procedure: it works in most cases, a wide range of algorithms give approximately the same performance and how the algorithmic parameters should be set are fairly well understood. Although we have not discussed nontext documents, such as images or speech, in this paper the same basic principle of selecting good discriminators of relevance can be used for different media to implement RF functionality." (Ruthven & Lalmas, to appear, p. 45).
Sparck Jones concludes her answer to Hjørland & Nissen Pedersen (2005) in the following way:
"At the same time, one of the most important techniques developed in retrieval research and very prominent in recent work, namely relevance feedback, raises a more fundamental question. This is whether classification in the conventional, explicit sense, is really needed for retrieval in many, or most, cases, or whether classification in the general (i.e. default) retrieval context has a quite other interpretation. Relevance feedback simply exploits term distribution information along with relevance judgments on viewed documents in order to modify queries. In doing this it is forming and using an implicit term classification for a particular user situation. As classification the process is indirect and minimal. It indeed depends on what properties are chosen as the basic data features, e.g. simple terms and, through weighting, on the values they can take; but beyond that it assumes very little from the point of view of classification. It is possible to argue that for at least the core retrieval requirement, giving a user more of what they like, it is fine. Yet it is certainly not a big deal as classification per se: in fact most of the mileage comes from weighting. And how large that mileage can be is what retrieval research in the many experiments done in the last decade have demonstrated, and web engines have taken on board." (Sparck Jones, 2005, p. 601).
Let us consider Sparck Jones's suggestion that ". . . relevance feedback, raises a more fundamental question. This is whether classification in the conventional, explicit sense, is really needed for retrieval . . ." Suppose, for example, that a person is searching for information about "Sweden". Some references are retrieved by using search terms (or otherwise). The user indicates which references are relevant, and the system is supposed to find "more like this". In a traditional classification, all Swedish place names may be classified as such (e.g., Borås, Lund, Malmö, Stockholm . . .). Can such a classification be replaced by mechanisms providing relevance feedback? One problem might be that the user does not know which place names are Swedish and which are not. He may therefore provide incorrect feedback (e.g., by stating that a reference about "Bagsværd", a Danish town, is relevant). It is thus possible that users of systems based on relevance feedback are unable to retrieve the relevant documents and to avoid the non-relevant ones. In other words: classification in the traditional sense is still needed.
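The effect of an incorrect judgment can be illustrated with a minimal sketch of Rocchio-style query modification, a standard textbook formulation of relevance feedback. It is not claimed to be the method of any particular system. The toy documents, the function names and the parameter values (alpha, beta, gamma) are assumptions made for illustration only.

```python
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a lower-cased, whitespace-split text."""
    return Counter(text.lower().split())


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update: alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    The parameter defaults are illustrative assumptions, not recommended settings.
    """
    weights = {term: alpha * w for term, w in query.items()}
    for docs, sign, coeff in ((relevant, 1, beta), (nonrelevant, -1, gamma)):
        for doc in docs:
            for term, count in doc.items():
                weights[term] = weights.get(term, 0.0) + sign * coeff * count / len(docs)
    # Terms whose weight is not positive are dropped from the modified query.
    return {term: w for term, w in weights.items() if w > 0}


# Hypothetical toy documents, invented for this example.
query = term_vector("sweden")
doc_lund = term_vector("lund sweden university")
doc_bagsvaerd = term_vector("bagsværd denmark suburb")

# Correct feedback: the Bagsværd reference is judged non-relevant.
correct = rocchio(query, [doc_lund], [doc_bagsvaerd])

# Mistaken feedback: the user believes Bagsværd is Swedish and marks it relevant.
mistaken = rocchio(query, [doc_lund, doc_bagsvaerd], [])

print(sorted(correct))   # terms in the query after correct feedback
print(sorted(mistaken))  # terms in the query after mistaken feedback
```

With correct feedback the modified query contains only terms from the Swedish document. With the mistaken judgment, "denmark" and "bagsværd" enter the query with positive weight, so the system will look for more documents about a Danish town. The algorithm has no means of detecting the error; it depends entirely on the user's classification being right.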
The idea of relevance feedback is based on the assumption that users can make correct classifications of retrieved references (and thus of the subjects that those references are about). This assumption is in conflict with a realist theory of knowledge, according to which there exists a reality that need not be known by a given person or group.
Another problem is how a given algorithm determines which kind of similarity between documents, and thus which kinds of semantic relations, it is based on. If a user indicates that a given record is "relevant", the algorithm may identify "similar" documents based on, for example, similar words in titles, similar subject categories, co-citation measures, etc. The user is not informed about the basis of such decisions and has no possibility of adjusting them for a given query.
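A small sketch may make this dependence on the chosen basis concrete. The records, field names and reference identifiers below are hypothetical. Jaccard overlap is used as one simple similarity measure among many, and shared cited references (bibliographic coupling) stand in here for the citation-based measures mentioned above.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical records, invented for illustration. "seed" is the record
# that the user has marked as relevant.
records = {
    "seed": {
        "title": {"relevance", "feedback", "retrieval"},
        "subjects": {"information retrieval"},
        "cites": {"ref1", "ref2"},
    },
    "A": {
        "title": {"relevance", "feedback", "psychotherapy"},
        "subjects": {"psychology"},
        "cites": {"ref3"},
    },
    "B": {
        "title": {"query", "expansion", "methods"},
        "subjects": {"information retrieval"},
        "cites": {"ref1", "ref2"},
    },
}


def most_similar(seed_id, basis):
    """Return the id of the record most similar to the seed on the given basis."""
    seed = records[seed_id][basis]
    others = [rid for rid in records if rid != seed_id]
    return max(others, key=lambda rid: jaccard(seed, records[rid][basis]))


for basis in ("title", "subjects", "cites"):
    print(basis, "->", most_similar("seed", basis))
```

On title words, the record from psychology (A) is the closest match to the seed; on subject categories or shared references, the record on query expansion (B) is. The same "more like this" request thus returns different documents depending on a choice that is made inside the system and is normally hidden from the user.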
Two basic criticisms of relevance feedback may be summarized:
1) Relevance feedback is based on certain premises about users' knowledge that are largely unexplored and may turn out to be highly unrealistic.
2) Relevance feedback represents unspecified and unclear semantic relations between documents considered relevant. Why prefer a kind of system implying unspecified relations rather than specified and user-controlled relations?
Literature:
Glossary of Sensemaking Terms. (1999). http://www2.parc.com/istl/groups/hdi/sensemaking/glossary.htm
Hjørland, B. & Nissen Pedersen, K. (2005). A substantive theory of classification for information retrieval. Journal of Documentation, 61(5), 582-597.
Ruthven, I. & Lalmas, M. (to appear). A survey on the use of relevance feedback for information access systems. Knowledge Engineering Review. Available at: http://www.dcs.qmul.ac.uk/~mounia/CV/Papers/ker_ruthven_lalmas.pdf
Salton, G. & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288-297.
Sparck Jones, K. (2005). Revisiting classification for retrieval. Journal of Documentation, 61(5), 598-601. [Reply to Hjørland & Nissen Pedersen, 2005]. http://www.db.dk/bh/Core%20Concepts%20in%20LIS/Sparck%20Jones_reply%20to%20Hjorland%20&%20Nissen.pdf
See also: Feedback; Query Expansion; Relevance (Epistemological lifeboat)
Birger Hjørland
Last edited: 25-02-2007