Information theory

Since the 1950s the term "information theory" has, perhaps somewhat misleadingly, mainly been used for the statistical theory of communication developed by Claude E. Shannon and others.

 

"It has repeatedly been said that Shannon himself didn't think that his theory was a theory about information. It seems that it is Myron Tribus, who, in an interview with Shannon in 1961 launched the history of the naming of the theory of information: "I had asked Shannon what his personal reaction had been when he had realized he had identified a measure of uncertainty. Shannon said that he had been puzzled and wondering what to call his function. Information seemed to him to be a good candidate as a name, but it was already badly overworked. Shannon said he sought advice of John von Neumann, whose response was direct, "You should call it 'entropy' and for two reasons: First, the function is already in use in thermodynamics under that name; second, and more importantly, most people don't know what entropy really is, and if you use the word 'entropy' in an argument you will win every time!" (Ibid. p. 476). Against the widespread opinion that Shannon himself didn't consider his theory a theory of information stands the article in Encyclopedia Britannica, op. cit which he himself wrote: the title actually is "Information Theory", and Shannon several times uses this designation in the body of article. This opinion is supported by Anatol Rapoport who writes that ". . . the challenge of extending the concepts of information theory (. . .) is traceable to the writings of its founders." (Rapoport 1968 p. 137)" (Qvortrup, 1993, p. 22).

 

Shannon's theory should not be confused with other theories connected with the term information. Likewise, theories of Library and Information Science (LIS, or just Information Science, IS) may or may not be related to information theory in the traditional, Shannon-related meaning. Nor should the validity of Shannon's theory be confused with the domain in which it is relevant. As Buckland writes:

 

 "There is a valid and respectable field of formal information theory based on propositions, algorithms, uncertainty, truth statements, and the like, but its formal strengths are also its limits and make [it] inappropriate and inadequate for the concerns of LIS."  (Buckland, 2005, p.686).

 

Some authors, such as Wersig (2003), use the term "information theory" for the theory of information science and regard the early period of this field as dominated by Shannon's theory. In this entry (and in other entries in this encyclopedia) the term information theory is not used in Wersig's sense but in the traditional meaning referring to Shannon. This is also the understanding found in most encyclopedias, for example Wikipedia (2005):

 

 "This article is not to be confused with library and information science or information technology. Information theory is the mathematical theory of data communication and storage founded in 1948 by Claude E. Shannon. Modern information theory is concerned with error-correction, data compression, cryptography, communications systems, and related topics." (Wikipedia, 2005).

 

Information theory is concerned with the transmission of signals from a sender through channels of communication to a receiver. A message is formed by combining a number of symbols from a given repertoire, for example a given set of numbers and letters. The information content of a message is proportional to its length in the given code and is measured in bits. A channel has a given capacity, which is the amount of information that can be transferred in a given unit of time. Noise is a central concept in information theory, and the goal is to establish a relation between signal and noise that is both technically efficient (i.e. economically affordable) and permits a transfer of information in which the interpretation is made certain. Such a transfer of information reduces uncertainty (with regard to determining the sequence of symbols sent). This reduction of uncertainty is also termed information (in a certain narrow statistical meaning of the word, in relation to a random selection of symbols from the given set of symbols). Other important concepts in information theory are entropy (and negative entropy), redundancy and feedback. Information theory is closely related to cybernetics and systems theory. It is also the theory that led to the rise of cognitive science and, indirectly, to the cognitive views in LIS (cf. Wersig, 2003).

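The quantities mentioned above can be illustrated with a minimal sketch. It assumes a source whose symbols are chosen independently; the function names and the example probabilities are illustrative assumptions, not taken from Shannon's text. The second function covers only the special case of equally likely symbols, in which the information in a message is proportional to its length.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def message_information_bits(message, alphabet_size):
    """Information in a message whose symbols are assumed to be drawn
    independently and with equal probability from the alphabet."""
    return len(message) * math.log2(alphabet_size)

# A fair two-symbol source carries 1 bit per symbol; a biased one carries less.
fair = entropy_bits([0.5, 0.5])
biased = entropy_bits([0.9, 0.1])

# Under the equal-probability assumption, doubling the length doubles the information.
short = message_information_bits("ABCD", alphabet_size=4)
long_ = message_information_bits("ABCDABCD", alphabet_size=4)

print(fair, round(biased, 3), short, long_)
```

When the symbols are not equally likely, the entropy per symbol is lower than log2 of the alphabet size, which is what makes redundancy and data compression possible.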
 

Information theory gives rise to certain understandings of information: that information is something modular that may be divided into discrete units, that it is something flowing in a system, and that this "something" can be measured, processed and, in varying degrees, automated and controlled. Most of these assumptions have been much criticized and have been labeled the 'conduit metaphor' (Day, 2000). They are in conflict with basic assumptions in semiotic approaches.

 

Talk of an "information explosion" is, as Spang-Hanssen (2001) points out, an unfortunate expression, because what is meant is an explosion in the number of published papers or in the amount of data. It is not evident that users are becoming more informed because of the increase in the quantity of published papers. On the contrary, it might well be that they react with a kind of shock, and thus become less informed.

 

"The game [twenty questions] suggests that the information (as measured by Shannon's entropy statistic) required to identify an arbitrary object is about 20 bits. The game is often used as an example when teaching people about information theory. Mathematically, if each question is structured to eliminate half the objects, twenty questions will allow the questioner to distinguish between 2^20 or 1,048,576 objects. Accordingly, the most effective strategy for Twenty Questions is to ask questions that will split the field of remaining possibilities roughly in half each time. The process is analogous to a binary search algorithm in computer science" (Wikipedia, 2006).

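The halving strategy described in the quotation can be sketched as a binary search. This is an illustrative sketch only: it assumes the objects are numbered 0 to n - 1 and that every question has the form "is the number at most m?"; the function name is hypothetical.

```python
import math

def questions_needed(secret, n_objects):
    """Count the yes/no questions needed to identify `secret` among
    objects numbered 0 .. n_objects - 1 by halving the candidates."""
    low, high = 0, n_objects - 1
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1          # ask: "is the object numbered <= mid?"
        if secret <= mid:
            high = mid          # answer "yes": keep the lower half
        else:
            low = mid + 1       # answer "no": keep the upper half
    return questions

n = 2 ** 20                     # 1,048,576 objects
print(n, math.log2(n))
print(questions_needed(0, n), questions_needed(123456, n), questions_needed(n - 1, n))
```

Because 2^20 is a power of two, each question splits the remaining candidates exactly in half, so every object is identified after exactly twenty questions.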
 

 

 

 

Literature:

 

Blegvad, M.; Elberling, B. V.; Johnsen, E. & Rode, M. (Eds.). (1957). Videnskabens kommunikationsproblemer. Nordisk Sommeruniversitet 1956. København: Munksgaard.
 

Buckland, M. K. (2005). Review of "The philosophy of Information" (Herold, ed., 2004). Journal of Documentation, 61(5), 684-686.

 

Cawkell, A. E. (1990). The boundaries of information science: information theory is alive and well. Journal of Information Science. Principles & Practice, 16(4), 215-216.

 

Cover, T. M. & Thomas, J. A. (2006). Elements of Information Theory. 2nd ed. John Wiley. (Wiley series in telecommunications).
 

Day, R. (2000). The 'Conduit Metaphor' and The Nature and Politics of Information Studies. Journal of the American Society for Information Science, 51(9), 805-811.

 

Jensen, J. F. (1990). Formattering af forskningsfeltet: Computer-Kultur & Computer-Semiotik. IN: Computer-Kultur, Computer-Medier, Computer-Semiotik. Ed. by Jens F. Jensen. Ålborg: Nordisk Sommeruniversitet. (Pp. 10-50).

 

Kline, R. L. (2004). What Is Information Theory a Theory Of? Boundary Work among Information Theorists and Information Scientists in the United States and Britain during the Cold War. IN: The History and Heritage of Scientific and Technical Information Systems: Proceedings of the 2002 Conference, Chemical Heritage Foundation, eds., W. Boyd Rayward and Mary Ellen Bowden. Medford, NJ: Information Today, 15-28. http://www.chemheritage.org/events/asist2002/01-kline.pdf

 

Miksa, F. L. (1992). Library and information science: two paradigms. IN: Conceptions of Library and Information Science. Historical, empirical and theoretical perspectives. Ed. by Pertti Vakkari & Blaise Cronin. London: Taylor Graham. (Pp. 229-252).

 

Qvortrup, L. (1993). The controversy over the concept of information. An overview and a selected and annotated bibliography. Cybernetics and Human Knowing, 1(4), 3-24.
 

Rapoport, A. (1956). The Promise and Pitfalls of Information Theory. Behavioral Science, 1(4), 303-309.
 

Schou-Christensen, J. (1984). Børsens edb-ordbog. København: Børsens forlag.
 

Shannon, C. E. & Weaver, W. (1949/1964). The Mathematical Theory of Communication. Urbana: University of Illinois Press, 1964. (Original: 1949).
 

Spang-Hanssen, H. (2001). How to teach about information as related to documentation. Human IT, (1), 125-143. http://www.hb.se/bhs/ith/1-01/hsh.htm
 

Wersig, G. (2003). Information theory. In J. Feather, & P. Sturges (Eds.), International encyclopedia of library and information science (pp. 310-319). London & New York: Routledge.

 

Wikipedia, the free encyclopedia. (2005). Information Theory. http://en.wikipedia.org/wiki/Information_theory

 

Wikipedia, the free encyclopedia. (2006). Twenty Questions. http://en.wikipedia.org/wiki/Twenty_questions

 

 


See also: Information; Information science, theory; Information technology

 

 

 

 

 

Birger Hjørland

Last edited: 08-08-2006


to be edited:


In computing, information is, as is well known, measured in bits:
Jensen (1990, p. 31, translated from Danish): "The smallest unit of information - the information unit - is here arbitrarily defined as "a two-choice situation" (:9), i.e. a basic binary opposition. For example: yes/no, on/off, the on/off of a relay, open/closed circuit, etc. If the binary system is expressed in numbers, there are thus only two digits: 0 and 1. And since the binary digits 0/1 can be said to represent symbolically all types of binary oppositions, the smallest unit of information is called - after J. W. Tukey's abbreviation of binary digits - a bit. Shannon writes: "A device with two stable positions, e.g. a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits" (:32). And it is precisely this principle of binary oppositions, of 0s and 1s, of open and closed circuits, that is the very foundation of computer language, the very technical-logical principle behind the whole of computer technology. Weaver writes: "the ideas developed in this work (i.e. the mathematical theory of communication, jfj.) connect ... closely with the problem of the logical design of large computers" (:25).

Within information technology and its foundation in mathematics and the natural sciences, the ambition is thus to place the concept and the science of information within a paradigm that makes practical engineering treatment possible, but which thereby - at least initially - comes to disregard what is central from the point of view of the humanities, the social sciences and *LIS: content, signification or meaning.

Weaver writes in his introductory article: "The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning".

One bit is the amount of information received when one is told which of two possibilities is correct, a case in which there is, as is well known, a 50% chance of guessing correctly. If one has only a 25% chance of guessing correctly and is told the answer, one receives more information: 2 bits, and so on. One may say that the core of this concept of information lies in a reduction of the receiver's statistical uncertainty when faced with a number of defined alternatives.
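
The relation between the chance of guessing correctly and the number of bits received can be written as the negative base-2 logarithm of that chance. The following is a minimal sketch under the assumption that the alternatives are equally likely; the function name is illustrative.

```python
import math

def bits_received(chance_of_correct_guess):
    """Information, in bits, gained by being told the answer when the
    chance of guessing it correctly was `chance_of_correct_guess`."""
    return -math.log2(chance_of_correct_guess)

print(bits_received(0.5))    # two equally likely alternatives
print(bits_received(0.25))   # four equally likely alternatives
print(bits_received(0.125))  # eight equally likely alternatives
```

Each halving of the chance of a correct guess adds one bit, so n equally likely alternatives correspond to log2(n) bits.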

It is characteristic that one is not interested here in the usefulness of the information (the pragmatic point of view) or in its meaning for the user (the semantic point of view), but only in the syntactic aspect. That is, one brings together (through a so-called definition by abstraction) a class consisting of all equivalent signals that can be transmitted electronically or mechanically and treated statistically. Thus the amount of information in Shannon's sense is proportional to the length of a message (in a given code).

In computing a further distinction is made between *data and information. Whereas data are "a formalized representation of facts, concepts or instructions in a form suitable for transfer, interpretation or processing by humans or by automatic means", information is "the meaning that a person assigns to a set of data. Data thus have no unambiguous information content, since the interpretation depends on the knowledge and the presuppositions of the person(s) using the data at hand. The concepts data and information are used uncritically as synonyms by most people..." (Schou-Christensen, 1984, translated from Danish). Data thus clearly has a more objective meaning and information a more receiver-defined one, and this difference runs through all definitions that are inspired or influenced by information theory.

In the beginning, Shannon's information theory was received with great enthusiasm in many contexts. This enthusiasm was visible not only in communication and information technology, the domains in which it has its main strength, but also in psychology, in the social sciences and in Library and Information Science (LIS), in relation to problems associated with, for example, libraries and literature searching. Thus in 1956 a Scandinavian conference ("Nordisk Sommeruniversitet", NSU) was held with the title (translated) "Problems in Scientific Communication" (Blegvad, 1957, p. 13). There the opinion was expressed that, thanks to Shannon's information theory, a theoretical foundation had been obtained for what is today termed LIS:

 

"When NSU decided to take up this whole complex of problems, it was not merely because it was highly topical, nor because it had not previously been treated as a whole, although these two circumstances naturally played an important role. The decisive point was that now, in contrast to earlier, there seemed to be hope of having the problems formulated in an appropriate conceptual system. The view that these apparently rather diverse problems are all problems of communication is connected with the development of a general theory of communication. It would be too much to say that such a theory exists today; rather, from very different starting points within different subjects and disciplines, concepts and theories have been developed that look as though they can be worked together into a general theory of communication. This concerns, in the first place, the so-called information theory." (Blegvad, 1957, p. 13, translated from Danish).
 

One must probably admit that this information theory has been a disappointment in both social science and information science. Still, one should not underestimate how much it has contributed to preparing the ground, nor belittle the importance of some fundamental concepts and views that are so well established that they have become relatively unquestioned background assumptions in information science as well. The renaming of library and documentation research as "information science", for example, and its interest in the concept of *information are also - for better or worse - mainly due to information theory.

The disappointment is due partly to the theory's above-mentioned limitation to the syntactic aspect, and partly to the difficulty of producing, within these fields, situations in which the preconditions for applying the theory can be fulfilled at all: that is, of defining situations in which the receiver of data can be said to be in a well-defined degree of statistical uncertainty, so that it makes sense to measure the information transferred.

Rapoport (1956, p. 17) gives, among others, the following example of the difficulties of information theory (here rendered from the Danish translation): "Suppose, for example, that a person does not know whether ghosts exist. He then reads a book that presents weighty arguments against the existence of ghosts. As a result of the reading he becomes convinced that ghosts do not exist. How much information has this person received? One can "argue" as follows: Before reading the book our person was equally uncertain whether the answer to the question of the existence of ghosts was "yes" or "no". After he has read the book, he is certain that the answer is "no". He has therefore received one bit of information. We would maintain that this answer is unsatisfactory. (We shall refrain from raising the same question in the case where the person, by reading a book, becomes convinced that ghosts do exist.)"

Spang-Hanssen (op. cit. p. 10) mentions as another problem or example that, in scientific literature, the length of an article is quite obviously not proportional to its information content, and that an abstract may even be just as informative as the whole article.

Although serious problems thus attach to the information theory of Shannon and others as a comprehensive theory of scientific communication and as a theoretical frame of reference, the theory nevertheless holds fruitful (but limited) possibilities in this field too. Spang-Hanssen (1970), for example, points out that thesauri can be studied from, among others, information-theoretic frames of reference.

The main result with which information theory must be said to have enriched us is the receiver-dependent determination of the concept of information.

 



 

Transmission models - criticism:  http://www.cultsock.ndirect.co.uk/MUHome/cshtml/index.html