Information theory
The term "information theory" has, perhaps somewhat misleadingly, been used
since the 1950s mainly for the statistical theory of communication developed by
Claude E. Shannon and others.
"It has repeatedly been said that Shannon himself
didn't think that his theory was a theory about information. It seems that it is
Myron Tribus, who, in an interview with Shannon in 1961 launched the history of
the naming of the theory of information: "I had asked Shannon what his personal
reaction had been when he had realized he had identified a measure of
uncertainty. Shannon said that he had been puzzled and wondering what to call
his function. Information seemed to him to be a good candidate as a name,
but it was already badly overworked. Shannon said he sought advice of John von
Neumann, whose response was direct, "You should call it 'entropy' and for two
reasons: First, the function is already in use in thermodynamics under that
name; second, and more importantly, most people don't know what entropy really
is, and if you use the word 'entropy' in an argument you will win every time!"
(Ibid. p. 476). Against the widespread opinion that Shannon himself didn't
consider his theory a theory of information stands the article in Encyclopedia
Britannica, op. cit which he himself wrote: the title actually is
"Information Theory", and Shannon several times uses this designation in the
body of article. This opinion is supported by Anatol Rapoport who writes that ".
. . the challenge of extending the concepts of information theory (. . .) is
traceable to the writings of its founders." (Rapoport 1968 p. 137)" (Qvortrup,
1993, p. 22).
Shannon's theory should not be confused with other theories connected with the
term information. Likewise, theories in Library and Information Science (LIS, or
just Information Science, IS) may or may not be related to information theory
in the traditional sense associated with Shannon. The validity of Shannon's
theory must also be kept apart from the question of the domain in which it is
relevant. As Buckland writes:
"There is a valid and respectable field of formal information theory based on
propositions, algorithms, uncertainty, truth statements, and the like, but its
formal strengths are also its limits and make [it] inappropriate and inadequate for
the concerns of LIS." (Buckland, 2005, p.686).
Some authors, such as Wersig (2003), use the term "information theory" for the
theory of information science and regard the early period of this field as
dominated by Shannon's theory. In this entry (and other entries in this
encyclopedia) the term information theory is not used in Wersig's sense but in
the traditional sense referring to Shannon. This is also how most encyclopedias
understand the term, for example Wikipedia (2005):
"This article is not to be confused with library and information science or
information technology. Information theory is the mathematical theory of data
communication and storage founded in 1948 by Claude E. Shannon. Modern
information theory is concerned with error-correction, data compression,
cryptography, communications systems, and related topics."
Information theory is concerned with the transmission of signals from a sender
through a communication channel to a receiver. A message is formed by combining
a number of symbols from a given repertoire, for example a given set of numbers
and letters. The information content of a message is proportional to its length
in the given code and is measured in bits. A channel has a given capacity, which
is the amount of information that can be transferred per unit of time. Noise is
a central concept in information theory, and the goal is to establish a relation
between signal and noise that is both technically efficient (i.e., economically
affordable) and permits a transfer of information whose interpretation is
certain. Such a transfer of information reduces uncertainty (with regard to
determining the sequence of symbols sent). This reduction of uncertainty is also
termed information (in a certain narrow statistical meaning of the word, which
relates to a random selection of symbols from the given set of symbols).
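
The narrow statistical sense of information described above can be made concrete
with a small calculation. The following is a minimal sketch in Python and is not
taken from the sources cited in this entry; the function name entropy_bits and
the two sample messages are illustrative assumptions. It estimates Shannon's
entropy, in bits per symbol, from the symbol frequencies of a message. A source
whose symbols are all equally likely leaves the receiver maximally uncertain,
whereas a source dominated by one symbol is more predictable and therefore
conveys less information per symbol.

```python
import math
from collections import Counter


def entropy_bits(message: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies in message."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over the observed symbols
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample messages over the symbol set {A, B, C, D}.
uniform = "ABCD" * 4           # every symbol equally frequent
skewed = "AAAAAAAAAAAAABCD"    # 'A' dominates, so the next symbol is easier to guess

print(entropy_bits(uniform))   # 2.0 bits per symbol
print(entropy_bits(skewed))    # lower than 2.0 bits per symbol
```

With four equally likely symbols each symbol carries 2 bits, so under that
assumption a message of n symbols carries 2n bits. This is the sense in which
information content grows in proportion to the length of a message in a given
code.
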
Other important concepts in information theory are entropy (and negative
entropy), noise, redundancy and feedback. Information theory is closely related
to cybernetics and systems theory. It is also the theory that gave rise to
"cognitive science" and, indirectly, to cognitive views in LIS (cf. Wersig,
2003).
Information theory gives rise to certain understandings of information, such as
the views that information is something modular that may be divided into
discrete units, that information is something flowing in a system, and that
this "something" can be measured, processed and, to varying degrees, automated
and controlled. Most of these assumptions have been much criticized and have
been labeled the 'conduit metaphor' (Day, 2000). They conflict with basic
assumptions in semiotic approaches.
The expression "information explosion" is, as Spang-Hanssen (2001) has pointed
out, an unfortunate one, because what is actually meant is an explosion in
published papers or data. It is not evident that users are becoming more
informed because of the increase in the quantity of published papers. On the
contrary, it might well be that they react with a kind of shock and thus become
less informed.
"The game [twenty questions] suggests that the
information (as measured by Shannon's entropy statistic) required to identify an
arbitrary object is about 20 bits. The game is often used as an example when
teaching people about information theory. Mathematically, if each question is
structured to eliminate half the objects, twenty questions will allow the
questioner to distinguish between 2^20 or 1,048,576 objects.
Accordingly, the most effective strategy for Twenty Questions is to ask
questions that will split the field of remaining possibilities roughly in half
each time. The process is analogous to a binary search algorithm in computer
science" (Wikipedia, 2006).
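
The halving strategy in the quotation can be sketched as a short program. This
is a minimal illustration in Python under stated assumptions, not part of the
quoted source: the objects are represented by the integers 0 to n_objects - 1,
each question has the form "is the number below the midpoint?", and the function
name questions_needed is hypothetical.

```python
def questions_needed(secret: int, n_objects: int) -> int:
    """Count the yes/no questions needed to identify secret among n_objects."""
    low, high = 0, n_objects      # remaining candidates are low .. high - 1
    questions = 0
    while high - low > 1:
        mid = (low + high) // 2
        questions += 1            # ask: "is the secret below mid?"
        if secret < mid:
            high = mid            # yes: keep the lower half
        else:
            low = mid             # no: keep the upper half
    return questions


N = 2 ** 20
print(N)                               # 1048576
print(questions_needed(123456, N))     # 20
```

Because each answer halves the set of remaining candidates, the number of
questions equals the base-2 logarithm of the number of objects when that number
is a power of two.
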
Literature:
Blegvad, M.;
Elberling, B. V.; Johnsen, E. & Rode, M. (Eds.). (1957). Videnskabens
kommunikationsproblemer. Nordisk Sommeruniversitet 1956. København:
Munksgaard.
Buckland, M. K.
(2005). Review of "The philosophy of Information" (Herold, ed., 2004).
Journal of Documentation, 61(5), 684-686.
Cawkell, A. E. (1990). The boundaries of information science:
information theory is alive and well. Journal of Information Science:
Principles & Practice, 16(4), 215-216.
Cover, T. M. &
Thomas, J. A. (2006). Elements of Information Theory. 2nd ed. John Wiley.
(Wiley series in telecommunications).
Day, R. (2000).
The 'Conduit Metaphor' and The Nature and Politics of
Information Studies. Journal of the American Society for Information Science,
51(9), 805-811.
Jensen, J. F. (1990). Formattering af forskningsfeltet: Computer-Kultur &
Computer-Semiotik. IN: Computer-Kultur, Computer-Medier, Computer-Semiotik. Ed.
by Jens F. Jensen. Ålborg: Nordisk Sommeruniversitet. (Pp. 10-50).
Kline, R. L. (2004).
What Is Information Theory a
Theory Of? Boundary Work among Information Theorists and Information Scientists
in the United States and Britain during the Cold War. IN: The History and
Heritage of Scientific and Technical Information Systems: Proceedings of the
2002 Conference, Chemical Heritage Foundation, eds., W. Boyd Rayward and
Mary Ellen Bowden. Medford, NJ: Information Today, 15-28.
http://www.chemheritage.org/events/asist2002/01-kline.pdf
Miksa, F. L. (1992). Library and information science: two paradigms.
IN: Conceptions of Library and Information Science. Historical, empirical and
theoretical perspectives. Ed. by Pertti Vakkari & Blaise Cronin. London: Taylor
Graham. (Pp. 229-252).
Qvortrup, L. (1993). The controversy over the concept of
information. An overview and a selected and annotated bibliography.
Cybernetics and Human Knowing, 1(4), 3-24.
Rapoport, A. (1956). The Promise and Pitfalls of Information Theory.
Behavioral
Science, 1(4), 303-309.
Schou-Christensen, J. (1984). Børsens edb-ordbog. København: Børsens
forlag.
Shannon, C. E. & Weaver, W. (1949/1964). The Mathematical Theory of Communication. Urbana:
University of Illinois Press, 1964. (Original: 1949).
Spang-Hanssen, H. (2001). How to teach about
information as related to documentation. Human IT, (1), 125-143.
http://www.hb.se/bhs/ith/1-01/hsh.htm
Wersig, G.
(2003). Information theory. In J. Feather, & P. Sturges (Eds.), International
encyclopedia of library and information science (pp. 310-319). London & New
York: Routledge.
Wikipedia. The free encyclopedia. (2005). Information Theory.
http://en.wikipedia.org/wiki/Information_theory
Wikipedia. The free encyclopedia. (2006).
Twenty Questions.
http://en.wikipedia.org/wiki/Twenty_questions
See also: Information;
Information science, theory;
Information technology
Birger Hjørland
Last edited:
08-08-2006

to be edited:
Information in data processing is, as is well known, measured in bits:
Jensen, 1990, p. 31: "The smallest unit of information, the information unit,
is here defined arbitrarily as 'a two-choice situation' (:9), i.e., a basic
binary opposition. For example: yes/no, on/off, the on/off of a relay,
open/closed circuit, etc. If the binary system is expressed in numbers, there
are thus only two digits: 0 and 1. And since the binary numbers 0/1 can be said
to represent symbolically all types of binary oppositions, the smallest unit of
information is called, after J. W. Tukey's abbreviation of binary digits, a
bit. Shannon writes: 'A device with two stable positions, such as a relay or a
flip-flop circuit, can store one bit of information. N such devices can store N
bits' (:32). And it is precisely this principle of binary oppositions, of 0s
and 1s, of open and closed circuits, that is the very foundation of computer
language, the very technical and logical principle behind the whole of computer
technology. Weaver writes: 'the ideas developed in this work (i.e., the
mathematical theory of communication, jfj.) connect ... closely with the
problem of the logical design of large computers' (:25)."
Within information technology and its underlying disciplines in mathematics and
the natural sciences, the ambition is thus to place the concept and the science
of information within a paradigm that makes practical engineering treatment
possible. Such a paradigm thereby, at least initially, disregards what is
central from the perspective of the humanities, the social sciences and LIS:
content, signification or meaning.
Weaver writes in his introductory article: "The word information, in this
theory, is used in a special sense that must not be confused with its ordinary
usage. In particular, information must not be confused with meaning".
One bit is the amount of information that corresponds to being told which of
two possibilities is correct, where one has, as is well known, a 50% chance of
guessing correctly. If one has only a 25% chance of guessing correctly and is
told the answer, one receives more information: 2 bits, and so on. One can say
that the core of this concept of information lies in a reduction of the
receiver's statistical uncertainty when he faces a number of defined
alternatives.
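
The arithmetic in the preceding paragraph follows from the standard measure of
the information in a single outcome of probability p, namely -log2(p) bits. The
following is a minimal sketch in Python; the function name surprisal_bits is an
illustrative assumption and does not come from the sources cited here.

```python
import math


def surprisal_bits(p: float) -> float:
    """Information, in bits, gained by learning an outcome of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in the interval (0, 1]")
    return -math.log2(p)


print(surprisal_bits(0.5))    # 1.0 bit: one of two equally likely alternatives
print(surprisal_bits(0.25))   # 2.0 bits: one of four equally likely alternatives
```

The less likely a correct guess is beforehand, the more bits the answer
conveys, which is the reduction of statistical uncertainty that the paragraph
describes.
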
It is characteristic that one is not interested here in the usefulness of
information (the pragmatic point of view) or in its meaning for the user (the
semantic point of view), but only in the syntactic aspect. That is, one groups
together (through a so-called definition by abstraction) a class consisting of
all equivalent signals that can be transmitted electronically or mechanically
by means of statistical treatment. Thus the amount of information in Shannon's
sense is proportional to the length of a message (in a given code).
In data processing a further distinction is made between data and information.
Whereas data are "a formalized representation of facts, concepts or
instructions in a form suitable for transmission, interpretation or processing
by humans or automatic means", information is "the meaning that a person
assigns to a set of data. Data thus have no unambiguous information content,
since the interpretation depends on the knowledge and the presuppositions of
the person or persons who use the data in question. The concepts of data and
information are used uncritically as synonyms by most people..."
(Schou-Christensen, 1984). Data thus clearly has a more objective meaning and
information a more receiver-defined one, and this difference runs through all
definitions that are inspired or influenced by information theory.
In the beginning, Shannon's information theory was received with great
enthusiasm in many contexts. This enthusiasm was visible not only in
communication and information technology, the domains in which it has its main
strength, but also in psychology, in the social sciences and in Library and
Information Science (LIS), in relation to problems associated with, for
example, libraries and literature searching. Thus in 1956 a Scandinavian
conference was held ("Nordisk Sommeruniversitet", NSU) with the title
(translated) "Problems in Scientific Communication" (Blegvad, 1957, p. 13), at
which the opinion was expressed that, thanks to Shannon's information theory,
we have obtained a theoretical foundation for what is today termed LIS:
"When NSU decided to take up this whole complex of problems, it was not merely
because it was highly topical, nor because it had not previously been treated
as a whole, although these two circumstances naturally played an important
part. The decisive point was that now, in contrast to earlier, there seemed to
be hope of having the problems formulated in an appropriate conceptual system.
The view that these apparently rather diverse problems are all problems of
communication is connected with the development of a general theory of
communication. It would be too much to say that such a theory exists today;
rather, concepts and theories have been developed from very different starting
points within different subjects and disciplines that look as if they could be
worked together into a general theory of communication. This concerns, first of
all, the so-called information theory." (Blegvad, 1957, p. 13, translated).
It must probably be admitted that this information theory has been a
disappointment in both the social sciences and information science. One should
not, however, underestimate how strongly it has helped to fertilize the ground,
nor belittle the importance of some basic concepts and viewpoints that are so
well established that they have become relatively unquestioned background
assumptions in information science as well. The renaming of library and
documentation research as "information science", and its interest in the
concept of information, are also, for better or worse, mainly due to
information theory.
The disappointment is due partly to the theory's aforementioned limitation to
the syntactic aspect, and partly to the difficulty of establishing, within
these fields, any situations at all in which the preconditions for applying the
theory can be fulfilled: that is, of defining situations in which the receiver
of data can be said to be in a well-defined state of statistical uncertainty,
so that it makes sense to measure the information transferred.
Rapoport (1956, p. 17) gives, among others, the following example of the
difficulties of information theory (retranslated here from the Danish):
"Suppose, for example, that a person does not know whether ghosts exist. He
then reads a book that presents weighty arguments against the existence of
ghosts. As a result of the reading he becomes convinced that ghosts do not
exist. How much information has this person received? One can 'argue' as
follows: Before reading the book, our person was equally uncertain whether the
answer to the question of the existence of ghosts was 'yes' or 'no'. After
reading the book, he is certain that the answer is 'no'. He has therefore
received one bit of information. We would maintain that this answer is
unsatisfactory. (We shall refrain from raising the same question in the case
where the person, by reading a book, becomes convinced that ghosts exist)."
Spang-Hanssen (op. cit., p. 10) mentions, as another problem or example, that
in scientific literature the length of an article is quite obviously not
proportional to its information content, and that an abstract may even be just
as informative as the whole article.
Although serious problems thus attach to the information theory of Shannon and
others as a comprehensive theory of scientific communication and as a
theoretical frame of reference, the theory nevertheless holds fruitful (but
limited) possibilities in this field too. Spang-Hanssen (1970), for example,
points out that thesauri can be studied from, among others, information-
theoretic frames of reference.
The main result with which information theory must be said to have enriched us
is the receiver-dependent definition of the concept of information.