Polish Word Association Network (SSSJP)

The Free Word Association

It was noted rather early on that words in the human mind are linked. The American psychiatrists G. Kent and A. J. Rosanoff (1910) recognized the diagnostic usefulness of analyzing the links between words, and they devised and conducted a free word association test. They tested 1,000 people of varied educational backgrounds and professions, asking each subject to give the first word that came to mind in response to a stimulus word. The subjects were presented with 100 stimulus words, principally nouns and adjectives. The Kent-Rosanoff list was translated into several languages and the experiment was repeated in them, which made comparative research possible.

Word association research was continued by Palermo and Jenkins (1964), Postman and Keppel (1970), Kiss, Armstrong, Milroy and Piper (1973), Moss and Older (1996), and Nelson, McEvoy and Schreiber (1998). The repeatability of results allowed the number of subjects to be reduced while the number of stimulus words was increased, for example 500 subjects and 200 words (Palermo, Jenkins, 1964), or 100 subjects and 8,400 words (Kiss, Armstrong, Milroy, Piper, 1973). Because of the larger number of words, the latter experiment took many years to complete, but it made possible the creation of a thesaurus of the English language, the Edinburgh Associative Thesaurus (EAT).

Research on free word association has also been conducted in Poland, by I. Kurcz in 1964/65. In that experiment, the stimulus list was composed of 100 words from the Kent-Rosanoff list, and responses were gathered from 1,000 students of a university and a polytechnic. A compilation of the results and a short introductory article were published in volume VIII of “Studia Psychologiczne” (Studies in Psychology) in 1967.

Word Association Network

The goal of this project is to experimentally develop a word association network for the Polish language. A network is understood here as a structure built from lexical nodes and relations, as in the figure below, which presents a network for the word dom built in the pilot study (Gatkowska, 2013, 2014).

[Figure: word association network for the word dom, built in the pilot study]

The set of relations defines the meaning of a lexical unit. A path in the network may explain how we derive information that is not lexically present in a sentence, e.g. in the dialogue: Aunty, I have got a terrier! – It is really nice, but you have to take care of the animal. Here the reply relies on the unstated fact that a terrier is an animal. The network structure should also be investigated in relation to text: it has been shown that semantic associations derived automatically from a large text collection contain only a small fraction of the network obtained from humans in the free word association experiment, see (Rapp et al., 2005; Rapp, 2002, 2008), (Wandmacher, 2005, 2008), (Gatkowska et al., 2013; Gatkowska, 2014).

We use a free word association test to develop the network. The test produces a set of stimulus-response pairs. It is well known (Clark, 1970) that such a list consists of responses that are semantically related to the stimulus, responses that reflect pragmatic dependencies, and so-called ‘clang responses’, which are related to the stimulus by sound rather than by meaning.
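The way a list of stimulus-response pairs turns into a weighted network can be sketched as follows. This is only a minimal illustration, not the project's actual software: the function name `build_network`, the normalization step, and the response data are assumptions made up for the example.

```python
from collections import Counter, defaultdict

def build_network(pairs):
    """Aggregate (stimulus, response) pairs into a weighted directed network.

    The result maps each stimulus to a Counter of its responses, so the
    count on an edge is the number of subjects who gave that response.
    """
    network = defaultdict(Counter)
    for stimulus, response in pairs:
        # Assumed normalization: ignore case and surrounding whitespace.
        network[stimulus.strip().lower()][response.strip().lower()] += 1
    return network

# Hypothetical responses, invented for illustration only.
pairs = [
    ("dom", "ściana"),
    ("dom", "rodzina"),
    ("dom", "ściana"),
    ("dom", "Rodzina "),
    ("dom", "ściana"),
]

network = build_network(pairs)
ranked = network["dom"].most_common()  # responses ordered by frequency
print(ranked)
```

Keeping the raw counts, rather than only the set of responses, makes it possible to distinguish frequent associations from rare ones later on.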

If we look at the set of semantically related responses (associations), we find more frequent direct associations, i.e. those which follow a single semantic relation, e.g. ‘whole – part’: house – wall, and less frequent indirect associations such as mutton (baranina) – horns (rogi), which must be explained by a chain of relations. In this example the chain is a ‘source’ relation mutton (baranina) – ram (baran), followed by a ‘whole – part’ relation ram (baran) – horns (rogi). Similarly, the association mutton (baranina) – wool (wełna) is explained by a ‘source’ relation mutton (baranina) – ram (baran), followed by a ‘whole – part’ relation ram (baran) – fleece (runo), which is in turn followed by a ‘source’ relation fleece (runo) – wool (wełna). These association chains suggest that some associations are based on a semantic network. To study such associations, however, one needs a rich association network. To obtain one we have to modify the original design of the free word association experiment and divide it into phases: we start with an initial set of stimuli and use the associations (responses) as stimuli in the next phase, and so on.
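The idea of explaining an indirect association by a chain of relations can be illustrated with a breadth-first search over relation-labelled edges. The sketch below is an assumption-laden toy, not the method used in the project: the edge list contains only the mutton examples from the text, the edges are treated as directed in the order given there, and the function name `explain` is hypothetical.

```python
from collections import deque

# Relation-labelled edges taken from the mutton examples in the text.
EDGES = [
    ("baranina", "source", "baran"),
    ("baran", "whole-part", "rogi"),
    ("baran", "whole-part", "runo"),
    ("runo", "source", "wełna"),
]

def explain(stimulus, response, edges):
    """Return the shortest chain of (head, relation, tail) steps, or None."""
    outgoing = {}
    for head, relation, tail in edges:
        outgoing.setdefault(head, []).append((relation, tail))

    queue = deque([(stimulus, [])])
    seen = {stimulus}
    while queue:
        node, chain = queue.popleft()
        if node == response:
            return chain
        for relation, tail in outgoing.get(node, []):
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, chain + [(node, relation, tail)]))
    return None  # no chain of relations connects the two words

horns = explain("baranina", "rogi", EDGES)   # two-step chain
wool = explain("baranina", "wełna", EDGES)   # three-step chain
print(horns)
print(wool)
```

A direct association corresponds to a chain of length one, an indirect one to a longer chain; a pair for which no chain is found would need some other explanation, for example a pragmatic one.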

The Word Association Test

1. The Premises

First, we have to test an appropriate number of human subjects. Following the original principle of the Kent-Rosanoff research, we shall test at least 1,000 subjects. To make the results comparable to Kurcz’s (1967) experiment, we shall test only university students.

Second, we have to divide the experiment into phases. In the first phase we shall use as the stimulus list 60 words from the Kent-Rosanoff list as translated by Kurcz (1967). In the second phase we shall use as stimuli the 5 most frequent associations to each of the 60 stimuli used in the first phase, which gives a total of 300 stimuli. In the third and final phase we shall use the 3 most frequent associations to each stimulus used in phase two, which gives a total of 900 stimuli. If the average number of distinct associations to a stimulus, after testing 1,000 people, amounts to 150, then with 60 stimuli in the first phase, 300 in the second and 900 in the third (1,260 stimuli altogether) we will obtain a network comprising 189,000 associations.
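The arithmetic of the three phases, and the step of selecting the most frequent responses as the next stimuli, can be checked with a short sketch. The function names and the toy response counts are assumptions introduced for illustration; the phase sizes also assume that no word is selected twice, so in practice they are upper bounds.

```python
from collections import Counter

def stimuli_per_phase(initial, top_k_per_phase):
    """Number of stimuli in each phase, assuming no word repeats as a stimulus."""
    sizes = [initial]
    for k in top_k_per_phase:
        sizes.append(sizes[-1] * k)
    return sizes

def next_phase_stimuli(responses_by_stimulus, k):
    """Map each stimulus to its k most frequent responses (next-phase stimuli)."""
    return {
        stimulus: [word for word, _ in counts.most_common(k)]
        for stimulus, counts in responses_by_stimulus.items()
    }

sizes = stimuli_per_phase(60, [5, 3])   # 60 -> 60*5 -> 300*3
total_stimuli = sum(sizes)
avg_associations = 150                  # the average assumed in the text
total_associations = total_stimuli * avg_associations
print(sizes, total_stimuli, total_associations)

# Hypothetical counts, invented to show the selection step.
toy = {"dom": Counter({"rodzina": 40, "ściana": 25, "dach": 10, "okno": 5})}
selected = next_phase_stimuli(toy, 3)
print(selected)
```

If the same word turns up among the top responses to several stimuli, it only needs to be presented once, so the real number of stimuli in phases two and three may be lower than 300 and 900.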

2. The Supervised, Computer Assisted Experiment

The experiment will be conducted in a computer lab, with the aid of a computer system created specifically for this experiment. The system presents a list of stimuli and records the associations in a database.

- Instructions will appear on each participant’s computer screen and will also be read aloud by the person conducting the experiment before the test commences. An attempt will be made to formulate the instructions so that they are clear and concise and contain the most vital information (including fonts, the size of letters, and the use of Polish letters).

- After the instructions have been read, the experiment will commence: a stimulus will appear on each participant’s screen, and the participant will type the first word that comes to mind. Once the participant has entered an association (or the time allowed for doing so has run out), the next stimulus will appear on the screen, and so on until the experiment is concluded.

- The number of stimuli-words as well as their order are the same for all participants.

- The response time is crucial for the test result: a limit that is too short will produce too many ‘clang responses’, while one that is too long may yield too many elaborate indirect and pragmatic associations. When a computer monitor is used to present the stimulus and a keyboard to record the association, the response time limit has to be set empirically. It was set in the pilot study by analyzing the behavior and the opinions of the tested group, which was composed of students majoring in an interdisciplinary subject, Electronic Information Processing, at the Jagiellonian University. It was assumed that the participants would have a similar level of linguistic training and a corresponding fluency in typing on a computer keyboard (Gatkowska, 2013).

- The investigator will be present in the computer lab during the experiment to preserve comparability with the predecessor’s research (Kurcz, 1967).
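The procedure described in the points above can be summarized in a small simulation. It is a sketch under stated assumptions and does not reproduce the system built for the experiment: `run_session` and the `get_response` callback are hypothetical names, the 5-second limit is a placeholder (the text gives no value), and the responses are invented. A real system would interrupt input when the time runs out, whereas here the elapsed time is simply compared with the limit afterwards.

```python
def run_session(stimuli, get_response, time_limit_s):
    """Present stimuli in a fixed order and record one response per stimulus.

    get_response(stimulus) is a hypothetical callback returning
    (typed_text, elapsed_seconds); in a real system it would wrap the UI.
    """
    records = []
    for position, stimulus in enumerate(stimuli, start=1):
        text, elapsed = get_response(stimulus)
        timed_out = elapsed > time_limit_s
        cleaned = text.strip()
        # A late or empty answer is stored as a missing response.
        response = None if timed_out or not cleaned else cleaned
        records.append({
            "position": position,
            "stimulus": stimulus,
            "response": response,
            "timed_out": timed_out,
        })
    return records

# Scripted answers standing in for a participant (invented for illustration).
scripted = {
    "dom": ("ściana", 2.1),
    "chleb": ("masło", 7.5),   # typed too slowly
    "woda": ("  ", 1.0),       # nothing meaningful typed
}
records = run_session(["dom", "chleb", "woda"], lambda s: scripted[s], time_limit_s=5.0)
for record in records:
    print(record)
```

Passing the same `stimuli` list to every session keeps the number and order of stimuli identical for all participants, as required above.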

Network samples

NOUN:


baranina (mutton), chleb (bread), głowa (head), jedzenie (food), krzesło (chair), księżyc (moon), lampa (lamp), praca (work), ptak (bird), ręka (hand), woda (water), żołnierz (soldier)

ADJECTIVE:


biały (white), ciężki (heavy, difficult), czerstwy (stale), duży (big), głęboki (deep)

VERB:


ciąć (cut), palić (burn, smoke), płynąć (swim, sail, flow)

Selected Bibliography

Amancio, Diego Rafael, Oliveira, Osvaldo N. Jr., da Fontoura Costa, Luciano, 2012, Using complex networks to quantify consistency in the use of words. [In:] Journal of Statistical Mechanics: Theory and Experiment, 2-20.

Borge-Holthoefer, Javier, Arenas, Alex, 2009, Navigating word association norms to extract semantic information. [In:] Taatgen N., van Rijn H. (eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society, Groningen, 2777-2782.

Borge-Holthoefer, Javier, Arenas, Alex, 2010, Categorizing words through semantic memory navigation. [In:] The European Physical Journal B-Condensed Matter and Complex Systems 74(2), 265.

Budanitsky, Alexander, Hirst, Graeme, 2006, Evaluating WordNet-based measures of lexical semantic relatedness. [In:] Computational Linguistics 32.1, 13-47.

Church, Kenneth W., Hanks, Patrick, 1990, Word Association Norms, Mutual Information, and Lexicography. [In:] Computational Linguistics, vol. 16, 1, 22-29.

Clark, Herbert H., 1970, Word Associations and Linguistic Theory. [In:] J. Lyons (ed.) “New Horizons in Linguistics”, Harmondsworth, Middlesex: Penguin Books Ltd, 271-286.

De Deyne, Simon, Storms, Gert, 2008, Word associations: Network and semantic properties. [In:] Behavior Research Methods, 40 (1), 213-231.

Fillmore, Charles J., 1976, Frame semantics and the nature of language. [In:] Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech. 280, 20-32.

Fillmore, Charles J., 1982, Frame semantics. [In:] “Linguistics in the Morning Calm”, Seoul, South Korea: Hanshin Publishing Co., 111-137.

Fillmore, Charles J., Baker, Collin F., and Sato, Hiroaki, 2004, FrameNet as a Net. [In:] Proceedings of LREC. Vol. 4. Lisbon: ELRA, 1091-1094.

Gatkowska, Izabela, Korzycki, Michał, Lubaszewski, Wiesław, 2013, Can Human Association Norm Evaluate Latent Semantic Analysis? [In:] Proceedings of the NLPCS Workshop, Marseille, 92-104.

Gatkowska, Izabela, 2013, Przetwarzanie informacji językowej. Podstawy kognitywne. [In:] Gatkowska I., Lubaszewski W. (eds.) „Interfejs dla osób z dysfunkcją wzroku. Model kognitywny i przykład dobrej praktyki”, Kraków: Wydawnictwo Uniwersytetu Jagiellońskiego, 9-45.

Gatkowska, Izabela, 2014, Word Associations as a Linguistic Data. [In:] P. Chruszczewski, J. Rickford, K. Buczek, A. Knapik, J. Mianowski (eds.), Languages in Contact 2012, Vol. 1, Wrocław, 79-92.

Kent, Grace H., Rosanoff, Aaron J., 1910, A study of association in insanity. [In:] American Journal of Insanity 67 (37-96), 317-390.

Kiss, George R., Armstrong, Christine, Milroy, Robert, Piper, James, 1973, An associative thesaurus of English and its computer analysis. [In:] Aitken, A.J., Bailey, R.W. (eds.) „The Computer and Literary Studies”. Edinburgh: University Press, 153-165.

Kurcz, Ida, 1967, Polskie normy powszechności skojarzeń swobodnych na 100 słów z listy Kent-Rosanoffa. [In:] T. Tomaszewski (ed.), Studia Psychologiczne, vol. VIII, Wrocław-Warszawa-Kraków, 122-255.

Lyons, John, 1972, „Structural Semantics. An Analysis of Part of the Vocabulary of Plato”, Oxford: Basil Blackwell.

Lubaszewski, Wiesław, Gatkowska, Izabela, 2013, Struktura semantyczna języka naturalnego. [In:] Gatkowska I., Lubaszewski W. (eds.) „Interfejs dla osób z dysfunkcją wzroku. Model kognitywny i przykład dobrej praktyki”, Kraków: Wydawnictwo Uniwersytetu Jagiellońskiego, 47-106.

Miller, George A., Beckwith, Richard, Fellbaum, Christiane, Gross, Derek, Miller, Katherine, 1990, Introduction to WordNet: an on-line lexical database. [In:] International Journal of Lexicography. 3 (4), 235-244.

Minsky, Marvin, 1975, A Framework for Representing Knowledge. [In:] P. H. Winston (ed.) “The Psychology of Computer Vision”. New York: McGraw-Hill, 211-277.

Moss, Helen, Older, Lianne, 1996, “Birkbeck word association norms”, Psychology Press, London: Erlbaum Taylor & Francis Ltd.

Palermo, David S., Jenkins, James J., 1964, “Word Association Norms: Grade School through College”, Minneapolis: University of Minnesota Press.

Postman, Leo Joseph, Keppel, Geoffrey, 1970, “Norms of word association”, New York: Academic Press.

Rapp, Reinhard, 2013, From Stimulus to Associations and Back. [In:] Proceedings of the NLPCS Workshop, Marseille, 78-91.

Rapp, Reinhard, 2008, The Computation of Associative Responses to Multiword Stimuli. [In:] Proceedings of the workshop on Cognitive Aspects of the Lexicon (COGALEX2008): Coling 2008, Manchester, 102–109.

Rapp, Reinhard, 2002, The Computation of Word Associations: Comparing Syntagmatic and Paradigmatic Approaches. [In:] Proceedings of the 19th International Conference on Computational Linguistics, vol. 1, Taipei, 1-7.

Rosenzweig, Mark R., 1961, Comparisons among word-association responses in English, French, German, and Italian. [In:] Amer. Journal Psychol., Vol. 64, 347-360.

Rosenzweig, Mark R., 1957, Etudes sur l'association des mots. [In:] L'Annee Psychol., Vol. 57, 23-32.

Russell, Wallace A., Meseck, O.R., 1959, Der Einfluss der Association auf das Erinnern von Worten in der deutschen, franzosischen and englischen Sprache. [In:] Zeitschrift für Experimentelle und Angewandte Psychologie, Vol. 6, 191-211.

Schank, Roger C., 1972, Conceptual Dependency: A Theory of Natural Language Understanding. [In:] Cognitive Psychology, Vol. 3, 552-631.

Schank, Roger C., 1975, “Conceptual Information Processing”, Amsterdam: North-Holland.

Schulte im Walde, Sabine, Borgwaldt, Susanne, Jauch, Ronny, 2012, Association Norms of German Noun Compounds. [In:] Proceedings of the 8th International Conference on Language Resources and Evaluation. Istanbul, 632-639.

Sinopalnikova, Anna, Smrz, Pavel, 2004, Word Association Thesaurus as a Resource for extending Semantic Networks. [In:] Proceedings of the International Conference on Communications in Computing, CIC '04, Las Vegas, Nevada, USA, 267-273.

Sowa, John F., 2006, Semantic Networks. [In:] Encyclopedia of Cognitive Science. New York: John Wiley & Sons Ltd.

Wandmacher, Tonio, 2005, How semantic is Latent Semantic Analysis. [In:] Proceedings of TALN/RECITAL 5, Dourdan, 6-10.

Wandmacher, Tonio, Ovchinnikova, Ekaterina, Alexandrov, Theodore, 2008, Does Latent Semantic Analysis reflect human associations. [In:] Proceedings of the ESSLLI Workshop on Distributional Lexical Semantics. ESSLLI 2008, Hamburg, 63-70.

Wettler, Manfred, Rapp, Reinhard, Sedlmeier, Peter, 2005, Free word associations correspond to contiguities between words in text. [In:] Journal of Quantitative Linguistics, 12(2/), 111-122.