This appendix documents how the participation events data were grouped. Methodologically, I use a simple keyword search on all codings of a given year. The original coding data can be retrieved by opening the IAEA.rqda and OPCW.rqda files with the RQDA software package.1

For the keyword classification, I extracted all ACT.Part codings from these files and copied them into one text file per year. The search is thus performed only on those passages of the Annual Reports that I coded qualitatively as relevant statements about participation events.
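
RQDA stores each project as a SQLite database, so this extraction step can in principle be reproduced directly from the .rqda files. The following is only a minimal sketch of that step, not code from the original analysis: the table and column names (coding, freecode, source, seltext) reflect my reading of the RQDA database schema, and I assume that each source file name contains the report year.

library(RSQLite)

# assumed RQDA schema: coded text in coding.seltext, code names in freecode,
# source file names in source
con <- dbConnect(SQLite(), "IAEA.rqda")
codings <- dbGetQuery(con, "
    SELECT source.name AS filename, coding.seltext AS text
    FROM coding
    JOIN freecode ON coding.cid = freecode.id
    JOIN source   ON coding.fid = source.id
    WHERE freecode.name = 'ACT.Part' AND coding.status = 1")
dbDisconnect(con)

# derive the report year from the source file name (assumed to contain it)
codings$year <- sub(".*?([0-9]{4}).*", "\\1", codings$filename)

# write one plain-text file per year into the corpus directory used below
for (y in unique(codings$year)) {
    writeLines(codings$text[codings$year == y],
               file.path("../data/corpora/iaea-part-events", paste0(y, ".txt")))
}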

The following packages were used for the analysis2:

library(tm)
library(slam)
library(dplyr)
library(ggplot2)
library(reshape2)
library(xtable)

IAEA

In the first step, I create a corpus from the annual codings and pre-process the texts to remove stopwords, punctuation, and upper-case letters. Next, I create a document-term matrix, which records the frequency of each term in each document. This matrix is then used to identify the relevant search terms.

# build the corpus from the per-year text files of ACT.Part codings
corpus <- Corpus(DirSource("../data/corpora/iaea-part-events/", encoding="UTF-8"),
            readerControl=list(language="en"))
corpusVars <- data.frame(var1=factor(rep("", length(corpus))),
            row.names=names(corpus))

# pre-processing: lower case, collapse punctuation/whitespace/control
# characters into single blanks, and remove numbers
dtmCorpus <- corpus
dtmCorpus <- tm_map(dtmCorpus, content_transformer(tolower))
dtmCorpus <- tm_map(dtmCorpus, content_transformer(function(x)
            gsub("(['’\n]|[[:punct:]]|[[:space:]]|[[:cntrl:]])+",
                     " ", x)))
dtmCorpus <- tm_map(dtmCorpus, removeNumbers)

# document-term matrix with raw term frequencies (terms of at least two characters)
dtm <- DocumentTermMatrix(dtmCorpus, control=list(tolower=FALSE,
                    wordLengths=c(2, Inf)))
rm(dtmCorpus)

# record term occurrences, flag stopwords, and drop the stopwords from the matrix
dictionary <- data.frame(row.names=colnames(dtm),
                "Occurrences"=col_sums(dtm),
                "Stopword"=ifelse(colnames(dtm) %in% stopwords("en"),
                "Stopword", ""), stringsAsFactors=FALSE)
dtm <- dtm[, !colnames(dtm) %in% stopwords("en")]
attr(dtm, "dictionary") <- dictionary
rm(dictionary)

# store language and processing options as metadata on corpus and matrix
meta(corpus, type="corpus", tag="language") <-
    attr(dtm, "language") <- "en"
meta(corpus, type="corpus", tag="processing") <-
    attr(dtm, "processing") <- c(lowercase=TRUE, punctuation=TRUE,
                    digits=TRUE, stopwords=TRUE, stemming=FALSE,
                    customStemming=FALSE, twitter=FALSE,
                    removeHashtags=NA, removeNames=NA)
corpus
## <<VCorpus>>
## Metadata:  corpus specific: 2, document level (indexed): 0
## Content:  documents: 55
dtm
## <<DocumentTermMatrix (documents: 55, terms: 7986)>>
## Non-/sparse entries: 48860/390370
## Sparsity           : 89%
## Maximal term length: 26
## Weighting          : term frequency (tf)
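
The document-term matrix is also where the candidate search terms and their orthographic variants can be identified. A minimal sketch using functions from the tm package; the frequency threshold and the search pattern are illustrative choices, not taken from the original analysis:

# list all terms that occur at least 20 times across the corpus
findFreqTerms(dtm, lowfreq=20)

# list all orthographic variants of a candidate term, e.g. "workshop"
grep("^workshop", Terms(dtm), value=TRUE)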

In the second step, I first collect all variants of each search term, since the same term may still appear with different orthography in the source texts (e.g. plural forms or variants such as "workshopsí"). Second, I group the terms according to the overarching topics of Science, Training, and Advice.

terms <- as.data.frame(as.matrix(dtm))
terms$Year <- 1957:2011

## combine relevant terms
terms$WORKSHOP <- terms$workshop + terms$workshops +
    terms$workshopsí
terms$SEMINAR <- terms$seminar + terms$seminarí +
    terms$seminars + terms$seminarsã
terms$TRAINING <- terms$training + terms$trainingí
terms$MEETING <- terms$meeting + terms$meetingís +
    terms$meetings
terms$COURSE <- terms$course + terms$courses
terms$PANEL <- terms$panel + terms$panelonthe +
    terms$panels
terms$CONSULTANT <- terms$consultant +
    terms$consultants
terms$SYMPOSIA <- terms$symposia + terms$symposium
terms$NETWORK <- terms$network + terms$networki +
    terms$networkís + terms$networks
terms$ADVISOR <- terms$advisor + terms$advisory

## create term categories
terms$GROUP_SCIENCE <- terms$SEMINAR + terms$PANEL +
    terms$SYMPOSIA
terms$GROUP_TRAINING <- terms$TRAINING + terms$COURSE +
    terms$WORKSHOP
terms$GROUP_ADVICE <- terms$MEETING + terms$CONSULTANT +
    terms$NETWORK + terms$ADVISOR


write.csv(terms, file = "coding_terms.csv")

terms2 <- terms %>% select(Year, WORKSHOP, SEMINAR, TRAINING,
                        MEETING, COURSE, PANEL, CONSULTANT, SYMPOSIA,
                        NETWORK, GROUP_SCIENCE, GROUP_TRAINING,
                        GROUP_ADVICE)
write.csv(terms2, file = "iaea-participation-events.csv")
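
The resulting yearly counts, i.e. the contents of iaea-participation-events.csv, are reproduced in the table below. With the xtable package loaded above, such a table can for instance be rendered for LaTeX along the following lines (a sketch of one possible call, not necessarily how the table was typeset):

# render the yearly term counts as a LaTeX table
print(xtable(terms2, digits=0), include.rownames=FALSE)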
Year WORKSHOP SEMINAR TRAINING MEETING COURSE PANEL CONSULTANT SYMPOSIA NETWORK GROUP_SCIENCE GROUP_TRAINING GROUP_ADVICE
1957 0 3 22 4 0 3 5 2 0 8 22 12
1958 0 8 19 17 14 12 1 12 0 32 33 25
1959 1 5 18 14 25 39 0 12 0 56 44 14
1960 0 2 12 11 12 29 1 15 0 46 24 14
1961 0 7 8 15 10 9 3 10 0 26 18 19
1962 0 4 6 20 3 15 4 10 0 29 9 26
1963 0 1 15 21 10 23 1 10 0 34 25 22
1964 0 4 13 26 7 29 1 16 0 49 20 34
1965 0 8 14 19 9 20 4 13 0 41 23 25
1966 0 1 17 5 12 12 0 14 0 27 29 5
1967 0 3 15 17 12 21 3 15 0 39 27 20
1968 0 3 9 8 9 23 5 16 0 42 18 13
1969 0 2 8 18 8 22 7 12 0 36 16 25
1970 0 7 15 23 13 21 2 14 0 42 28 27
1971 1 6 17 25 12 17 9 10 0 33 30 36
1972 1 4 4 23 3 22 5 17 0 43 8 28
1973 3 5 8 28 7 11 1 15 0 31 18 30
1974 3 7 18 26 10 7 4 15 1 29 31 45
1975 2 12 17 24 8 1 7 20 2 33 27 55
1976 5 5 9 40 9 0 7 5 4 10 23 70
1977 3 5 11 29 13 0 5 13 2 18 27 53
1978 7 7 19 19 13 0 4 13 1 20 39 32
1979 6 7 16 19 10 1 1 12 2 20 32 30
1980 3 8 22 8 17 0 1 7 2 15 42 14
1981 8 17 33 25 26 0 11 13 1 30 67 44
1982 14 10 45 41 44 0 13 10 1 20 103 71
1983 14 18 61 48 41 2 13 6 5 26 116 84
1984 2 15 41 58 38 1 17 14 8 30 81 102
1985 20 11 65 46 47 0 18 12 5 23 132 89
1986 20 14 60 69 45 1 26 12 5 27 125 121
1987 38 18 75 46 53 1 11 15 4 34 166 82
1988 26 17 86 90 57 1 21 11 5 29 169 138
1989 10 5 43 61 21 1 3 14 0 20 74 86
1990 11 13 52 79 20 1 10 16 6 30 83 117
1991 14 9 45 78 24 2 17 14 2 25 83 121
1992 14 5 36 107 12 1 22 13 9 19 62 154
1993 15 8 43 105 22 2 20 9 2 19 80 154
1994 10 5 37 86 10 1 17 6 3 12 57 123
1995 7 12 23 36 7 0 0 12 2 24 37 45
1996 2 6 20 32 6 0 2 5 6 11 28 59
1997 3 3 17 39 6 1 2 13 5 17 26 71
1998 15 10 24 63 9 0 5 14 6 24 48 90
1999 28 8 43 70 24 1 3 15 10 24 95 105
2000 12 8 38 31 16 2 1 9 6 19 66 52
2001 22 8 71 36 29 2 2 5 8 15 122 54
2002 20 4 68 42 22 1 4 10 9 15 110 64
2003 20 4 46 24 28 0 0 2 17 6 94 43
2004 20 7 41 39 14 5 0 2 14 14 75 56
2005 16 5 65 24 25 1 1 6 12 12 106 43
2006 24 6 46 28 19 1 0 1 6 8 89 39
2007 27 6 68 43 30 0 0 4 16 10 125 64
2008 27 4 47 25 32 1 0 3 11 8 106 38
2009 35 5 60 48 33 1 0 10 18 16 128 73
2010 26 5 65 37 36 2 1 6 14 13 127 59
2011 34 10 70 55 36 1 0 1 24 12 140 83
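
The grouped counts can also be inspected visually over time. Since ggplot2 and reshape2 are loaded above but their calls are not shown in this appendix, the following is only a sketch of how the three category series might be plotted:

# reshape the three category columns into long format and plot them by year
plotdata <- melt(terms2[, c("Year", "GROUP_SCIENCE", "GROUP_TRAINING", "GROUP_ADVICE")],
                 id.vars="Year", variable.name="Category", value.name="Count")
ggplot(plotdata, aes(x=Year, y=Count, colour=Category)) +
    geom_line() +
    labs(x="Year", y="Term frequency")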

OPCW

Again, in the first step, I create a corpus from the annual codings and pre-process the texts to remove stopwords, punctuation, and upper-case letters.

corpus <- Corpus(DirSource("../data/corpora/opcw-part-events/", encoding="UTF-8"),
                    readerControl=list(language="en"))
corpusVars <- data.frame(var1=factor(rep("", length(corpus))),
                    row.names=names(corpus))
dtmCorpus <- corpus
dtmCorpus <- tm_map(dtmCorpus, content_transformer(tolower))
dtmCorpus <- tm_map(dtmCorpus,
    content_transformer(function(x)
    gsub("(['’\n]|[[:punct:]]|[[:space:]]|[[:cntrl:]])+", " ", x)))
dtmCorpus <- tm_map(dtmCorpus, removeNumbers)
dtm <- DocumentTermMatrix(dtmCorpus,
                    control=list(tolower=FALSE, wordLengths=c(2, Inf)))
rm(dtmCorpus)
dictionary <- data.frame(row.names=colnames(dtm),
                    "Occurrences"=col_sums(dtm),
                    "Stopword"=ifelse(colnames(dtm) %in% stopwords("en"),
                    "Stopword", ""),   stringsAsFactors=FALSE)
dtm <- dtm[, !colnames(dtm) %in% stopwords("en")]
attr(dtm, "dictionary") <- dictionary
rm(dictionary)
meta(corpus, type="corpus", tag="language") <-
    attr(dtm, "language") <- "en"
meta(corpus, type="corpus", tag="processing") <-
    attr(dtm, "processing") <- c(lowercase=TRUE, punctuation=TRUE,
                    digits=TRUE, stopwords=TRUE, stemming=FALSE,
                    customStemming=FALSE, twitter=FALSE,
                    removeHashtags=NA, removeNames=NA)
corpus
## <<VCorpus>>
## Metadata:  corpus specific: 2, document level (indexed): 0
## Content:  documents: 15
dtm
## <<DocumentTermMatrix (documents: 15, terms: 2238)>>
## Non-/sparse entries: 7251/26319
## Sparsity           : 78%
## Maximal term length: 24
## Weighting          : term frequency (tf)

In the second step, I again first collect all variants of each search term that appear with different orthography. Second, I group the terms according to the overarching topics of Science, Training, and Advice.

terms <- as.data.frame(as.matrix(dtm))
terms$Year <- 1997:2011

## combine relevant terms
terms$WORKSHOP <- terms$workshop + terms$workshops + terms$workshopin
terms$SEMINAR <- terms$seminar + terms$seminars + terms$seminarfrom
terms$TRAINING <- terms$training
terms$MEETING <- terms$meeting + terms$meetings
terms$COURSE <- terms$course + terms$coursebefore +
    terms$coursefor + terms$courses + terms$courseswere
terms$PANEL <- terms$panelists
terms$SYMPOSIA <- terms$symposium
terms$NETWORK <- terms$network
terms$ADVISOR <- terms$advisory + terms$adviser

## create term categories
terms$GROUP_SCIENCE <- terms$SEMINAR + terms$PANEL + terms$SYMPOSIA
terms$GROUP_TRAINING <- terms$TRAINING + terms$COURSE + terms$WORKSHOP
terms$GROUP_ADVICE <- terms$MEETING + terms$NETWORK + terms$ADVISOR


write.csv(terms, file = "coding_terms_opcw.csv")

terms2 <- terms %>% select(Year, WORKSHOP, SEMINAR, TRAINING,
                    MEETING, COURSE, PANEL, SYMPOSIA, NETWORK,
                    GROUP_SCIENCE, GROUP_TRAINING, GROUP_ADVICE)
write.csv(terms2, file = "opcw-participation-events.csv")
Year WORKSHOP SEMINAR TRAINING MEETING COURSE PANEL SYMPOSIA NETWORK GROUP_SCIENCE GROUP_TRAINING GROUP_ADVICE
1997 1 4 6 1 10 0 0 0 4 17 2
1998 3 9 4 4 9 0 5 2 14 16 11
1999 7 8 22 8 25 0 3 3 11 54 15
2000 18 3 22 12 21 0 0 5 3 61 19
2001 13 2 10 12 8 1 1 2 4 31 17
2002 6 3 12 9 17 0 0 1 3 35 12
2003 9 3 6 8 6 0 0 6 3 21 16
2004 12 1 9 7 9 0 0 4 1 30 14
2005 13 1 11 5 10 0 0 0 1 34 8
2006 12 3 7 7 14 0 0 0 3 33 9
2007 11 0 9 4 9 0 0 0 0 29 6
2008 13 2 12 9 11 0 0 0 2 36 11
2009 16 3 15 17 19 0 0 1 3 50 20
2010 12 4 20 12 27 0 0 1 4 59 15
2011 11 9 14 8 14 0 0 1 9 39 11

  1. Ronggui Huang (2014). RQDA: R-based Qualitative Data Analysis. R package version 0.2-7. http://rqda.r-forge.r-project.org/.

  2. David B. Dahl (2014). xtable: Export Tables to LaTeX or HTML. R package version 1.7-4. http://CRAN.R-project.org/package=xtable.
     Ingo Feinerer and Kurt Hornik (2014). tm: Text Mining Package. R package version 0.6. http://CRAN.R-project.org/package=tm.
     Kurt Hornik, David Meyer and Christian Buchta (2014). slam: Sparse Lightweight Arrays and Matrices. R package version 0.1-32. http://CRAN.R-project.org/package=slam.
     Hadley Wickham and Romain Francois (2014). dplyr: A Grammar of Data Manipulation. R package version 0.3.0.2. http://CRAN.R-project.org/package=dplyr.
     Hadley Wickham (2009). ggplot2: Elegant Graphics for Data Analysis. Springer, New York.
     Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20.