Sorina I. Crisan, PhD
Human Coding, Machine-Learning Techniques & Discourse Analysis: Interview with David Sylvan, PhD
Are you interested in learning about current research that employs both human coding and machine-learning (ML) techniques within the social sciences? If so, this short interview presents Dr. David Sylvan’s reasons for taking such a methodological approach in his own research, along with the benefits and drawbacks of human coding versus ML techniques for conducting discourse analysis within the field of international relations/political science. Dr. Sylvan argues that: “with the advances in computational linguistics (in terms of representing syntactic, semantic, and discursive features of texts, but also in terms of incorporating ML), the time is now ripe for taking the next step and learning how to get at some of the intricacies of the things that human beings do with words.” The article concludes with Dr. Sylvan’s views about the dissemination of ideas, academic publishing, and his advice to junior scholars.
Interview by Sorina I. Crisan, PhD
Dr. David Sylvan is a Professor of International Relations/Political Science at the Graduate Institute of International and Development Studies, in Geneva, Switzerland. He is the Co-Founder and Co-Principal Investigator on the Intrepid Project. Dr. Sylvan's expertise and research focus on the following themes: the Cold War, foreign policies, military occupation, NATO and alliance relations, state-building, transatlantic relations, and urban questions.
Congratulations on your valuable work towards co-founding and co-leading the Intrepid Project, which is funded by the Swiss National Science Foundation (SNSF) and is now in its second year. According to the website, the project aims to “develop a general understanding of how policy announcements by state agencies are interpreted by journalists in ways that send signals, indicate intent, and otherwise provoke economic and political reactions.” The project is meant to “develop models of the announcement interpretation process.” It uses both human coders and machine learning techniques to highlight particular features mentioned in policy announcement texts and their respective journalistic accounts. The project covers two themes (central bank monetary policy and foreign policy), analyzes two case studies (the United States and France), and investigates a specific time frame (from the late 1960s to 2018). Looking at the Intrepid Project, what do you perceive to be the benefits versus the drawbacks of employing human coding vs. machine learning (ML) techniques and tools (e.g., NVivo, Python, R, TagTog) when it comes to discourse/textual analysis?
I don’t see it as either/or. We want to fuse the two: use human coding as an input to machine learning, thereby (we hope) getting the best of both worlds.
When and why did you become interested in combining human coding with machine learning techniques for analyzing discourse/text? And, how do you see the nature of international relations (IR) and political science (PS) related discourse/text analysis changing in the near future, due to the rapid technological advances taking place within the field of machine learning?
When I got frustrated by just how long it took us to do human coding of parliamentary speeches in the Garrison State Project. The idea was to scale up and deal with much larger quantities of text, and the Intrepid Project is one start at that. We are using some fairly simple coding (simple for human beings) of newspaper articles (who, what, when, where, why, etc.), and if that works, we’ll try to move on to much harder sorts of texts.
On the second question, political science (and hence IR) and the social sciences more generally have now discovered text. For the last decade or so, there have been a whole lot of “push the button” analyses of texts based on word frequency; but with the advances in computational linguistics (in terms of representing syntactic, semantic, and discursive features of texts, but also in terms of incorporating ML), the time is now ripe for taking the next step and learning how to get at some of the intricacies of the things that human beings do with words.
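Editorial illustration: the limits of a word-frequency analysis can be seen in a very small sketch. The two sentences below are invented for this purpose and are not drawn from the project's data, and the helper name `word_frequencies` is hypothetical. A statement and a question built from the same words produce identical counts, even though a human reader would treat them as doing different things.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two made-up sentences: one asserts, the other asks.
sentence_a = "The bank will raise rates"
sentence_b = "Will the bank raise rates"

# Word order is discarded, so the two sentences look the same to the counter.
same_counts = word_frequencies(sentence_a) == word_frequencies(sentence_b)
print(same_counts)
```

Because the counts ignore word order and context, distinguishing the assertion from the question requires the kind of syntactic and discursive representation mentioned in the answer above.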
When conducting discourse analysis, within the field of IR/PS, do you think there are perhaps some misunderstandings around the ideas of employing: (1) quantitative versus qualitative methods and (2) human coding versus machine coding techniques? It would be interesting to learn more about your views regarding the use and the purpose of each method.
That’s the understatement of the year. The vast majority of quantitative data sets in political science start as collections of textual materials (for example, news reports, readings of legislative materials), even if those materials weren’t necessarily collected using Schutzian lifeworld principles, and therefore are not qualitative in the strong meaning of the term. By the same token, there’s a very old tradition in ethnography and qualitative sociology of taking carefully coded observations and analyzing their frequency and covariation using statistical methods. So to see the use of numbers as some kind of ontological or epistemological dividing line is deeply mistaken.
As regards human coding vs machine learning, as I mentioned above, ML techniques are trained (the “learning” part of ML) on datasets, and there are enormous numbers of such datasets that involve human coding. What we’re trying to do in our project is to train on more complex texts coded by human beings.
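Editorial illustration: the idea of using human codes as training input can be sketched with a small classifier. This is a minimal sketch under stated assumptions, not the Intrepid Project's actual pipeline: the example sentences, the two code labels, and the function names are all hypothetical, and multinomial naive Bayes is simply one well-known technique chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(coded_examples):
    """Fit a multinomial naive Bayes model to (text, human_code) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in coded_examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the code with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            count = word_counts[label][token] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-coded sentences (invented for illustration).
human_coded = [
    ("The central bank raised interest rates", "monetary_policy"),
    ("The bank signalled lower interest rates ahead", "monetary_policy"),
    ("The ministry announced new sanctions on the regime", "foreign_policy"),
    ("The ambassador was recalled after the sanctions announcement", "foreign_policy"),
]

model = train(human_coded)
print(predict(model, "Officials hinted at higher interest rates"))
print(predict(model, "New sanctions were announced"))
```

The human coders supply the labels; the model only generalizes from them. With more complex codes, the same division of labor holds, but the text representation would need to capture far more than individual words.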
Prior to the aforementioned project, you also created and led the Garrison State Project, a five-year, SNSF-funded research project focused on “examining the expansion of what political scientist Harold Lasswell called 'garrison states': developed democracies in which organizations concerned with issues of national security grow in size, become more active, and are less and less subject to oversight.” This project relied on human coding, a lengthy process that requires well-trained coders and a great deal of patience and persistence. What were some of the biggest accomplishments achieved by this project and, in retrospect, do you believe that it could have benefited from employing machine learning techniques? And, lastly, is it correct to assume that you were able to learn from your achievements and the challenges encountered on this project, and then build on them to co-create the Intrepid Project?
We of course learned an awful lot about how democracies tend to stultify, with national security thinking colonizing parliamentary discourse in such a way that the policy space shrinks drastically. This is true, I might add, in every single democracy, including those, like Switzerland, that are not members of military alliances and which never had a single colony.
On the methodological side, we were able to develop a set of techniques for ensuring high degrees of intercoder reliability on data sources (namely texts) and coding techniques (open-ended, bottom-up coding) that are normally seen as completely incompatible with such criteria. I do not believe we were in a position to use ML techniques then, and still do not believe they’re possible, because what our coders were doing was really deeply interpretive: figuring out the illocutionary, and in some cases the perlocutionary, force of particular passages in parliamentary speeches. Work on this aspect of text has not yet advanced that far.
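Editorial illustration: the interview does not say which reliability statistic the Garrison State Project used. As an assumption, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two coders working with a fixed set of categories. The category labels and the function name `cohens_kappa` are hypothetical, and open-ended, bottom-up coding of the kind described above would need additional steps before such a measure could be applied.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders on the same passages."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of passages.")
    n = len(codes_a)
    # Share of passages on which the two coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's own code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same six passages.
coder_a = ["threat", "threat", "promise", "warning", "promise", "threat"]
coder_b = ["threat", "promise", "promise", "warning", "promise", "threat"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Here the coders agree on five of six passages, but kappa comes out lower than that raw share because some of the agreement would be expected by chance alone.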
Finally, on the academic disciplinary side, our work sheds some light on the issue of argumentation, and I would like to write more explicitly on that in the near future, not least because I’ve been teaching a course in our interdisciplinary master’s program about policy debates.
As to your final question, we absolutely were able to build on what we learned (about human coding, about intercoder reliability, about dealing with texts) to come up with, and to carry out, the Intrepid Project.
During our work together, while you were my PhD thesis director at the Graduate Institute, we had countless conversations on the topic of publishing one’s work within the academic field. Over the course of your distinguished IR/PS career, you have given numerous interviews and have published an impressive number of articles, books, book chapters, reviews, blog posts, etc. Would you like to share some of your views about the importance of having one’s work published or maybe even presented in a video/audio format? And, are there perhaps some specific personal traits that junior scholars should try to develop, at an early stage in their career, in order to become successfully published authors and/or interviewees for various media outlets?
I think publishing is essential for disciplinary success, even if, as we all know, publishing outlets are beset by gate-keeping-ism. But the dissemination of ideas is a different matter, which only overlaps partially with academic publication. As regards video or audio formats, I still know very little about them and am hoping to learn more in the years to come.
Would you like to share any remarks and/or suggestions for junior scholars interested in following a similar line of research such as yours?
Take a chance. Play with ideas and if you’re passionate about them, that passion should come through in your work.
Thank you for reading.
#interview #internationalrelations #politicalscience #discourseanalysis #humancoding #machinelearning #ML #publishing #ideasdissemination #foreignpolicy #academia #economics #economicpolicy #journalism #encouragingwords #sharingresearch #teaching #learnfromoneanother
David Sylvan, PhD
International Relations/Political Science
The Graduate Institute of International and Development Studies, Geneva
Co-Founder & Co-Principal Investigator
Last, but not least, when asked about where interested readers may find his work online, Dr. Sylvan answered the following:
I hope to set up a website at some point. For now, some of my papers have been uploaded to project sites, and others can be obtained from convention sites, or by writing to me and my collaborators. Monographs are a different story.
Are you inspired and eager to learn more about the topics covered in this interview? For a quick view, here is a small selection of Dr. Sylvan's articles, presentations, conference papers, and books.
Conference papers and presentations on the theme of machine learning:
A Hybrid Text Analysis Method for Comparing Contextual Specificity in Political Documents
David Sylvan, Jean-Louis Arcand, Ashley Thornton
The Annual Meeting of the American Political Science Association, 30 September – 3 October 2021.
Modeling the Interpretation of Policy Announcements: Russia, the Federal Reserve, and the New York Times
David Sylvan, Jean-Louis Arcand, Ashley Thornton
The 11th Annual Meeting of the European Political Science Association, 24 – 25 June 2021.
Russia, the Federal Reserve, and the New York Times: Machine-Learning Models of How Elites Interpret Policy Announcements
David Sylvan, Jean-Louis Arcand, Ashley Thornton
The 62nd Annual Convention of the International Studies Association, 6 – 9 April 2021.
Article and book on the topic of U.S. foreign policy:
David Sylvan, “The US history of collecting – and dropping – client states is grim,” August 18, 2021. Responsible Statecraft.
Sylvan, D. and Majeski, S. (2009). U.S. Foreign Policy in Perspective: Clients, Enemies and Empire. London: Routledge.
Illustrations: The main article photo was made available on Unsplash, courtesy of the Wix.com photo gallery. The profile photo used on this page is available on Dr. Sylvan's profile page at the Graduate Institute, Geneva.