All Workshops will take place Thursday, October 17
Extending ELAN into Variationist Sociolinguistics
Naomi Nagy (University of Toronto) & Miriam Meyerhoff (University of Auckland)
ELAN (tla.mpi.nl/tools/tla-tools/elan, Wittenburg et al. 2006) has established itself as a valuable tool for language documentation and is frequently used for transcription and for multi-tier mark-up illustrating levels of linguistic structure as well as translations and glosses. In this workshop, we illustrate an extension to its utility: extracting and coding tokens of linguistic variables for quantitative analysis in the variationist sociolinguistic framework. This approach improves the consistency and reliability of our coding and its accountability to the original recording. We illustrate the following benefits:
- seamless connections between recording, transcript, and coding of the dependent variable (response) and independent variables (predictors). This facilitates revision and intercoder reliability tests.
- reuse of contextual factor coding (e.g., style, topic, interlocutor) as well as some structural (morphological, syntactic) tags
- wide-ranging exportability (to Excel, R, Rbrul, Goldvarb, Praat, SPSS, …)
- importability of transcripts from Word/text files and many other transcription formats (text, Transcriber, Shoebox/Toolbox, CHAT, Praat …)
- complex searches and concordance capabilities to speed up token extraction
- archivability of all mark-up related to each data file in a consistent and small-file-size format
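As a rough illustration of the token extraction and export listed above: ELAN stores its annotations in an XML-based .eaf file, so coded tokens on a tier can also be read with a few lines of script. The following is a minimal sketch, not ELAN's own export function. The tier name `subject_pronoun`, the values `overt` and `null`, and the tiny sample document are hypothetical, and only time-aligned annotations are handled.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature .eaf content, reduced to the elements used below.
# A real file saved by ELAN contains more metadata and usually more tiers.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1850"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="4600"/>
  </TIME_ORDER>
  <TIER TIER_ID="subject_pronoun" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>overt</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>null</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def tier_tokens(root, tier_id):
    """Return the time-aligned annotations of one tier as a list of dicts."""
    # Map each time-slot ID to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        # Only ALIGNABLE_ANNOTATION elements are read; annotations on
        # dependent tiers (REF_ANNOTATION) are ignored in this sketch.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append({
                "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "value": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    return rows


def tokens_to_csv(rows):
    """Serialize token rows as CSV text that a spreadsheet or R can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["start_ms", "end_ms", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


root = ET.fromstring(SAMPLE_EAF)
tokens = tier_tokens(root, "subject_pronoun")
csv_text = tokens_to_csv(tokens)
print(csv_text)
```

Because each row keeps its start and end times, every coded token can be traced back to the recording, which is the accountability the approach aims for.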
We will provide a how-to demonstration, illustrating ELAN’s functionality for variationist research with ongoing work on variation in subject pronoun presence in Faetar and N’kep, endangered languages spoken in southern Italy and Vanuatu (respectively). The hands-on portion of the workshop will allow participants to code a brief excerpt in English for a well-known sociolinguistic variable and/or work on their own data.
Participants should bring a laptop with ELAN installed (download from http://tla.mpi.nl/tools/tla-tools/elan – Mac, Windows and Linux versions available) and an (unzipped) copy of the workshop files, available at http://projects.chass.utoronto.ca/ngn/pdf/NWAV2013_ELAN.zip.
Ethnography for Sociolinguistics: Beyond “Hanging-Out”
Laura C. Brown (University of Pittsburgh)
Ethnography and participant observation are sometimes used to describe ways in which linguists learn simply by spending time with the people they study. While ethnographic fieldwork affords opportunities to encounter surprise, adapt to the unexpected, and allow study participants to guide research, it is also a methodology that researchers must design, plan, and structure to fit with research questions. Drawing on a variety of recent studies, this workshop examines methods for the collection and analysis of ethnographic data. We suggest ways of designing and critiquing ethnography as an active, focused, and structured mode of research practice.
Specific topics include: What sorts of practices and scales can ethnography examine? How can ethnographic methods be applied to linguistics research questions? What forms does participant observation take? How should ethnography be situated in time and space? How should we assess the significance and representativeness of ethnographic observations? What methods are available to record, analyze, and present ethnographic data? What issues must linguists consider when conducting ethnography in “difficult” field situations? Workshop discussions will aid participants in planning ethnographic research; specifying methods on grant applications; and analyzing previously collected ethnographic data.
Integrating traditional dialectology and sociolinguistics: generalized additive modeling
Harald Baayen (University of Alberta & University of Tübingen) & Martijn Wieling (University of Tübingen)
Traditional dialectology (i.e. dialect geography) and sociolinguistics can be seen as two streams of a unique and coherent discipline: modern dialectology (Chambers and Trudgill, 1998). In practice, however, dialectology and sociolinguistics remain relatively separate fields when considering the methods and techniques used for analyzing language variation and change. Sociolinguistics generally focuses on a small number of linguistic variables, while assessing the influence of various social factors, whereas traditional dialectology mainly investigated the geographical distribution of linguistic variation.
Dialectometry originated in the 1970s (Séguy, 1971) as a more objective, quantitative method to investigate the geographical distribution of linguistic variation by aggregating over a large number of linguistic variables. Given its quantitative nature, dialectometry would have been an excellent candidate to integrate traditional dialectology and sociolinguistics. Unfortunately, however, dialectometrists have generally disregarded the influence of social factors and almost exclusively focused on the geographical distribution of linguistic variation.
In this hands-on workshop we will illustrate a new method for dialectologists, generalized additive mixed modeling (Wieling, Nerbonne and Baayen, 2011; Wieling, 2012), which combines the merits of all three strands of research: traditional dialectology, sociolinguistics, and dialectometry. Specifically, the method integrates the geographical distribution of linguistic variation together with the influence of social factors, while simultaneously assessing these relationships in an objective way across a large set of linguistic variables. In contrast to the dialectometrical approach, however, the results also allow a focus on individual linguistic variables, as well as on individual speakers.
During the workshop, participants will experiment with this method themselves by applying it to various real-life datasets provided by the workshop organizers. Participants are therefore encouraged to bring a laptop with the most recent version of R installed, along with the packages ‘mgcv’, ‘languageR’, ‘rms’ and ‘lme4’. In preparation for the workshop, participants are advised to familiarize themselves with Baayen (in press), available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenRML2012.pdf.
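To give a feel for the core idea of an additive model, the sketch below fits a smooth function of one geographic coordinate together with a linear effect of one social predictor. It is a pure-Python stand-in built on a penalized truncated-linear spline basis, not the mgcv implementation used in the workshop. It has no random effects (so it is not a mixed model), the data are synthetic and noise-free, and the names `fit_additive`, `basis_row`, the knot placement and the penalty weight `lam` are all assumptions made for illustration.

```python
import math


def basis_row(x, z, knots):
    """One design-matrix row: intercept, x, truncated-linear spline terms, z."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots] + [z]


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - s) / m[i][i]
    return coef


def fit_additive(xs, zs, ys, knots, lam=1e-4):
    """Penalized least squares for y = f(x) + beta * z.

    Only the spline coefficients are penalized (a ridge penalty), which
    controls the wiggliness of f while leaving beta unpenalized.
    """
    rows = [basis_row(x, z, knots) for x, z in zip(xs, zs)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    for j in range(2, 2 + len(knots)):
        xtx[j][j] += lam
    return solve(xtx, xty)


# Synthetic data (an assumption for illustration): x stands in for a
# geographic coordinate, z for a binary social factor with true effect 0.5.
xs = [i * 0.05 for i in range(200)]
zs = [float(i % 2) for i in range(200)]
ys = [math.sin(x) + 0.5 * z for x, z in zip(xs, zs)]
knots = [0.5 * k for k in range(1, 20)]

coef = fit_additive(xs, zs, ys, knots)
social_effect = coef[-1]
fitted = [
    sum(c * b for c, b in zip(coef, basis_row(x, z, knots)))
    for x, z in zip(xs, zs)
]
max_error = max(abs(f - y) for f, y in zip(fitted, ys))
print(f"estimated social effect: {social_effect:.3f}")
print(f"largest absolute fitting error: {max_error:.3f}")
```

In R, mgcv selects the amount of smoothing automatically and supports two-dimensional smooths over longitude and latitude as well as random effects; here the penalty weight is fixed by hand to keep the example short.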
Baayen, R. H. (in press). Multivariate Statistics. In R. Podesva and D. Sharma (eds.), Research Methods in Linguistics. Cambridge University Press, Cambridge. http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenRML2012.pdf
Chambers, J. and Trudgill, P. (1998). Dialectology. Cambridge University Press, Cambridge, 2nd edition.
Séguy, J. (1971). La relation entre la distance spatiale et la distance lexicale. Revue de Linguistique Romane, 35(138):335–357.
Wieling, M., Nerbonne, J., and Baayen, R. H. (2011). Quantitative social dialectology: Explaining linguistic variation geographically and socially. PLOS ONE, 6(9):e23613.
Wieling, M. (2012). A Quantitative Approach to Social and Geographical Dialect Variation. PhD dissertation, University of Groningen. Digitally available at http://www.martijnwieling.nl
Sharing, Structuring and Processing Data
Chris Cieri (University of Pennsylvania), Malcah Yaeger-Dror (University of Arizona), Brian MacWhinney (Carnegie Mellon University), and Maxine Eskenazi (Carnegie Mellon University)
Research communities are enriched when the data that individual researchers collect is shared. The goal of the workshop is to discuss how the wealth of data that this community has collected can be shared. For data to be shared, it must be given some usable form. It also needs to be annotated. The workshop will address in broad terms and in detail the steps we can take to share our data.
The four parts of this workshop will include:
1) Data Sharing
- A discussion of the advantages and challenges of linguistic data sharing
- What specific challenges does sociolinguistics face, given the field’s early focus on speech community studies and the natural trend toward comparing speech communities or studying units other than speech communities?
- What practices promote sharing?
- Examples of some recent case studies in sociolinguistic data sharing: the publication process and the afterlife of language data
2) Variables and Coding
- What do we not yet code for appropriately?
- Dividing ethnicity and coding separately for Regional and Religious identity
- Do we want to code for SES, and if so, whose SES?
- Coding early vs. coding later on – are we losing information?
3) Examples of data sharing in TalkBank
- TalkBank has developed the largest publicly available corpus of spoken language data. Together with LDC, we are hoping to extend our database and methods to include additional sociolinguistic corpora.
- The use of TalkBank methods to transcribe and analyze sociolinguistic data
4) Crowdsourcing for speech processing
- What is crowdsourcing, and where can we access it?
- Quality control: can we actually get acceptable quality output?
- Gathering data using crowdsourcing
- Annotating data using crowdsourcing
Discourse Analysis for Variationists
Barbara Johnstone (Carnegie Mellon University)
The first hour of this workshop will provide an overview of the heuristic for discourse analysis that is described in Johnstone, Discourse Analysis, illustrated with examples of research by sociolinguists. (People who have read that book do not need this workshop.) In the process, we will touch on major schools of research in the study of discourse, including Critical Discourse Analysis, Interactional Sociolinguistics, Conversation Analysis, and Systemic Functional Linguistics. During the second hour of the workshop, participants will work with texts and transcripts to get a feel for what discourse analysis can be like. A bibliography of suggestions for further reading will be provided.
Best Practices in Sociophonetics
Marianna DiPaolo (University of Utah)
This workshop continues the discussion of best practices in sociophonetics begun at NWAV33. The ever-expanding range of knowledge necessary to do high-quality work in the interdisciplinary field of sociophonetics demands that we give all researchers quick access to the best methodological, technical, and procedural information. This workshop consists of two mini-workshops:
Quantifying and interpreting vowel formant trajectory information
Alicia Beckford Wassink (University of Washington)
Chris Koops (University of New Mexico)
The growing range and sophistication of sociophonetic analysis techniques (Thomas 2011, Di Paolo & Yaeger-Dror 2011) affords variationist sociolinguists previously unparalleled opportunities to represent and study fine phonetic detail. In this workshop, we review the theoretical motivations as well as the practical costs and benefits of exploiting one type of detailed phonetic representation: the vowel formant contour.
Studies of diphthongization and monophthongization processes have profited from treating vowels as temporal phenomena with multiple points of comparison. While traditional measures, such as glide length, slope and direction, continue to be used, new techniques are emerging that allow trajectories to be compared holistically, without prioritizing particular acoustic landmarks. Such techniques include dedicated statistical methods such as Smoothing Spline ANOVA (SSANOVA, Baker 2006, Nycz & De Decker 2006). SSANOVA, introduced in this workshop, may help to clarify sociolinguistic settings in which the “same” vowel exists in both monophthongal and diphthongal forms (e.g., Hall-Lew 2004, 2005; Oxley 2009; Koops 2010; Hinrichs et al. 2013), and how communities differ in the number and location of temporal targets (Thomas 2011).
We will also ask: How does normalization apply to trajectory information? How can we ensure that more phonetic detail illuminates, rather than obscures, phonological patterns?
The Phonetics of Code-switching
Barbara E. Bullock (University of Texas at Austin), Almeida Jacqueline Toribio (University of Texas at Austin) & Daniel Olson (Purdue University)
In zones of high contact, such as borders, urban centers, and post-colonial societies, language mixing in the form of code-switching and borrowing is ubiquitous. Since such mixed forms are so prevalent in bilingual speech, we need to find ways to harness this type of data and analyze it in an empirically robust and systematic manner. Here, we advocate for the benefits of the sociophonetic study of language mixing.
In this segment of the workshop, we survey the many factors that need to be considered when examining sociolinguistic variation in the speech of bilingual speakers. In particular, we focus on the need to operationalize the notion of language mode, which theorizes the ways in which bilingual speakers move along a continuum from more monolingual to more bilingual behaviors. We review different practices for examining the phonetic reflexes of bilingual mode and elaborate on the discourse-pragmatic and identity work that may be indexed phonetically when speakers code-switch.
We conclude with a research agenda for the study of the sociophonetics of code-switching that examines the three main types of switching (insertion, alternation, and tag switching) and how they relate to identity formation, stance, and authenticity in a bilingual context.