Tom Juzek,
Assistant Professor,
Department of Modern Languages and Linguistics,
Florida State University

"The Syntactic Acceptability Dataset as a resource for machine learning and linguistic analysis"

October 26, 2022, Schedule:

Nespresso & Teatime (417 DSL - Commons)
03:00 to 03:30 PM Eastern Time (US and Canada)

Colloquium - F2F (499 DSL) / Virtual (Zoom)
03:30 to 04:30 PM Eastern Time (US and Canada)

Meeting # 942 7359 5552


Linguistic datasets are popular in machine learning, particularly in the emerging field of few-shot learning (learning from limited data): linguistic data is often complex and difficult to generalize from, and thus poses a welcome challenge (Wang et al. 2020). In this talk, I will outline ongoing research on building a new dataset that is valuable to both the machine learning community and the linguistics community. The new dataset will be based on COLA (Corpus of Linguistic Acceptability; Warstadt et al. 2018), a popular dataset in machine learning. I will briefly introduce COLA, the challenges it poses, and the relevant linguistic distinctions (acceptability vs. grammaticality). Further, I will motivate the need for new data of a different kind, outline its structure, and discuss its expected relevance to machine learning and linguistics.
