The Linguistica Project

Introduction

In the years since 2003, several groups of students have worked with me on developing versions of the Linguistica code. At the heart of Linguistica is a set of algorithms for determining the morphology of a natural language with no prior knowledge of that language. As our work has developed, we have added other functionality that was natural to include in the package.

Linguistica 4 is a project written in C++ with Qt 3, developed from 2004 to 2010. There's a nice photo below of the group in the spring of 2006, including Yu Hu, Colin Sprague, and Aris Xanthos. Between 2007 and 2010, a number of improvements were made by Jonathan Nieder, Sravana Reddy, and Sonjia Waxmonsky.

Linguistica 5 is a project written in Python 3 (with some parts in Python 2.7 as well), and it is quite different from Linguistica 4 in its approach to the problem of morphology discovery. Jackson Lee and Anton Osten contributed greatly to this project, and Jackson has created an excellent GitHub site devoted to it.

The Linguistica group at the University of Chicago draws its membership from the Department of Linguistics and the Department of Computer Science. Our core interest is unsupervised learning of natural language structure, but this interest has taken us to work in a number of other areas, including automatically obtaining corpora through the Internet, and the discovery of structure in bioinformatic databases.

This site contains a good number of details about the Linguistica project and its supporting projects at the University of Chicago. By using the navigation menu on the left, you can learn more about the Linguistica project, download the latest version of the program and source code, read related papers about the theory involved, meet the group members, and navigate to other Natural Language Processing resources on the Web.

Enjoy!

Check out the new executable downloads on the downloads page!
