Linguistics researcher develops new system to help computers 'learn' natural language
- Date: August 24, 2013
- Source: University of Texas at Austin
- Summary: A linguistics researcher is developing new methods for helping computers 'learn' natural language.
For more than 50 years, linguists and computer scientists have tried to get computers to understand human language by programming semantics as software. Now, a University of Texas at Austin linguistics researcher, Katrin Erk, is using supercomputers to develop a new method for helping computers learn natural language.
Instead of hard-coding human logic or deciphering dictionaries to try to teach computers language, Erk decided to try a different tactic: feed computers a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a map of relationships.
"An intuition for me was that you could visualize the different meanings of a word as points in space," says Erk, a professor of linguistics who is conducting her research at the Texas Advanced Computing Center. "You could think of them as sometimes far apart, like a battery charge and criminal charges, and sometimes close together, like criminal charges and accusations ("the newspaper published charges…"). The meaning of a word in a particular context is a point in this space. Then we don't have to say how many senses a word has. Instead we say: 'This use of the word is close to this usage in another sentence, but far away from the third use.' "
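The idea of treating each use of a word as a point in space can be illustrated with a small Python sketch. This is not Erk's model, only a toy example under stated assumptions: each use of "charge" is represented by counts of the words that surround it, and closeness is measured with cosine similarity. The three sentences, the function names (`context_vector`, `cosine`, `usage_vector`) and the window size of three words are all invented for illustration.

```python
from collections import Counter
import math


def context_vector(tokens, index, window=3):
    """Count the words within `window` positions of tokens[index]."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return Counter(tokens[lo:index] + tokens[index + 1:hi])


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def usage_vector(sentence, target, window=3):
    """Represent one use of `target` by the words around its first occurrence."""
    tokens = sentence.lower().split()
    return context_vector(tokens, tokens.index(target), window)


# Invented toy sentences, not taken from any real corpus.
battery = usage_vector("the phone battery lost its charge overnight", "charge")
court = usage_vector("the court dropped the criminal charge against him", "charge")
paper = usage_vector("the newspaper printed the criminal charge against him", "charge")

# The two legal uses share most of their context words, so they sit close together.
legal_similarity = cosine(court, paper)
# The battery use shares no context words with the court use, so it sits far away.
battery_similarity = cosine(court, battery)
```

With only three sentences the result is crude, which is one way to see why this kind of approach calls for the very large text collections described in the article: the more contexts a model sees, the more reliable the distances between uses become.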
Creating a model that can accurately reproduce the intuitive human ability to distinguish word meanings requires a lot of text and a lot of analytical horsepower.
"The lower end for this kind of a research is a text collection of 100 million words," she explains. "If you can give me a few billion words, I'd be much happier. But how can we process all of that information? That's where supercomputers come in."
Story Source:
Materials provided by University of Texas at Austin. Note: Content may be edited for style and length.