Language Analysis Essentials, or LANES, is a suite of open-source Java modules for natural language processing and text mining. The toolkit includes modules for:

  • measuring the content-bearingness of terms.
  • measuring the surface similarity of strings.
  • parsing sentences and extracting key terms.
  • clustering terms (with measurement of the semantic relatedness of terms).
  • determining the stability of word sequences.
  • detecting classes of entities. The detection rate of this technique is about 95%, which outperforms state-of-the-art open-source and proprietary techniques such as Alchemy (73%), Illinois (13%), Stanford (13%) and Lexalytics (26%).
  • mapping terms to Wikipedia concepts*.
  • extracting textual content from HTML pages*.
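
To make "surface similarity of strings" concrete, here is a minimal sketch of one common measure: Levenshtein edit distance normalised by the length of the longer string. This is an illustration only. The page does not say which measure the toolkit uses, and the class and method names below (SurfaceSimilarity, editDistance, similarity) are hypothetical and are not part of the toolkit's API.

```java
public class SurfaceSimilarity {

    // Levenshtein edit distance, computed with two rolling rows.
    // Assumption: this is a generic textbook measure, not the toolkit's own.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 for identical strings, 0.0 when every
    // position differs. Two empty strings are treated as identical.
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kitten", "sitting")); // prints 3
        System.out.println(similarity("colour", "color"));
    }
}
```

A character-level measure like this only captures spelling overlap; it says nothing about meaning, which is why the toolkit lists semantic relatedness as a separate capability.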

 

The LAnEs source code is released under the GNU GPL v3 license. Refer to the Javadoc for more information. Some of the incorporated code and data fall under different licenses, all of which are GNU GPL-compatible, as listed below:

 

*Not currently available

Last edited Apr 22, 2014 at 4:48 AM by wyswilson, version 11