Unitex/GramLab


Master Project Proposals - 2016

Unitex/GramLab is an open source, cross-platform, multilingual, lexicon- and grammar-based corpus processing suite. Unitex/GramLab releases are available here: http://releases.unitexgramlab.org


Unitex

Unitex is the C++ Natural Language Processing (NLP) engine of Unitex/GramLab. It is distributed under the terms of the GNU Lesser General Public License version 2.1 (LGPLv2) and contains only a few third-party code dependencies (LibYAML, Pstdin, TRE, WinGetOpt), which are licensed under more permissive licenses.

GramLab

GramLab is the project-oriented integrated development environment (IDE) of Unitex/GramLab. There is also a Classic IDE (Unitex.jar), which we are currently looking to integrate into GramLab (see project PRJ-02 below). Both IDEs are distributed under the terms of the GNU Lesser General Public License version 2.1 (LGPLv2) and contain only a few third-party components (XAlign, Xerces2-j) licensed under equally or more permissive licenses.

Linguistic resources

Linguistic resources released with Unitex/GramLab are distributed under the terms of the Lesser General Public License For Linguistic Resources (LGPLLR). For author credits and more information on these linguistic resources, see the respective linguistic resource package.

Documentation

The User’s Manual (in PDF format) is available in English and French (more translations are welcome). You can view and print it with Evince, downloadable here. The latest online version of the User’s Manual is accessible here.

Support

Support questions can be posted in the community support forum. Please feel free to submit any suggestions or requests for new features too. Some general advice about asking technical support questions can be found here.

Reporting Bugs

See the Bug Reporting Guide for information on how to report bugs.

Governance Model

Unitex/GramLab project decision-making is based on a community meritocratic process. Anyone with an interest in it can join the community, contribute to the project design and participate in decisions. The Unitex/GramLab Governance Model describes how this participation takes place and how to set about earning merit within the project community.

Spelling

Unitex/GramLab is spelled with capital “U”, “G” and “L”, and with everything else in lower case. Except for the forward slash, do not put a space or any other character between the words. Where the forward slash is not allowed, you can simply write “UnitexGramLab”.

It’s common to refer to the Unitex/GramLab Core as “Unitex”, and to the project-oriented IDE as “GramLab”. When mentioning the distribution suite (Core, IDE, Linguistic Resources and other bundled tools), always use “Unitex/GramLab”.

Useful links

Main website: http://unitexgramlab.org
Binary releases: http://releases.unitexgramlab.org
User’s manual: http://releases.unitexgramlab.org/latest-rc/man
Users forum: http://forum.unitexgramlab.org
Developers list: unitex-devel@univ-mlv.fr
Code hosting: https://gforgeigm.univ-mlv.fr/projects/unitex and http://code.unitexgramlab.org (we are now migrating to GitHub)
Your contribution: Contribution rules
Governance: http://governance.unitexgramlab.org

How to start?

Thank you for your interest in contributing to Unitex/GramLab development! You could start by downloading a binary release here and getting familiar with Unitex/GramLab. The User’s Manual is available here.

The Unitex/GramLab source code is hosted on https://gforgeigm.univ-mlv.fr/projects/unitex. An overview of the C++ Core code (v3.0) is available here. For an overview of the Java IDE (v3.0), you could check this presentation. There are also some contribution rules here.

To start hacking on the code, check out the sources with Subversion:

C++ Core:

$ svn checkout --username anonsvn --password anonsvn https://svnigm.univ-mlv.fr/svn/unitex/Unitex-C++

To compile under Linux, use for example:

$ cd Unitex-C++/build
$ make SYSTEM=linux-like 64BITS=yes DEBUG=yes UNITEXTOOLLOGGERONLY=yes

Java Classic IDE:

$ svn checkout --username anonsvn --password anonsvn https://svnigm.univ-mlv.fr/svn/unitex/Unitex-Java

To compile and test under Linux, use for example:

$ pushd /home/YOUR_USER/Downloads
$ wget http://unitex.univ-mlv.fr/releases/3.1rc/source/Unitex-GramLab-3.1rc-source-distribution.zip
$ unzip Unitex-GramLab-3.1rc-source-distribution.zip
$ popd
$ export UNITEX_BUILD_RELEASE_DIR=/home/YOUR_USER/Downloads/Unitex-GramLab-3.1rc
$ cd Unitex-Java
$ ant
$ cp dist/* "$UNITEX_BUILD_RELEASE_DIR/App"

Java GramLab IDE (depends upon Unitex.jar):

$ svn checkout --username anonsvn --password anonsvn https://svnigm.univ-mlv.fr/svn/unitex/GramLab
$ cd GramLab
$ ant

Note: Alternatively, on Linux or OS X, you can download this script to check out and build the IDEs.

The goal of this evaluation module is to compare two sets of annotations: one serves as the reference, and the other is the output of the system to be evaluated.

  • Export the two sets of annotations to a custom standoff format such as YAML or CSV (C++)
  • Efficiently align and compare the annotations (C++)
  • Count the number of matches in each category: correct, missing, false positive and partially correct (C++)
  • Calculate metrics over the matches: micro- and macro-averaged precision, recall and F-measure (C++)
  • Integrate the module into the GramLab IDE (Java)

Note: Some years ago, a former student developed a Perl script named SBDiffTool, a Sentence Boundary Visual Diff Tool for Unitex. A short time later, another student developed several Perl scripts (CiteExtract, CiteDiff, CiteEval) to compare two annotation sets, one produced by Unitex and the other manually labeled by a human. These scripts, which were developed for a very specific class of annotation, could serve as a starting point for building a more flexible and integrated tool.

Mentor: Eric Laporte

This project aims to convert some parts of the Unitex/GramLab IDE into plugins.

  • Users will thus be able to activate or deactivate features according to their needs and projects.
  • It will be easier to implement and propose variants of existing plugins.

Co-Mentors: Eric Laporte, Cristian Martinez

A package manager is a tool that makes installing, upgrading, uninstalling, configuring and managing packages easy. Popular application-level package managers are:

We wish to provide a Unitex Package Manager (UPM) for language resources, i.e. a tool to install, upgrade and uninstall dictionaries, grammars or groups of language-related resources.

Co-Mentors: Eric Laporte, Cristian Martinez




Last modification on this page: November 09, 2016
