Textass is a tool that helps you work with text documents. The question here is what information we will have for each document:
- Expected info
- Parsing distance
1) Expected info
Text meta info
- Number of characters
- Number of words
- Number of paragraphs
- List of entities present in the text. For each entity: name, event, place, date, language,...
- How users use this document, and its social stats
- Info coming from internet mashups, like Wikipedia, the blogosphere,... enriching the document context
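The basic counts under text meta info could be computed roughly like this. It is only a sketch: the function name text_meta is hypothetical (not part of Textass), and it assumes words are whitespace-separated tokens and paragraphs are blocks separated by blank lines.

```python
import re

def text_meta(text):
    """Return basic counts for a text (hypothetical helper, not Textass code)."""
    # Assumption: paragraphs are blocks separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "characters": len(text),
        # Assumption: words are whitespace-separated tokens.
        "words": len(text.split()),
        "paragraphs": len(paragraphs),
    }
```

Entities, usage stats and mashup info would need other sources, so they are left out of this sketch.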
2) Parsing distance
We are going to try a simple technique to establish relationships between annotations based on distance: same sentence, same paragraph, etc.
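One possible way to sketch this distance idea is shown below. Everything here is an assumption for illustration: annotations are taken to be (name, character offset) pairs, sentences are split naively on ".", "!" or "?" followed by whitespace, paragraphs are split on blank lines, and the names spans, index_of and relate are hypothetical.

```python
import re
from itertools import combinations

def spans(text, pattern):
    """Return (start, end) spans of the pieces of text delimited by pattern."""
    result, start = [], 0
    for m in re.finditer(pattern, text):
        result.append((start, m.end()))
        start = m.end()
    if start < len(text):
        result.append((start, len(text)))
    return result

def index_of(offset, span_list):
    """Return the index of the span containing offset, or None."""
    for i, (s, e) in enumerate(span_list):
        if s <= offset < e:
            return i
    return None

def relate(text, annotations):
    """Label each pair of annotations with the closest unit they share."""
    paragraphs = spans(text, r"\n\s*\n")   # assumption: blank-line separated
    sentences = spans(text, r"[.!?]\s+")   # assumption: naive sentence split
    relations = []
    for (a, a_off), (b, b_off) in combinations(annotations, 2):
        if index_of(a_off, paragraphs) != index_of(b_off, paragraphs):
            level = "document"    # only the document is shared
        elif index_of(a_off, sentences) != index_of(b_off, sentences):
            level = "paragraph"   # same paragraph, different sentences
        else:
            level = "sentence"    # same sentence
        relations.append((a, b, level))
    return relations
```

Checking the paragraph first keeps the result sensible even when a paragraph ends without punctuation and the naive sentence split runs across the blank line.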