Computational Linguistics and Intelligent Text Processing: by Leonardo Lesmo, Alessandro Mazzei, Daniele P. Radicioni

By Leonardo Lesmo, Alessandro Mazzei, Daniele P. Radicioni (auth.), Alexander Gelbukh (ed.)

This two-volume set, consisting of LNCS 6608 and LNCS 6609, constitutes the thoroughly refereed proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing, held in Tokyo, Japan, in February 2011. The 74 full papers, presented together with 4 invited papers, were carefully reviewed and selected from 298 submissions. The contents are organized in the following topical sections: lexical resources; syntax and parsing; part-of-speech tagging and morphology; word sense disambiguation; semantics and discourse; opinion mining and sentiment detection; text generation; machine translation and multilingualism; information extraction and information retrieval; text categorization and classification; summarization and recognizing textual entailment; authoring aid, error correction, and style analysis; and speech recognition and generation.

Extra info for Computational Linguistics and Intelligent Text Processing: 12th International Conference, CICLing 2011, Tokyo, Japan, February 20-26, 2011. Proceedings, Part II

Example text

We will now give a formal description of our model to extract segment-pairs for clustering and template-induction.

1 Formal Description of the Model

If the source sentence has S words (as in Fig. […] $t_1^\tau$, our goal is to define a probability model $P$ between $s_1^S$ and $t_1^\tau$, and then find the best possible segment boundaries $\hat{B}$:

$\hat{B}(s_1^S, t_1^\tau) = \arg\max_b P(b \mid s_1^S, t_1^\tau)$.

[…] $tchk_n$, where $m$ and $n$ are random variables. $ca$ represents alignments between the source and target chunks; $wa$ represents alignments between the source and target words.

One of the first studies addressed bilingual co-occurrence pattern matching [3]. The context-vector was introduced in [12], relying on a bilingual lexicon. In the same spirit, other works rely on a thesaurus [13]. This approach is the basis of the work presented in this paper. It is also possible to induce a seed-word lexicon from monolingual sources, as described in [14]. Other studies examined the association measures between a term and its context in order to build the most accurate context-vector.
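
The context-vector idea mentioned above can be illustrated with a generic sketch. This is not the method of [12] or of the paper excerpted here: raw co-occurrence counts stand in for an association measure, cosine similarity is an assumed comparison function, and the function names, the toy sentences, and the seed lexicon are all hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, term, window=3):
    """Count the words that co-occur with `term` within +/- `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def translate_vector(vector, lexicon):
    """Map context words into the target language; words missing from the lexicon are dropped."""
    translated = Counter()
    for word, weight in vector.items():
        for target_word in lexicon.get(word, []):
            translated[target_word] += weight
    return translated


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy English/Italian data and seed lexicon (illustrative assumptions only).
src_tokens = "the doctor treats the patient in the hospital".split()
tgt_tokens = "il dottore cura il paziente in ospedale".split()
seed_lexicon = {"treats": ["cura"], "patient": ["paziente"], "hospital": ["ospedale"]}

# Build the source context vector for "doctor" and project it into the target language.
src_vec = translate_vector(context_vector(src_tokens, "doctor"), seed_lexicon)

# Rank target candidates by the similarity of their own context vectors.
candidates = ["dottore", "ospedale"]
scores = {c: cosine(src_vec, context_vector(tgt_tokens, c)) for c in candidates}
best = max(scores, key=scores.get)
```

On this toy input the candidate sharing the most translated context words with "doctor" is ranked first. A real system would replace the raw counts with an association measure and use comparable corpora rather than a single sentence pair.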

[…] 1 we present the fragment of the CCG for the lexical elements involved. Each element in the grammar has four categories: LEX, which contains the lexical form of the item; PoS, which contains the part-of-speech category; SynCAT, which contains the syntactic category; and SemCAT, which contains the semantic category. Note that SynCAT and SemCAT are related by using semantic variables ($x_i$ and $z_j$ in Tab. 1): these […]

4 Note that ontological patterns could be written in terms of FOL predicates and, since Hybrid Logic is equivalent to a fragment of FOL, we could rewrite these FOL predicates in terms of hybrid logic, identifying first-order variables with nominals of hybrid logic [12].
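
The four-field lexical entry described in this excerpt (LEX, PoS, SynCAT, SemCAT) can be pictured as a small record type. The sketch below is only an illustration under stated assumptions: the class name, the helper function, the sample entry, and the string notation used for the syntactic and semantic categories are hypothetical and are not taken from the paper's Tab. 1.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    lex: str      # LEX: lexical form of the item
    pos: str      # PoS: part-of-speech category
    syn_cat: str  # SynCAT: syntactic category
    sem_cat: str  # SemCAT: semantic category


def shared_variables(entry):
    """Return the semantic variables (x1, z1, ...) that occur in both SynCAT and SemCAT."""
    pattern = re.compile(r"\b[xz]\d+\b")
    return set(pattern.findall(entry.syn_cat)) & set(pattern.findall(entry.sem_cat))


# Hypothetical entry; the category notation is a placeholder, not the paper's.
entry = LexicalEntry(
    lex="sleeps",
    pos="VERB",
    syn_cat="s<x1> \\ np<z1>",
    sem_cat="@x1(sleep ^ <Actor>z1)",
)

linked = shared_variables(entry)
```

Here `linked` holds the variables that tie the syntactic category to the semantic one, which is the relationship the excerpt attributes to the variables $x_i$ and $z_j$.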

