Multilingual and Multimodal Environmental Information Production
The overall objective is to research and develop technologies for the automatic generation of user-tailored material (reports, suggestions, recommendations, etc.) in the user's preferred language.
The task of environmental information production (or generation) is central to achieving PESCaDO’s objectives, since the content that PESCaDO aims to communicate to users usually stems from different web sources. Each source may have its own format, level of abstraction, and even language. This information must be unified and presented to the user as a coherent and cohesive whole. Furthermore, content in PESCaDO is represented in ontologies and their associated knowledge base in the formal RDF/OWL notation; decision support operates on this representation to assess the relevance of the content units for the user's specific inquiry or problem and to select the relevant chunks. Consequently, the content judged relevant must be turned into information for the user by means of information production techniques.
For the first prototype of PESCaDO, three components of the information production module have been developed: the content selection component, the discourse structuring component and the information generation component.
- The content selection component selects from the knowledge base the content to be communicated to the user, in accordance with the problem description and the outcome of reasoning. The selection strategy is still largely driven by interaction with human experts (environmental specialists and municipal counselling services): for each generic problem identified for the user profiles maintained in PESCaDO, the experts determine what kind of content is appropriate and under which conditions.
- The discourse structuring component organizes the selected content as a coherent “story” (rather than a sequence of unrelated messages or pictures) by defining discourse relations between chunks of content and establishing an order among them. In the first prototype, discourse structuring is schema-driven: it draws upon a number of predefined discourse patterns that have proved suitable for presenting environmental information.
- The information generation component transforms the selected and organized content into visual and textual information; in the case of textual information, a multilevel multilingual rule-based generator is used.
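The selection and structuring steps described above can be sketched as a toy pipeline. This is a minimal Python sketch under stated assumptions, not PESCaDO's implementation: the `Fact` triple is a flat stand-in for RDF/OWL statements, and the problem name `pollen_exposure`, the expert rules in `SELECTION_RULES`, the single discourse schema `SCHEMA`, its `cause` relation and the sample facts are all hypothetical illustrations of expert-driven selection and schema-driven ordering.

```python
from dataclasses import dataclass


# Hypothetical content unit: a flat stand-in for an RDF/OWL statement
# (subject, property, value) in the knowledge base.
@dataclass(frozen=True)
class Fact:
    subject: str
    prop: str
    value: str


# Hypothetical expert rules: for each generic problem, the properties that
# are relevant and the condition under which each should be reported.
SELECTION_RULES = {
    "pollen_exposure": [
        ("pollen_level", lambda value: True),  # always relevant
        ("wind_speed", lambda value: value in {"moderate", "strong"}),  # only if wind spreads pollen
    ],
}

# Hypothetical discourse schema: slots in presentation order, each with the
# relation linking it to the preceding slot ("cause": the preceding fact explains this one).
SCHEMA = [("wind_speed", None), ("pollen_level", "cause")]


def select_content(kb, problem):
    """Keep the facts that match an expert rule for the problem and satisfy its condition."""
    rules = SELECTION_RULES.get(problem, [])
    return [fact for fact in kb
            if any(fact.prop == prop and cond(fact.value) for prop, cond in rules)]


def structure(selected):
    """Order the selected facts by the schema and attach discourse relations."""
    by_prop = {fact.prop: fact for fact in selected}
    plan = []
    for prop, relation in SCHEMA:
        if prop in by_prop:
            # A relation to the preceding slot only makes sense if something precedes it.
            plan.append((by_prop[prop], relation if plan else None))
    return plan


# Hypothetical sample knowledge base.
kb = [
    Fact("helsinki", "wind_speed", "moderate"),
    Fact("helsinki", "pollen_level", "high"),
    Fact("helsinki", "air_temperature", "12C"),
]
plan = structure(select_content(kb, "pollen_exposure"))
for fact, relation in plan:
    print(fact.prop, fact.value, relation)
```

With this sample knowledge base, the air-temperature fact is filtered out and the plan places the wind fact before the pollen fact, linking them with `cause`; if the wind were calm, only the pollen fact would be selected.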
…
English: Because of the moderate wind, there will be a lot of alder pollen in the air.
Finnish: Kohtalaisen tuulen johdosta ilmassa on runsaasti lepän siitepölyä.
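The English and Finnish sentences above express the same cause-effect message. As a rough illustration of how one content plan can be realized in several languages, the sketch below uses flat per-language templates, a much simpler technique than the multilevel rule-based generator used in the project. The lexicon keys, slot names and templates are hypothetical; the Finnish entries are stored already inflected (genitive) because a template cannot inflect them itself.

```python
# Hypothetical bilingual lexicon keyed by "slot:concept". The Finnish entries are
# already in the genitive case because the Finnish template needs that form.
LEXICON = {
    "en": {"wind:moderate": "moderate", "pollen:alder": "alder", "amount:high": "a lot of"},
    "fi": {"wind:moderate": "kohtalaisen", "pollen:alder": "lepän", "amount:high": "runsaasti"},
}

# Hypothetical sentence templates for a cause-effect message (wind -> pollen).
TEMPLATES = {
    "en": "Because of the {wind} wind, there will be {amount} {pollen} pollen in the air.",
    "fi": "{wind} tuulen johdosta ilmassa on {amount} {pollen} siitepölyä.",
}


def realize(message, lang):
    """Fill the language's template with lexicalized slot values and capitalize the first letter."""
    lex = LEXICON[lang]
    words = {slot: lex[f"{slot}:{concept}"] for slot, concept in message.items()}
    sentence = TEMPLATES[lang].format(**words)
    return sentence[0].upper() + sentence[1:]


message = {"wind": "moderate", "pollen": "alder", "amount": "high"}
for lang in ("en", "fi"):
    print(realize(message, lang))
```

Running it prints the two sentences shown above. Storing inflected forms per template quickly becomes unwieldy for a morphologically rich language such as Finnish, which is one motivation for generators that handle syntax and morphology at separate levels.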
In the next phase of the project, a new content selection strategy will be incorporated. It will be based on reinforcement learning and will thus be able to adapt the selection of content more closely to the needs of the user. Instead of the rule-based linguistic generator, a statistical generator will be incorporated, at least for English; the annotation of the corpora it requires has already largely been completed. A further focus will be the integration of visual and textual information generation, so that the information illustrated in pictures or tables can be explained.
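The reinforcement-learning formulation planned for content selection is not specified further here. As a hedged illustration of the general idea, learning from user feedback which content to include, the sketch below uses the simplest reinforcement-learning setting, an epsilon-greedy multi-armed bandit over content types. The class name, the content types, the `epsilon` value and the simulated preferences are all assumptions; a realistic formulation would also condition on state such as the user profile and the problem, which a plain bandit ignores.

```python
import random


class ContentSelectionBandit:
    """Epsilon-greedy multi-armed bandit over candidate content types (hypothetical).

    Each arm is a content type; the reward is (hypothetical) user feedback in [0, 1].
    """

    def __init__(self, content_types, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in content_types}
        self.values = {c: 0.0 for c in content_types}  # running mean reward per arm

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, content_type, reward):
        # Incremental mean: Q <- Q + (r - Q) / n
        self.counts[content_type] += 1
        n = self.counts[content_type]
        self.values[content_type] += (reward - self.values[content_type]) / n


# Simulated user whose (hypothetical) satisfaction differs per content type.
preference = {"pollen_level": 0.9, "wind_speed": 0.5, "air_temperature": 0.1}
bandit = ContentSelectionBandit(list(preference), epsilon=0.1, seed=42)
for _ in range(1000):
    chosen = bandit.choose()
    bandit.update(chosen, preference[chosen])  # deterministic reward for clarity
print(max(bandit.values, key=bandit.values.get))
```

With deterministic rewards, the learned estimate equals the simulated preference for every arm that has been tried, so the bandit settles on `pollen_level` while still occasionally exploring the other content types. Real user feedback would be noisy, which is what the running mean is meant to absorb.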
CONTACT:
Leo Wanner <leo.wanner@upf.edu>
Nadjet Bouayad-Agha <nadjet.bouayad@upf.edu>