gate workshop mar2004

Published on November 16, 2007

Author: Willi


Slide1: GATE technical workshop: introduction
Hamish Cunningham
Sheffield, March 17/18, 2004

Slide2: Agenda
Wednesday (G22)
10.15: arrival, setup
10.30: introductions, summary of background / skills
10.40: mission, conventions, internal pages, GATE intro (hc)
11.30: tools: CVS, JBuilder, tkdiff, building GATE (vt)
12.00: break
12.15: intro to the GUI (dm)
1.30: lunch
2.30: ANNIE, JAPE (dm)
4.00: break
4.15: summary of projects (hc)
5.30: close
Thursday (G30)
10.30: API, CREOLE lifecycle, Java for JAPE [1] (vt)
12.00: break
12.15: tests, writing, running; API etc. [2] (hc, vt)
1.30: lunch
2.30: corpora, evaluation tools (dm, kb)
3.00: machine learning (vt)
4.00: break
4.15: ontologies (kb)
5.15: wrap-up
5.30: close

Slide3:
mission
conventions
mailing lists
roles and responsibilities
Blah

GATE (the Volkswagen Beetle of Language Processing) is:
Eight years old (!), with thousands of users at hundreds of sites
An architecture: a macro-level organisational picture for LE software systems.
A framework: for programmers, GATE is an object-oriented class library that implements the architecture.
A development environment: for language engineers, computational linguists et al., a graphical development environment.
Some free components... and wrappers for other people's components
Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.
Free software (LGPL). Download at

A bit of a nuisance (our users)
GATE team projects. Past:
Conceptual indexing: MUMIS: automatic semantic indices for sports video
MUSE: cross-genre entity finder
HSL: health-and-safety IE
Old Bailey: collaboration with HRI on 17th-century court reports
Multiflora: plant taxonomy text analysis for biodiversity research e-science
EMILLE: S. Asian language corpus
ACE / TIDES: Arabic, Chinese NE
JHU summer workshop on semantic tagging
Present:
Advanced Knowledge Technologies: €12m UK five-site collaborative project
ETCSL: Sumerian digital library
MiAKT: medical informatics / AKT
SEKT: Semantic Knowledge Tech
PrestoSpace: AV Preservation
KnowledgeWeb; h-TechSight
Thousands of users at hundreds of sites. A representative sample:
the American National Corpus project
the Perseus Digital Library project, Tufts University, US
Longman Pearson publishing, UK
Merck KGaA, Germany
Canon Europe, UK
Knight Ridder, US
BBN (leading HLT research lab), US
SMEs: Melandra, SG-MediaStyle, ...
Imperial College, London, the University of Manchester, UMIST, the University of Karlsruhe, Vassar College, the University of Southern California and a large number of other UK, US and EU universities
UK and EU projects inc. MyGrid, CLEF, dotkom, AMITIES, CubReporter, Poesia...

Slide6: Architectural principles
Non-prescriptive, theory neutral (strength and weakness)
Re-use, interoperation, not reimplementation (e.g. diverse XML support, integration of Protégé, Jena, Weka...)
(Almost) everything is a component, and component sets are user-extendable
(Almost) all operations are available both from API and GUI

All the world's a Java Bean....
CREOLE: a Collection of REusable Objects for Language Engineering
GATE components: modified Java Beans with XML configuration
The minimal component = 10 lines of Java, 10 lines of XML, 1 URL
Why bother? Allows the system to load arbitrary language processing components

Slide8: NOTES
everything is a replaceable bean
all communication via fixed APIs
low coupling, high modularity, high extensibility
NOTES (2)
e.g.: Protégé LR & VR both wrapped in the Resource (bean) API
ontology repositories and inference should be handled the same way: KAON + Sesame + Orenge + ?
[Figure: diagram of the GATE APIs and the Language Resource Layer (LRs): Ontology, Protégé Ontology, WordNet, Gazetteers, ...]

Slide9: Happy Birthday Valy!
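
The "All the world's a Java Bean" slide says a CREOLE component is a modified Java Bean with XML configuration, and that this is what lets the system load arbitrary language processing components. Below is a minimal sketch of that general idea using only the JDK: an XML fragment names a class, the loader instantiates it by reflection and calls one setter per parameter. It is not GATE's actual API or its real configuration format; the `LanguageProcessor` interface, the `RESOURCE`/`CLASS`/`PARAM` element names, `ComponentLoader` and `UpperCaser` are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical fixed API that every loadable component implements.
interface LanguageProcessor {
    String process(String text);
}

// A hypothetical minimal component: a bean with a no-argument constructor and a setter.
class UpperCaser implements LanguageProcessor {
    private String prefix = "";

    public void setPrefix(String prefix) { this.prefix = prefix; }

    @Override
    public String process(String text) { return prefix + text.toUpperCase(); }
}

// Hypothetical loader: knows only the interface and the bean conventions.
class ComponentLoader {
    // Parses <RESOURCE><CLASS>...</CLASS><PARAM name="...">...</PARAM></RESOURCE>,
    // instantiates the named class reflectively and calls setX(String) for each PARAM.
    static LanguageProcessor load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element resource = doc.getDocumentElement();
            String className =
                resource.getElementsByTagName("CLASS").item(0).getTextContent().trim();
            Object bean = Class.forName(className).getDeclaredConstructor().newInstance();
            NodeList params = resource.getElementsByTagName("PARAM");
            for (int i = 0; i < params.getLength(); i++) {
                Element param = (Element) params.item(i);
                String name = param.getAttribute("name");
                String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                Method m = bean.getClass().getMethod(setter, String.class);
                m.invoke(bean, param.getTextContent());
            }
            return (LanguageProcessor) bean;
        } catch (Exception e) {
            // Wrap parser, reflection and cast failures in one unchecked exception.
            throw new IllegalStateException("could not load component", e);
        }
    }
}
```

For example, loading `<RESOURCE><CLASS>UpperCaser</CLASS><PARAM name="prefix">NE:</PARAM></RESOURCE>` and calling `process("gate")` on the result gives `NE:GATE`. Because the loader depends only on the interface and on bean conventions (a no-argument constructor, `setX` methods), a new component needs a class plus a config fragment, with no change to the loader.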
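
The Slide8 notes argue that every component is a replaceable bean reached only through fixed APIs, so that ontology back ends such as KAON, Sesame or Orenge could sit behind the same interface as the Protégé wrapper. Below is a minimal sketch of that low-coupling idea, assuming a toy in-memory `OntologyResource` interface that supports only subclass assertions and transitive subclass queries. The interface, both implementations and `OntologyClient` are hypothetical stand-ins, not GATE's ontology API or the API of any of those products.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical fixed ontology API; real back ends would sit behind adapters.
interface OntologyResource {
    void addSubclass(String sub, String sup);
    boolean isSubclassOf(String sub, String sup); // transitive
}

// Back end 1: stores direct superclass links and searches the graph at query time.
class SearchingOntology implements OntologyResource {
    private final Map<String, Set<String>> parents = new HashMap<>();

    @Override
    public void addSubclass(String sub, String sup) {
        parents.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    @Override
    public boolean isSubclassOf(String sub, String sup) {
        Deque<String> todo = new ArrayDeque<>(parents.getOrDefault(sub, Set.of()));
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (c.equals(sup)) return true;
            if (seen.add(c)) todo.addAll(parents.getOrDefault(c, Set.of()));
        }
        return false;
    }
}

// Back end 2: keeps every class's full ancestor set up to date at insert time.
class MaterialisedOntology implements OntologyResource {
    private final Map<String, Set<String>> ancestors = new HashMap<>();

    @Override
    public void addSubclass(String sub, String sup) {
        Set<String> added = new HashSet<>(ancestors.getOrDefault(sup, Set.of()));
        added.add(sup);
        ancestors.computeIfAbsent(sub, k -> new HashSet<>()).addAll(added);
        // Every class already below sub also gains sup and sup's ancestors.
        for (Set<String> a : ancestors.values()) {
            if (a.contains(sub)) a.addAll(added);
        }
    }

    @Override
    public boolean isSubclassOf(String sub, String sup) {
        return ancestors.getOrDefault(sub, Set.of()).contains(sup);
    }
}

// Client code sees only the interface, so either back end can be plugged in.
class OntologyClient {
    static boolean isPerson(OntologyResource onto, String cls) {
        return onto.isSubclassOf(cls, "Person");
    }
}
```

Since `OntologyClient.isPerson` depends only on the interface, swapping `SearchingOntology` (graph search per query) for `MaterialisedOntology` (ancestors precomputed on insert) changes the cost profile but not the answers.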

