
Connecting political data to media data


Published on February 19, 2014

Author: LauraHollink

Source: slideshare.net

Description

Presentation given at ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’ on February 18, 2014

Connecting political data to media data
Laura Hollink, VU University Amsterdam, Web & Media group
ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’, February 18, 2014

Laura Hollink, Damir Juric, Geert-Jan Houben, Martijn Kleppe, Max Kemman, Henri Beunders, Johan Oomen, Jaap Blom
Funded by Clarin-NL

Questions we want to answer
• Which events have attracted a lot of media attention?
• What are the differences between media, e.g. between different newspapers, or between newspapers and radio bulletins?
• Has the coverage changed over time?
• How are the events visualized (photos, layout of the newspaper, etc.)?


• Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches.
• Roughly 1.8 million news bulletins between 1937 and 1984 (we only use 1945-1995).
• Archives of hundreds of newspapers, comprising tens of millions of articles published between 1618 and 1995 (we only use 1945-1995).

PoliMedia methods

Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF (XML by War in Parliament Project)

[Figure: RDF graph of an example debate, nl.proc.sgd.d.194519460000002 (dated 1945-11-20), showing its parts, speeches, speaker, party, and links to the original sources.]

Modeling the debates as events
• An event has a date, a location, actors, and possibly sub-events.
• We build on the Simple Event Model (SEM).
• Links to the original sources.
• Reusing existing vocabularies.

[Figure: the debate nl.proc.sgd.d.194519460000002, of rdf:type Debate, with dc:title "Handelingen Verenigde Vergadering...", dc:date 1945-11-20 and dc:language Dutch. Its dc:publisher, dc:id and dc:source properties carry the values http://statengeneraaldigitaal.nl/, nl.proc.sgd.d.19720000002, http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002 and http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf.]

• The part-of structure and chronological order of the debates.

[Figure: the debate nl.proc.sgd.d.194519460000002 (dc:title "Handelingen Verenigde Vergadering...") and its parts nl.proc.sgd.d.194519460000002.1, .1.1, .1.2, .1.3 and .2. Node types shown: PartOfDebate, DebateContext, Speech. Properties shown: hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasText ("De voorzitter opent de vergadering…"), hasSpokenText ("Mijnheer de Voorzitter, de Commissie van …").]
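The part-of structure and the chronological order can be pictured as a small set of triples. The sketch below is only an illustration: the identifiers and property names are taken from the slide, but which node links to which is an assumption, and the helper functions `objects` and `speeches_in_order` are hypothetical names, not part of PoliMedia.

```python
# Illustrative triples only. Identifiers and property names appear on the
# slide; the exact wiring between the nodes is an assumption for this sketch.
DEBATE = "nl.proc.sgd.d.194519460000002"

TRIPLES = [
    (DEBATE, "hasPart", DEBATE + ".1"),
    (DEBATE, "hasPart", DEBATE + ".2"),
    (DEBATE + ".1", "hasSubsequentPartOfDebate", DEBATE + ".2"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.1"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.2"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.3"),
    (DEBATE + ".1.2", "hasSubsequentSpeech", DEBATE + ".1.3"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects of (subject, predicate, ?) in insertion order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def speeches_in_order(first_speech, triples=TRIPLES):
    """Follow hasSubsequentSpeech links to list speeches chronologically."""
    order = [first_speech]
    seen = {first_speech}
    while True:
        following = objects(order[-1], "hasSubsequentSpeech", triples)
        # Stop at the end of the chain, or if a cycle would repeat a speech.
        if not following or following[0] in seen:
            return order
        order.append(following[0])
        seen.add(following[0])


parts_of_first_part = objects(DEBATE + ".1", "hasPart")
chronology = speeches_in_order(DEBATE + ".1.2")
```

Keeping part-of links and "subsequent" links as separate properties means the hierarchy and the order can be queried independently of each other.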

• The different roles and parties that a speaker can have in his/her career.

[Figure: the Speech nl.proc.sgd.d.194519460000002.1.2 (hasSpokenText "Mijnheer de Voorzitter, de Commissie van …") with, via sem:hasActor and hasSpeaker, its speaker Speaker_00064 (rdfs:label Barge). The speaker is a Politician (foaf:firstName Joannes Antonius James, foaf:lastName Barge) with hasRole member_of_parliament and hasParty Party_kvp, a Party with hasFullName Katholieke Volkspartij and hasAcronym KVP. The speech is coveredIn http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr.]

Step 2: Linking speeches in the debate to the newspaper articles that cover them

We created a linking method to deal with our two challenges:
1. How to link documents that are so different in nature?
2. Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?

[Pipeline diagram: from the debates, the name of the speaker and the date of the debate are used to search the newspaper archive, which returns candidate articles. Topics and named entities detected in the speeches are turned into queries, which are used to rank the candidate articles. The output is a set of links between speeches and articles.]

Intuition 1: The name of the speaker should appear in the article, and the article should be published within a week of the debate.

Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
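The two intuitions can be sketched as a filter followed by a ranking step. This is a minimal illustration, not the PoliMedia implementation. Several things are assumed: the function names, the article dictionaries, the made-up toy texts, and the choice to count only articles published on or up to seven days after the debate date. The overlap score is a plain count of shared single-word terms, standing in for whatever measure the project actually used.

```python
import re
from datetime import date, timedelta


def words(text):
    """Lower-cased word set of a text (simple tokenization, an assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def candidate_articles(articles, speaker, debate_date, window_days=7):
    """Intuition 1: the article names the speaker and appears within a week."""
    latest = debate_date + timedelta(days=window_days)
    return [
        a for a in articles
        if speaker.lower() in words(a["text"])
        and debate_date <= a["date"] <= latest
    ]


def rank_by_overlap(candidates, speech_terms):
    """Intuition 2: more shared topic/entity terms means a stronger relation."""
    terms = {t.lower() for t in speech_terms}
    return sorted(
        candidates,
        key=lambda a: len(terms & words(a["text"])),
        reverse=True,
    )


# Made-up toy articles, for illustration only.
articles = [
    {"id": "a2", "date": date(1945, 11, 22), "text": "Barge attended the session."},
    {"id": "a1", "date": date(1945, 11, 21), "text": "Barge spoke about the budget and Indonesia."},
    {"id": "a3", "date": date(1945, 12, 15), "text": "Barge on the budget."},
    {"id": "a4", "date": date(1945, 11, 21), "text": "The budget was discussed."},
]

candidates = candidate_articles(articles, "Barge", date(1945, 11, 20))
ranked_ids = [a["id"] for a in rank_by_overlap(candidates, ["budget", "Indonesia"])]
```

Here `a3` is dropped because it falls outside the one-week window and `a4` because it does not name the speaker; `a1` then outranks `a2` because it shares two terms with the speech.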

Evaluation: what do we use to rank the candidate articles?
• Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, K = 0.5
• Compare text of candidate articles to:
  • Setting 1: Named Entities in speech
  • Setting 2: Named Entities + Topics in speech
  • Setting 3: Named Entities + Topics in speech and larger part-of-debate

Score                                 Setting 1   Setting 2   Setting 3
I don't know                          0.14        0.15        0.08
0 - unrelated                         0.38        0.23        0.12
1 - related                           0.29        0.36        0.36
2 - explicit mention of the debate    0.19        0.26        0.44
1+2                                   0.48        0.62        0.80
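The "1+2" row is the share of pairs rated either related (1) or explicit mention (2), so per setting it is the sum of those two rows. A short check of that arithmetic, using only the figures from the slide (the variable names are illustrative):

```python
# Fractions per setting, copied from the evaluation slide.
scores = {
    "Setting 1": {"unknown": 0.14, "unrelated": 0.38, "related": 0.29, "explicit": 0.19},
    "Setting 2": {"unknown": 0.15, "unrelated": 0.23, "related": 0.36, "explicit": 0.26},
    "Setting 3": {"unknown": 0.08, "unrelated": 0.12, "related": 0.36, "explicit": 0.44},
}

# Each column should account for all rated pairs.
column_totals = {name: round(sum(col.values()), 2) for name, col in scores.items()}

# Combined share of score 1 and score 2, rounded to two decimals.
related_or_explicit = {
    name: round(col["related"] + col["explicit"], 2) for name, col in scores.items()
}

for name, share in related_or_explicit.items():
    print(f"{name}: {share:.2f}")
```

Each column sums to 1.00, and the combined share rises from Setting 1 to Setting 3, so adding topics and the larger part-of-debate context increases the proportion of relevant links.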

Results
• An open data set of Dutch parliamentary debates,
• with almost 3 million links between 450,000 speeches and the URLs of 1.5 million newspaper articles and radio bulletins at the National Library,
• accessible through a Web demonstrator and through a SPARQL endpoint.

Demo

SPARQL endpoint
• A service to query a knowledge base using the SPARQL query language.

“All speeches with more than 60 associated news items.”

    SELECT ?speech ?no_news_items
    WHERE {
      {
        # Count the news items linked to each speech.
        SELECT ?speech (COUNT(?news) AS ?no_news_items)
        WHERE {
          ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
        }
        GROUP BY ?speech
      }
      # Keep only speeches with more than 60 linked items.
      FILTER (?no_news_items > 60)
    }
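A query like this is normally sent to an endpoint as the `query` parameter of an HTTP GET request, as defined by the SPARQL 1.1 Protocol. The sketch below only builds the request URL and does not contact any server. The endpoint address is a hypothetical placeholder, since the slides do not give the real one, and `build_request_url` is an illustrative helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder: the slides do not state the endpoint URL.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?speech ?no_news_items
WHERE {
  {
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
""".strip()


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Round trip: decoding the URL should give back the exact query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

With a real endpoint address, the resulting URL could be fetched with any HTTP client, typically with an `Accept` header asking for a SPARQL results format such as JSON.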

Reflection: to what extent can we answer these questions?
• Which events have attracted a lot of media attention?
• What are the differences between media, e.g. between different newspapers, or between newspapers and radio bulletins?
• Has the coverage changed over time?
• How are the events visualized (photos, layout of the newspaper, etc.)?

Future work
• More types of links
  • From just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf”, “talksAbout”
• More types of media
• More types of (political) events

Project ‘Talk of Europe / Traveling Clarin Campus’, 2014-2015
Funded by CLARIN-ERIC
From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer, Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group picture.)

Plans of ‘ToE/TTC’
1. Publish proceedings of the EU parliamentary debates in RDF
  • hosted by DANS
2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we invite international partners to work with the data.
3. In collaboration with international partners:
  • enrich with annotations, e.g. topics, structured data about people, parties, etc.
  • link to national datasets, e.g. media or national parliaments
