CS 595 Presentation

Published on October 17, 2007

Author: Danielle

Source: authorstream.com

Classifying Gender on Shakespeare’s Characters
By Sobhan
Advisor: Dr. Argamon

Outline
- Introduction to the problem
- Data collection, metadata generation, and feature description file selection
- Importing data and generating ARFF files with ATMan
- Vector calculation
- ML algorithms used for this classification problem
- Experiments and results
- Top and bottom 20 features responsible for gender classification
- Machine/OS/tools used
- Future work and references

Introduction
- Prior research in gender classification covers email authorship, written text, and authorship of novels.
- Question: do male and female playwrights write their male and female characters the same way, or differently?
- Measure the accuracy of gender classification for Shakespeare’s characters.
- Identify the features that distinguish character gender in the plays.
- Measure the accuracy of social class classification for Shakespeare’s characters.

Data Collection
- Version used: Moby Shakespeare, available at http://www-tech.mit.edu/Shakespeare/
- All HTML files were collected using wget.
- A class (html2txt) converted the HTML files into text files, one per play and also one per scene.

Data Cleaning
- Unwanted data, such as the stage directions "Exeunt" and "Exit", were removed from each scene.

Metadata Generation
Metadata is data about data. For each character in a play, the following six pieces of information are captured. Example of the data about one character:
- Type of play: Comedy
- Name of the play: Midsummer Night’s Dream
- Name of the character: CLOWN
- Speech length: 1024.0
- Gender: Male
- Social class: Low

Corpus Selection
Initially, all scenes were selected. The speech length of each character was then added to the metadata, and the selection was restricted to characters with speech lengths of more than 100, 200, 300, 400, and 500.
This selection was made at the level of scenes, acts, and whole plays. Separate files per character were created for speech lengths of more than 500 and more than 200.

Feature File Selection (# gives the number of features in each set)
- Most frequent 500 words from the plays (FDescMostFrequentAttr, Sterling)
- Function words: standard FWs from Bar-Ilan University (#471)
- Function words collected from the ARFF file received from Bar-Ilan (#364)
- Shakespearean function words from the plays (#491)
- All stop words (#645)
- Appraisal features (#47)
- Systemic features (#94)

System Architecture (diagram)
Components: Corpus; ATMan Importer (ImportShakespeareData); Atxt and Token files; Cdesc; Fdesc; ATMan QuickARFF; ARFF file.

A Meta-Info Tag from an Atxt File (image slide)

Vector Calculation
C(w,c) = number of occurrences of function word w for character c
N(c) = total number of word occurrences (tokens) for character c
Vector_Value(w) = C(w,c) / N(c)

Algorithms
- Decision trees: J48, Decision Stump
- Functions: SMO e-1, SMO e-2
- Rules: PART
- Meta: AdaBoostM1 + J48 (-I 30), AdaBoostM1 + DecisionStump (-I 30), MultiBoost + J48 (-I 30), MultiBoost + DecisionStump (-I 30)

Experiments
Strategy used: 10 different partitions for each of the following categories.
Each experiment used all female characters together with an equal number of randomly chosen male characters.
- Categories: All, Comedy, History, Tragedy, High, Low
- Testing option: 10-fold cross-validation

Result slides (tables not included):
- All & Comedy, MF-500 (most frequent 500 words)
- Tragedy & History, MF-500
- High & Low, MF-500
- Bar-Ilan FWs (#471)
- 364 FWs, characters with speech length > 100, act-based
- 364 FWs, characters with speech length > 500
- 364 FWs + quote features, characters with speech length > 500
- Bar-Ilan results F-55-M
- 364 FWs (F-89-V M), characters with speech length > 200
- Stop words, appraisal, systemic

Machine/OS/Tools
- Altaic: Linux server with 4 GB RAM, used for importing data and generating ARFF files with ATMan
- My PC: Windows XP with 1 GB RAM, used for running experiments in Weka-3-4
- HLL: Java 1.4.2
- FileZilla: transferring files to and from the remote server
- PuTTY: running commands remotely on the server
- TextPad: text processing
- EditPlus: IDE for writing scripts and programs

Future Work
- Experiments with individual categories of play type and social class
- Accuracy for social class features and for combinations of features
- Find subtle features that distinguish character gender
- Find subtle features that distinguish social class
- Combine features for gender and social class classification
- Determine whether combinations of features can predict appraisal or systemic characteristics

References
- Moshe Koppel, Jonathan Schler. Authorship Verification as a One-Class Classification Problem.
- E. Stamatatos, N. Fakotakis, G. Kokkinakis. Automatic Authorship Attribution.
- Malcolm Corney, Olivier de Vel, Alison Anderson, George Mohay. Gender-Preferential Text Mining of E-mail Discourse.
- Olivier de Vel. Mining E-mail Authorship.
- S3, Shlomo Argamon, Marin. Style Mining of Electronic Messages for Multiple Authorship Discrimination: First Results.
- Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni. Automatically Categorizing Written Texts by Author Gender.
- Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni, Jonathan Fine. Gender, Genre and Writing Style in Formal Written Texts.
- Shlomo Argamon, Shlomo Levitan. Measuring the Usefulness of Function Words for Authorship Attribution.
- Yoav Freund, Robert E. Schapire. A Short Introduction to Boosting.
- Jason Sorenson. A Competitive Analysis of Automated Authorship Attribution Techniques.
- Thorsten Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features.
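Sketch for the Data Cleaning slide. The slides only say that unwanted data such as "Exit" and "Exeunt" were removed from each scene; the exact rule is not given. This minimal Python sketch assumes, hypothetically, that stage directions sit on their own lines and that a line is dropped when it begins with one of those two words.

```python
import re

# Hypothetical filter: the slides name only "Exit" and "Exeunt" as removed text.
# A line is dropped when it *starts* with either word (case-insensitive).
STAGE_DIRECTION = re.compile(r"^\s*(Exit|Exeunt)\b", re.IGNORECASE)


def clean_scene(scene_text: str) -> str:
    """Drop lines that start with an exit stage direction; keep all other lines."""
    kept = [line for line in scene_text.splitlines()
            if not STAGE_DIRECTION.match(line)]
    return "\n".join(kept)


if __name__ == "__main__":
    scene = "CLOWN\nI will go.\nExit\nEnter LYSANDER\nExeunt all"
    print(clean_scene(scene))  # prints: CLOWN / I will go. / Enter LYSANDER
```

The word-boundary anchor keeps lines such as "Exiting" intact; a real cleaner would likely need more patterns than these two.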
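Sketch for the Metadata Generation and Corpus Selection slides. It holds the six metadata fields in a record and keeps, for each speech-length threshold (100 to 500), the characters whose speech length is strictly greater than that threshold. The class and function names are hypothetical, and only the CLOWN row comes from the slides; the second row is made-up illustration data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CharacterMeta:
    """The six metadata fields listed on the Metadata Generation slide."""
    play_type: str        # e.g. "Comedy"
    play_name: str        # e.g. "Midsummer Night's Dream"
    character: str        # e.g. "CLOWN"
    speech_length: float  # e.g. 1024.0
    gender: str           # "Male" or "Female"
    social_class: str     # e.g. "Low" or "High"


THRESHOLDS = [100, 200, 300, 400, 500]


def select_by_speech_length(chars: List[CharacterMeta]) -> Dict[int, List[CharacterMeta]]:
    """Map each threshold to the characters whose speech length is more than it."""
    return {t: [c for c in chars if c.speech_length > t] for t in THRESHOLDS}


if __name__ == "__main__":
    corpus = [
        # Example row taken from the slides.
        CharacterMeta("Comedy", "Midsummer Night's Dream", "CLOWN", 1024.0, "Male", "Low"),
        # Hypothetical row, for illustration only.
        CharacterMeta("Comedy", "Hypothetical Play", "EXAMPLE_FEMALE", 250.0, "Female", "High"),
    ]
    counts = {t: len(cs) for t, cs in select_by_speech_length(corpus).items()}
    print(counts)  # {100: 2, 200: 2, 300: 1, 400: 1, 500: 1}
```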
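Sketch for the Vector Calculation slide. Each feature value is the relative frequency C(w,c)/N(c): the count of function word w in character c's speech divided by the total number of tokens c speaks. The sketch assumes the tokens are already split and normalized (for example lowercased); that tokenization step, and the function name, are assumptions rather than the ATMan implementation.

```python
from collections import Counter
from typing import Dict, List


def fw_vector(tokens: List[str], function_words: List[str]) -> Dict[str, float]:
    """Return Vector_Value(w) = C(w,c) / N(c) for each function word w.

    tokens: every word token spoken by one character c, so len(tokens) == N(c).
    """
    n_c = len(tokens)          # N(c): total tokens for this character
    counts = Counter(tokens)   # counts[w] == C(w,c); missing words count as 0
    if n_c == 0:
        # A character with no tokens gets an all-zero vector instead of dividing by zero.
        return {w: 0.0 for w in function_words}
    return {w: counts[w] / n_c for w in function_words}


if __name__ == "__main__":
    print(fw_vector("the king and the queen".split(), ["the", "and", "of"]))
    # {'the': 0.4, 'and': 0.2, 'of': 0.0}
```

Normalizing by N(c) keeps characters with long and short speeches comparable, which matters because the corpus mixes speech lengths from just over 100 to well over 500.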
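Sketch of the ARFF output. In the slides, ARFF files were generated with ATMan (QuickARFF). The code below is not that tool. It is a hypothetical illustration of the ARFF layout Weka reads, with one numeric attribute per function word and a nominal gender class. Attributes are named fw_0, fw_1, and so on, and each original word is recorded in a % comment, so words with apostrophes (common in Shakespeare) need no escaping.

```python
from typing import Dict, List, Tuple


def to_arff(relation: str, words: List[str],
            rows: List[Tuple[Dict[str, float], str]]) -> str:
    """Render (feature vector, gender) rows as a Weka ARFF document.

    relation is assumed to be a single token with no spaces.
    """
    lines = [f"@RELATION {relation}", ""]
    for i, w in enumerate(words):
        lines.append(f"% fw_{i} = {w}")          # ARFF comment: original word
        lines.append(f"@ATTRIBUTE fw_{i} NUMERIC")
    lines.append("@ATTRIBUTE gender {Male,Female}")
    lines += ["", "@DATA"]
    for vector, gender in rows:
        # Words absent from a vector are written as 0.0.
        values = [repr(vector.get(w, 0.0)) for w in words]
        lines.append(",".join(values + [gender]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("gender", ["the", "o'er"], [({"the": 0.4}, "Male")]))
```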
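Sketch for the Experiments slide. Each of the 10 partitions keeps every female character and adds an equal number of male characters drawn at random without replacement, so the classes are balanced. The function name, the fixed seed, and the use of plain character names are assumptions. The slides do not say how the random male sample was drawn.

```python
import random
from typing import List, Sequence, Tuple


def balanced_partitions(females: Sequence[str], males: Sequence[str],
                        n_partitions: int = 10,
                        seed: int = 0) -> List[List[Tuple[str, str]]]:
    """Return n_partitions labelled datasets. Each holds every female character
    plus an equal-sized random sample (without replacement) of male characters."""
    if len(males) < len(females):
        raise ValueError("need at least as many male as female characters")
    rng = random.Random(seed)  # fixed seed so the partitions are reproducible
    partitions = []
    for _ in range(n_partitions):
        sampled = rng.sample(list(males), len(females))
        partition = [(name, "Female") for name in females]
        partition += [(name, "Male") for name in sampled]
        partitions.append(partition)
    return partitions


if __name__ == "__main__":
    for p in balanced_partitions(["F1", "F2"], ["M1", "M2", "M3", "M4"], n_partitions=3):
        print(p)
```

Under this reading, the function would be called once per category (All, Comedy, History, Tragedy, High, Low), and each resulting partition would then be evaluated with 10-fold cross-validation in Weka.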
