Published on January 10, 2009

Author: aSGuest9990


Slide 1: Vetting Academic Work For Originality
John M. Barrie

Slide 2: Agenda
My Background
History
Problem: Academia
Core Technology
Turnitin July 23rd Release
Legal Overview

Slide 3: My Background
Undergraduate: U.C. Berkeley, Rhetoric and Neurobiology
Doctorate: U.C. Berkeley, Biophysics Multidisciplinary Graduate Group, Neurobiology
Dissertation: Theoretical and computational electro-neurophysiology - Spatiotemporal dynamics of the neocortical EEG (aka, the physiology of perception)

Slide 4: History
1994: Created technology to aid collaborative learning among hundreds of U.C. Berkeley students
  Exposed undergraduates to the peer review process
  Allowed students to share information in ways not possible without the Internet
  Published the results in Science magazine
Observations from the study:
  Facilitated acquisition of core course ideas
  Faculty interacted with technology <2hr/term
  Plagiarism and cheating were rampant at Berkeley
Prediction: IP theft would become an enormous problem for academia… and EVERYONE ELSE.

Slide 5: Transformation Complete
November 27, 1999 AP story regarding Turnitin (International Wire): Michelle Locke ended her first sentence with "…John Barrie could be your worst nightmare."

Slide 6: Problem
The Internet has allowed the public to access massive amounts of information; that access has been accompanied by a manifold increase in the theft and misappropriation of digital intellectual property
This problem has become especially pronounced in academia, where it takes the form of plagiarism.
Students are using the Internet like an 8-billion page searchable, cut-and-paste(able) encyclopedia
The problem jeopardizes the integrity of the entire system, and the addressable academic market is in the billions

Slide 7: Problem
From a September 30, 2003 Gallup Poll:

Slide 8: Problem
According to one of the largest studies of plagiarism in the world (August 2003), Donald McCabe found that almost 40% of students surveyed admitted to plagiarizing information from the Internet - and that percentage may be low because many students did not consider ‘borrowing’ from the Internet without attribution to be plagiarism
According to the largest study of plagiarism in Australia (CAVAL, 2002), a significant number of students were identified as plagiarizing from the Internet
Turnitin currently receives over 40,000 student papers per day, and 15-30% of those manuscripts are unoriginal

Slide 9: Technology is the Key
“Warning students [or journalists or authors or researchers or anyone else] not to plagiarize, even in the strongest terms, appears not to have had any effect whatsoever. Revealing the use of plagiarism-detection software [Turnitin] to the students prior to completion of an assignment, on the other hand, proved to be a remarkably strong deterrent.”
from: Actions Do Speak Louder than Words: Deterring Plagiarism with the Use of Plagiarism-Detection Software, by Bear F. Braumoeller, Harvard University, and Brian J. Gaines, University of Illinois at Urbana-Champaign
This is a digital problem, and it must be addressed with a digital solution - the status quo is not working… big time.

Slide 10: Technology
Internet Crawler: Capable of copying the Internet at a rate of 65+ million pages/day
Internet Copy: More than 9 billion pages crawled and archived; high-valued content in the aggregate
Unique Indexing: We search the entire document
Rapid Pattern Recognition: Compares whole documents to billions of other documents in seconds
User Interface: Internet access to our content and technology

Slide 11: Core Technology

Slide 12: Acquiring the Internet: High Performance Web Crawling
Premise:
  Average web page has 10 links to other pages
  Can create a spanning tree of the Internet given a seed page
Theory:
  Download a given page
  Index the text
  Find URLs and start from the beginning

Slide 13: Process of Crawling
Diagram labels: Internet, Seed Page

Slides 14-16: Process of Crawling
Diagram labels: Internet, Seed Page, Crawled Pages

Slide 17: Difficulties in Crawling
Naïve approach is not adequate for a high performance system
Common crawler pitfalls:
  Downloading duplicate/mirrored pages (cycling)
  Repeating work is a waste of resources
  Lost in “crawler traps”
Netiquette
  Time between requests (30 seconds)
  Limit time on a domain (4 hours)
  Robots.txt files
SessionID stripping
Not following malicious and/or junk links
Junk Data: pornography, pages too small, etc…

Slide 18: Crawler Performance
What is required of a high performance crawler?
  Efficiency (no duplicates/unusable data)
  Propriety (obey Netiquette)
  Speed!!!
TurnItInBot design
  Crawler system downloads pages at 750+ pages per second
  Manages 7500+ URLs a second
  Has crawled over 9 billion web pages
    5 billion in active Internet “node”
      Competing with Google
    4 billion in archived Internet “node”
  Currently in Refresh mode
  More ways to maximize resources

Slide 19: What Happens to the Data?
Crawling at 75 Mbps efficiently is a hard task…
But indexing it into usable data is even harder…
  Need to have meta-data to aid us
  Text search against 9 billion docs won’t work…
  Complex mathematical heuristics to aid us
How to store meta-data?
  Need to have fast query speed as well as throughput
How fast is fast enough?
  Need to be able to handle ~ 10,000 queries per second
  None of the major DB vendors (Oracle, MS, Sybase…) come close to meeting our needs…

Slide 20: Query Performance
Solution: create our own custom DB tailored to our needs
  Draws upon every performance enhancing concept in DB world
  Performance enhancements based on knowing our specific usage patterns and application flow
Results:
  Currently processing about 40,000 papers a day
  System tested for speeds up to 125,000 papers a day
  Distributed system scales linearly
  Performance increases as a function of number of CPUs

Slide 21: Finding a Needle in the Haystack: Searching the Entire Document
Diagram labels:
  Copy of Internet
  Books, Journals, Newspapers (LexisNexis, Gale, Proquest, Factiva)
  Student Papers or Client Node
  Extract matching documents
  Manuscript or article submitted to iParadigms
  Computer transforms manuscript into a digital fingerprint (next slide)

Slide 22: Finding A Needle in a Haystack
We re-map the digital fingerprint of the manuscript or article into a high dimensional space and test for clustering

Slide 23: Originality Report
Diagram labels:
  Matching passages from 5+ billion Internet web pages: updated at a rate of million pages/day
  Matching passages from millions of Student Papers or Client Node
  Compare matching passages to original manuscript or article
  Matching passages from millions of Books, Journals, Newspapers
  Create Originality Report
  Entire process < 10 seconds

Slide 24: Detection of Word Substitution or Alteration
MACBETH MANUSCRIPT FROM THE INTERNET (INTRO PARAGRAPH)
Macbeth is presented as a mature man of definitely established character, successful in certain fields of activity and enjoying an enviable reputation. We must not conclude, there, that all his volitions and actions are predictable; Macbeth's character, like any other man's at a given moment, is what is being made out of potentialities plus environment, and no one, not even Macbeth himself, can know all his inordinate self-love whose actions are discovered to be-and no doubt have been for a long time-determined mainly by an inordinate desire for some temporal or mutable good.
SAME MANUSCRIPT WITH MODIFIED WORDS
Macbeth is shown as an empowered man of well-established character, prosperous in several fields of life and enjoying an esteemed reputation. We mustn't conclude, therefore, that all of his volitions and actions will be foreseeable; Macbeth's essence, like most other men at any given time, is what's being created out of potentialities and his environment, and no one, not even Macbeth himself, can discern all his immoderate self-love whose behaviors are found to be-and without doubt have been for some time-determined primarily by an extreme desire for a temporal or changeable good.

Slide 25: Detection of Sentence or Paragraph Addition
PAPER A: MACBETH INTERNET DERIVED PAPER (INTRO PARAGRAPH)
Macbeth is presented as a mature man of definitely established character, successful in certain fields of activity and enjoying an enviable reputation. We must not conclude, there, that all his volitions and actions are predictable; Macbeth's character, like any other man's at a given moment, is what is being made out of potentialities plus environment, and no one, not even Macbeth himself, can know all his inordinate self-love whose actions are discovered to be-and no doubt have been for a long time-determined mainly by an inordinate desire for some temporal or mutable good.
PAPER A + B: MACBETH MODIFIED TEST PAPER WITH COMBINED ADDED CONTENT
Shakespeare's famous play, Macbeth, is one of his great tragedies based around the classic theme of the hero's fatal flaw. Macbeth is presented as a mature man of definitely established character, successful in certain fields of activity and enjoying an enviable reputation. Yet, like any man, he is human, and thus in possession of flaw and foibles, hidden that they may be from public eye, and hinted at by foreshadow only by the author. We must not conclude, there, that all his volitions and actions are predictable; Macbeth's character, like any other man's at a given moment, is what is being made out of potentialities plus environment, and no one, not even Macbeth himself, can know all his inordinate self-love whose actions are discovered to be-and no doubt have been for a long time-determined mainly by an inordinate desire for some temporal or mutable good. This desire being so strong under certain circumstances as to override all others, even, as is usually the case in tragedy, the ultimate desire of self-preservation.

Slide 26: Report Generation
Problem: Out of the multiple different documents that match, compare all possible permutations, then select the maximal overlapping set in a time-efficient way
  Want to avoid false positives yet not miss any viable matches
  Much fine-tuning involved in finding the right criteria
Ability to compare against other databases of content
  E.g., Proquest, Lexis, Gale, etc.
  Database of submitted student work
  Any other content databases given to us
The entire process takes seconds per document.

Slide 27: Why Not Search Google?

Slide 28: Why Not Search Google?
Because the process would take hours… if it even worked!

Slide 29: Creating the Beachhead by Building on the Core

Slide 32: Multiple Languages

Slide 34: Turnitin is Becoming Part of How Education Works: Usage is Doubling Every Year
We should be receiving over 100,000 student papers/day by 2006!

Slide 35: It Works!
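
Slides 21 to 25 describe turning a manuscript into a "digital fingerprint" and still finding the source after words are swapped or sentences are added. The slides do not say how that fingerprint is built, so the sketch below is only an illustrative stand-in: it uses overlapping word sequences (k-word "shingles") and a containment score, a common textbook technique and not the method described in the talk. The function names `shingles` and `containment` and the choice `k=3` are assumptions made for this example.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(suspect: str, source: str, k: int = 3) -> float:
    """Fraction of the suspect's k-word shingles that also occur in source."""
    s = shingles(suspect, k)
    if not s:
        return 0.0
    return len(s & shingles(source, k)) / len(s)


# First sentence of the Macbeth paragraph from Slide 24.
original = (
    "Macbeth is presented as a mature man of definitely established "
    "character, successful in certain fields of activity and enjoying "
    "an enviable reputation."
)
# Slide 25 case: a new sentence is added in front of the copied one.
padded = (
    "Shakespeare's famous play, Macbeth, is one of his great tragedies "
    "based around the classic theme of the hero's fatal flaw. " + original
)
# Slide 24 case: the same sentence with words substituted.
substituted = (
    "Macbeth is shown as an empowered man of well-established character, "
    "prosperous in several fields of life and enjoying an esteemed reputation."
)

added_score = containment(original, padded)        # copied text fully present
swapped_score = containment(substituted, original)  # only partial overlap
```

In this toy version the added-sentence case is caught completely, because every shingle of the copied sentence survives, while the word-substitution case keeps only a small fraction of its shingles. That gap suggests why plain phrase matching is not enough for the Slide 24 scenario and why a more tolerant fingerprint is needed.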

Slide 36: At Least Two New Public Cases of Plagiarism Every Month in 2004
"Anyone who says this [plagiarism] will not happen in their newsroom is not seeing it clearly," said Jack Fuller, president of Tribune Publishing, Chicago. (Editor and Publisher Magazine, April 22, 2004)

Economic Research Firm Sues HVB, Alleging Plagiarism
April 24, 2004
NEW YORK -- In a case that suggests plagiarism may be infecting the financial-soothsaying industry, a prominent New York economic research firm is suing a German banking giant for allegedly stealing two dozen reports, Friday's Wall Street Journal reported. High Frequency Economics Ltd. alleges in a lawsuit filed yesterday in U.S. District Court in New York that a Singapore-based economist at Bayerische Hypo- & Vereinsbank AG, or HVB, copied "substantial portions" from at least 22 of its research reports from October 2002 to May 2003 and posted them on the HVB Web site as its own.

Alberta's Klein Accused of Plagiarism in Essay, Journal Says
May 13, 2004 -- Alberta Premier Ralph Klein has been accused of plagiarism while writing a paper for his university degree in communications, the Edmonton Journal reported.

Minister resigns because of plagiarism
May 14, 2004
Associated Press - The senior minister at the United Church of Christ in Keene has resigned after admitting he lifted parts of sermons from the Internet.

Slide 37: Credibility is Hurting the Media Every Day
Sulzberger: “Assume You Have a Blair or Kelley in Your Newsroom”
Newspaper editors should assume that they have a Jayson Blair or Jack Kelley-type scofflaw in their newsrooms, New York Times Publisher Arthur O. Sulzberger Jr. told attendees Thursday at the American Society of Newspaper Editors (ASNE) conference. Working under that assumption, they should proceed with efforts to strengthen controls on plagiarism and ethical lapses. "Go back to your newspaper with the assumption that someone in your newsroom is doing these things," Sulzberger said during a panel discussion on ethics that included several leading editors and newspaper executives. "Run the drill, get ahead of the curve. Why wait for the kind of explosion that rocked The New York Times and USA Today? Until your newspaper grapples with it individually, [improvements are] probably not going to stick." Sulzberger also revealed that the worst thing to come out of the Blair scandal was not the former reporter's ethical crimes, but the fact that sources and readers who knew about the incorrect reporting did not complain because they believed that that was what newspapers did. "That is scary," he said.
Editor and Publisher Magazine, April 22, 2004
The New York Times has publicly indicated that they will use iThenticate in the near future.

Slide 38: Plagiarism is Hurting STM: Then
“If you ran this system [] on every article [in the medical literature] that comes out, you would find this happening all over the place.”
Nature, November 1999 - referring to a beta test of our technology during R&D

Slide 39: Plagiarism is Hurting STM: and Now
“Studies in certain fields have estimated that anything up to 20% of published papers contain some degree of self-plagiarism…. Redundant publications must be recognized as a real threat to the quality and intellectual impact of … publishing.”
“Duplicates [i.e., papers] have also been shown to cause meta-analyses to over-estimate the efficacy of drugs.”
Nature, May 19, 2005

Slide 40: Legal: Copyright
The purpose of Copyright Law is “to promote the Progress of Science and useful Arts….”
U.S. Constitution, Art. I § 8, cl. 8.
“… the aim of copyright is to give an author an exclusive right sufficient to create an incentive to produce, but not so great a right as to undermine the public domain.” These rights must be for a limited term and they must “promote the progress of science.”
Lawrence Lessig, The Future of Ideas (New York: Random House, 2001), 98.

Slide 41: Fair Use
“the fair use of a copyrighted work … for purposes such as criticism, comment, news reporting, teaching, scholarship, or research, is not an infringement of copyright.”
17 U.S.C.A. § 107.

Slide 42: Legal Opinion
“… the Company’s activity may be fairly characterized as ‘criticism’ as that term is used in the preamble of Section 107 of the Copyright Act. According to Webster’s Dictionary (2d Ed., 1996), criticism means ‘fault finding or censure’ or ‘the act of judging the merits of something.’ By this definition, the Company, by investigating the integrity of an author’s work, is engaged in a form of criticism: the Company is judging the merits of the author’s work. As a result, we conclude that the Company’s ‘criticism’ of written works constitutes the type of activity that the courts have traditionally characterized, and the legislature has recognized, as ‘fair use’ of copyrighted material. Our opinion is further supported by a closer look at whether the Company has sufficiently transformed the original author’s work. Certainly, the Company’s use involves a complete transformation of the raw material when the fingerprint is created. Further, the purpose of the Company’s fingerprint creation and analysis is to identify potential plagiarism, which has absolutely nothing to do with the purpose of the original work.”
Foley & Lardner

Slide 43: Legal Opinion
“…the Company’s [originality] report, by identifying potential plagiarists, provides new insights and understandings about the original. We believe that the identification of plagiarists is the type of activity that the fair use doctrine is intended to protect for the ‘enrichment of society.’”
Foley & Lardner

Slide 44: Legal Opinion
“In short, for the same reason delineated above with regard to our plagiarism analysis, we are of the opinion that storing a copy of an author’s work in a database to be used solely for the purpose of comparing the work to other works constitutes a ‘fair use’ of the work. In view of the foregoing, it is our opinion that under Copyright Law, both types of use by the company of written works, despite the lack of express consent of the author, fall within the ambit of the fair use doctrine and, accordingly, the Company should not be liable for the claims of copyright infringement … a court would find in favor of a defense of fair use.”
Foley & Lardner

Slide 45: An Example
Leslie A. Kelly was a photographer who posted his pictures on a website
Arriba owned a search engine for images
  They copied ALL of Kelly’s pictures without permission
  They copied the ENTIRE picture
  They transformed the pictures into thumbnails
  They stored the thumbnails in a database
  They are a commercial venture and profited from their service
  They did not harm the market value of Kelly’s work
The Court found that Arriba was making a fair use of Kelly’s pictures.
Kelly v. Arriba Soft Corp., 9th Cir., No. 00-55521, 2/6/02

Slide 46: Thank You
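
As a closing illustration of the crawl loop from Slides 12 and 17 (download a page, index its text, follow its URLs, skip duplicates, strip session IDs), here is a minimal sketch. It is not TurnItInBot: the names `normalize`, `LinkParser`, `crawl`, the `SESSION_PARAMS` list, and the in-memory `FAKE_WEB` are all assumptions made so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Assumed session-ID parameter names; real sites use many variants.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def normalize(url):
    """Drop the fragment and session-ID parameters so duplicate URLs collapse."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), ""))


class LinkParser(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    start = normalize(seed)
    frontier = deque([start])
    seen = {start}            # guards against cycles and repeated work
    index = {}
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        index[url] = html     # stand-in for "index the text"
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = normalize(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index


# Hypothetical three-page site used in place of real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a?sid=1">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b?sid=2">B</a>',
    "http://example.test/b": "no links here",
}
pages = crawl("http://example.test/", FAKE_WEB.get)
```

Each page is fetched once even though the pages link back to each other and carry session IDs. The netiquette items from Slide 17 (a delay between requests, a per-domain time limit, robots.txt checks) are deliberately left out; a real crawler would add them around the `fetch` call, for instance using the standard library's `urllib.robotparser`.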
