advertisement

Assembly Language + Prolog for Huge Data

50 %
50 %
advertisement
Information about Assembly Language + Prolog for Huge Data
Science-Technology

Published on February 6, 2009

Author: gstathis

Source: authorstream.com

advertisement

Assembly Language Extensionsof Visual Prolog for Data Mining ofHuge Databy George A. Stathis (ENB Ltd)Presentation for a lecture inVisual Prolog Applications Programming Conference ALC 2006(Faros, Portugal, 23-24th of April 2006) : Assembly Language Extensionsof Visual Prolog for Data Mining ofHuge Databy George A. Stathis (ENB Ltd)Presentation for a lecture inVisual Prolog Applications Programming Conference ALC 2006(Faros, Portugal, 23-24th of April 2006) Assembly Language Extensionsof Visual Prolog for Data Mining ofHuge DataGeorge A. Stathis (ENB Ltd) : Assembly Language Extensionsof Visual Prolog for Data Mining ofHuge DataGeorge A. Stathis (ENB Ltd) Pure Assembly Language is the fastest way to implement Visual Prolog predicates that can achieve intelligent processing of huge data. High-level logic programming power is then combined with extremely low-level high-speed enhancements. A number of special techniques emerge, combining the best of both worlds: Visual Prolog and Assembler. question:- What is the(completely new) biggest problemof the 21st Century? : question:- What is the(completely new) biggest problemof the 21st Century? question:- What is the(completely new) biggest problemof the 21st Century? : question:- What is the(completely new) biggest problemof the 21st Century? INFORMATION OVERLOAD ! - Therefore: : INFORMATION OVERLOAD - Therefore: - Therefore: : INFORMATION OVERLOAD - Therefore: Visual Prolog+Assembly Language: : Visual Prolog+Assembly Language: Easy Linking to other Languages, including ASM An industrial strength compiler with a Visual IDE Features suitable for Data Mining and Applied A.I. Totally Object-oriented today (after version 6) ASSEMBLY LANGUAGE: VISUAL PROLOG: Highest possible speed Largest possible data access and data space Smallest possible executable program size Slide 8: Before we even start “intelligent processing” of large texts, our basic desire is –ideally- to copy them inside main memory. Unfortunately, when certain list-size and text-size limits are exceeded, any Prolog predicate using recursion is useless for most simple tasks (such as converting a text to a list). This situation can -of course- be improved: It is possible to write optimized Visual Prolog predicates that minimize stack usage, so that texts of somewhat larger sizes (than before), can be converted, at somewhat improved speed. Visual Prolog includes some nice built-in predicates accessing memory directly. (Code examples of these nice extra predicates are in Visual Prolog’s documentation for advanced users). However even such optimized code, impossible in most other Prologs (lacking the nice predicates) except Visual Prolog, hits a new wall: - New memory limits, this time unsurpassable. At this stage, we can also try to soothe our emerging headache by… buying some Aspirin (as well as some extra RAM chips)! Isn’t a (good) Prolog compiler “good enough”? i.e. - Do we really need Assembly Language? Slide 9: Isn’t a (good) Prolog compiler “good enough”? i.e. - Do we really need Assembly Language? As an example of cases where ordinary Prolog code fails to function, and code implemented in Assembly works quite well, assisting Visual Prolog, is the optimized ASM predicate ‘strx2list’, which converts strings (large texts typically several megabytes in size) to Visual Prolog string lists. Here is a table of measured execution times, for ASM predicate “strx2list”, in all possible variants and memory methods: These timings show impressive speed: It is possible to convert a 100Mb text (of 10 million lines) to a list (of 10 million elements) in much less than 1 second! Slide 10: Isn’t a (good) Prolog compiler “good enough”? i.e. - Do we really need Assembly Language? The optimized ASM predicate ‘strx2list’ converts strings (large texts typically several megabytes in size) to Visual Prolog string lists. Experienced users of Visual Prolog may have noticed that columns A, B, C of this timing-table carry annotations of the memory allocation method used. The three possible ways of allocating memory are Visual Prolog’s stack, Visual Prolog’s heap, and Pre-allocation (i.e. memory must be already allocated, before the call). All possible memory allocation methods are implemented in “strx2list” ! These timings show impressive speed: It is possible to convert a 100Mb text (of 10 million lines) to a list (of 10 million elements) in much less than 1 second! implementing Visual Prolog predicates in Assembly Language at ENB Ltd: : implementing Visual Prolog predicates in Assembly Language at ENB Ltd: Difficult coding and debugging, difficult to learn well Difficult to persuade (99.99% of) I.T. companies to do it World shortage of good Assembly Language Programmers (and also a... shortage of jobs, if you are one!) Advantages: Disadvantages: Impressive time and memory savings for Data Mining Fastest possible file conversions and data integration tasks Seamless linking (“as if written in Visual Prolog”) Bit-encoded inference engine in ASM, supplementing Prolog All Visual Prolog features can be (re-)implemented 100% in ASM (including non-deterministic predicates) a Competitive Advantage (since… nobody else does it! ;-) ) Overview of Assembly Language work (enhancing Visual Prolog) at ENB Ltd : Overview of Assembly Language work (enhancing Visual Prolog) at ENB Ltd G.I.S. Prolog: a customized Prolog interpreter (a descendant of P.I.E.) Fuzzy String matching: enhancements of Levenstein’s algorithm (etc.) Date Binaries™: bit-encoded handling of dates and large time series The “Iphigenia Algorithm”™: a bit-encoded KBS inference engine The “Tourist Sort Algorithm”™: (the fastest Sort in the world, slightly faster than “Postman’s Sort”) Raw data handling (“data crunching”) libraries Applied R&D in algorithms and logic inference engines CSV (Excel-) file handling library of predicates Virtual strings and streams library, using Visual Prolog’s Heap Large library (> 200 predicates) for Data Conversions and input / output Non-deterministic search and non-deterministic list-handling predicates Library of specialised Writing and Random Object Generation predicates Library of specialised GUI and disk-access dialog-window predicates Large library (> 300 predicates) for handling huge texts and huge lists (where “huge” means several megabytes, up to physical RAM limits) The G.I.S. Prolog interpreter : The G.I.S. Prolog interpreter G.I.S. Prolog is an enhanced descendant of the “Pie Interpreter” (example code in Visual Prolog packages). Originally created for rapid testing of software ideas for G.I.S. programming (“point-in-polygon” algorithms, etc). Allows easy (interpreter-based) testing of ideas in Prolog, as well as Assembly, since Assembly language predicates are often compiled as G.I.S. Prolog built-in predicates. Enhanced with extra features (not present in the original PIE package), such as a DCG grammar parser, a “graph design window” for testing network path algorithms and new ideas for G.I.S (Geographic information systems), etc. Available for free download (check out the site of ENB Ltd, http://www.enb.gr, as well as http://www.omadeon.com) in order to encourage the use of Prolog and Visual Prolog by other companies / programmers, for possible cooperation. The G.I.S. Prolog interpreter (screen shots) : The G.I.S. Prolog interpreter (screen shots) This screen shot from G.I.S. Prolog shows the DCG grammar parser in operation, converting a grammar file into Prolog source code: The G.I.S. Prolog interpreter (screen shots) : The G.I.S. Prolog interpreter (screen shots) This screen shot (from G.I.S. Prolog) shows the “Graph Designer Tool”, a special user-window for designing graphs: The G.I.S. Prolog interpreter (screen shots) : The G.I.S. Prolog interpreter (screen shots) This screen shot (from G.I.S. Prolog’s Help) explains the use of the “Graph Designer Tool” to design graphs, etc. The G.I.S. Prolog interpreter (screen shots) : The G.I.S. Prolog interpreter (screen shots) R & Dat ENB Ltd:Seeking better and faster algorithmsbased on new theoretical resources : R & Dat ENB Ltd:Seeking better and faster algorithmsbased on new theoretical resources TOURIST SORT ALGORITHM IPHIGENIA Tourist Sort – the new “fastest sorting algorithm in the world’: : Tourist Sort – the new “fastest sorting algorithm in the world’: Unlike other sorting algorithms, Tourist Sort (a descendant of Radix Sort) does not involve comparisons between items sorted. Instead, Tourist Sort shuffles the items into small bins and then applies itself to each bin in turn, whenever required, until the array is sorted. This new algorithm is of linear complexity, like Radix Sort and Robert Ramey's Postman Sort (which are similar to it). However, there are also important differences between Tourist Sort, Radix Sort and Postman Sort. It has been said that "the magic of Radix Sort lies in finding the key to shuffle the items". Tourist sort is like Radix Sort, since it also uses keys and bins, but it does not need any magic, since it calculates in advance the size of the bins where the items are to be placed, as well as the final address of each item, before placing it. It is then re-applied iteratively, creating sub-bins, until the array is sorted. The name "Tourist sort" was given to this algorithm because its operation is more akin to “Hotel Reservation bookings” for groups of tourists, rather than "a postman delivering mail to boxes”. Also, because it was born in Greece, where tourism is the main national industry. Tourist Sort – the new “fastest sorting algorithm in the world’: : Tourist Sort – the new “fastest sorting algorithm in the world’: Screen shot from recent test-comparison with Postman Sort: Tourist Sort – the new “fastest sorting algorithm in the world’: : Tourist Sort – the new “fastest sorting algorithm in the world’: IMPLEMENTATION HISTORY The first implementation of Tourist Sort (in 1997) was done in Visual Prolog 5, and was quickly re-implemented in 16-bit Assembly. The early Visual Prolog version was published in internet Newsgroups, in February 1998. The latest 32-bit ASM implementation (April 2006) is more flexible as regards sorting records of arbitrary width. However, some fine tuning improvements are still possible: The latest ASM implementation also uses some Visual Prolog code, doing massive numbers of "assert" and "retract“ Prolog calls. Fortunately, Visual Prolog is one of the fastest Prolog compilers, so a few hundred thousand calls of "assert" and "retract" (for sorting files larger than 100Mb) cause only a very small delay, so that the speed of Tourist Sort remains at least as fast as Postman Sort’s. In recent close comparisons with Robert Ramey's public domain demos, Tourist Sort was found to be either slightly faster (in most cases) or else (in a minority of cases) just as fast as Ramey's. However, when the “assert”/”retract” calls are replaced by faster ASM counterparts, Tourist Sort will inevitably become much faster than Postman Sort. CSV-file handling applications : CSV-file handling applications The most important use of “Tourist Sort” (at ENB Ltd) was sorting large “CSV” (Excel) files, much larger than MS Excel itself allows. To handle CSV-file date-fields, “Date-binaries” were created (a data structure allowing set-theoretic and logic operations on bit-encoded days), together with a family of date-parsing and date-handling predicates. To implement simple data-mining tasks, pairs of CSV-file columns were combined together, while applying some constraints (e.g. weighed fuzzy string-matching). The combined use of Assembly language and Visual Prolog in all these tasks, was crucial: - Visual Prolog’s “industrial strength” back-tracking, combined (in “fail-driven loops”) with non-deterministic fuzzy-matching ASM predicates, produced very fast and robust results. (Screen-shots from some example applications follow) CSV file applications - ENB Ltd: : CSV file applications - ENB Ltd: Here are some screen-shots from a “CSV-file browser/ analyser”: CSV file applications - ENB Ltd: : CSV file applications - ENB Ltd: Here are some screen-shots from a “CSV-file browser/ analyser”: CSV files and Intelligent Data Integration: : CSV files and Intelligent Data Integration: The “Field Name Alias Extractor” uses Fuzzy String-matching and word-extraction predicates, to identify automatically that the man-given CSV column / field name “Precip_PIPERIA_PKPFPE” is a valid alias / instance of the field “PRECIPITATION”, in a nationwide hydrological database. This database was not given, but created by data integration, lumping together thousands of files, each one usually with a different field-name structure: Data mining in bits (1):“Date-Binaries” : Data mining in bits (1):“Date-Binaries” Date-Binaries are sets of dates encoded as bits. A Date-Binary is a Visual Prolog binary with the following structure and features: A 4-byte header, which contains: (a) the number of Years inside the Date-Binary, encoded in the first 2 bytes, and (b) the Starting Year, encoded in the next two bytes. These parameters (a), (b) are 16-bit numbers ranging from 0 to 65535: more than adequate. A number of “year-records”, each one with 12 “month-records”. These records are 32 bit-objects, containing days encoded as bits. Date-Binaries must contain complete years. Each year has a size of 48 bytes, or 12 double words, each one being a month, where each 31 bits represent the days of the month and 1 bit is available as a “flag-bit” for use by specialized operations. Date-Binaries were created to implement a number of statistical, logical and data-mining operations on time-series data, in the fastest and most compact way possible. Data mining in bits (1):“Date-Binaries” : Data mining in bits (1):“Date-Binaries” A Visual Prolog application was used to extract "good information" from a large number of comma-delimited text-files, containing nationwide hydrological measurements of several decades. It was required to spot the longest possible (more-or-less) continuous time-periods of valid measurements. (The continuity of values was important, for the hydrological software where this data was going to be used). To identify and remove erroneous data, as well as to find the best possible valid data (for further processing) a 5-phase extraction process was devised: First, the dates of all time-series were turned into bits / date-binaries and invalid raw data was excluded. Then, the date-binaries were AND-ed together, finding common days (i.e. common bits) Then, all valid common days inside each month (i.e. bits in each double word) were counted and compared to a threshold of 25 days per month, producing common valid month – binaries, where each month was encoded as either 1 (valid) or 0., Then a special Assembly language predicate was used to find the sizes of all contiguous time-periods, or consecutive 1’s, inside these common month binaries Finally, a (Visual Prolog) fail-driven loop was used to scan through the resulting sizes (of contiguous time periods) finding their maximums for all possible pairs of time-series data. These maximums were then plotted in CSV-file format and transferred to the engineers Date-Binaries (ENB software screen shots): : Date-Binaries (ENB software screen shots): Here is how the application 's dialog-window  looked like, before any action: Date-Binaries (ENB software screen shots): : Date-Binaries (ENB software screen shots): After selecting some files, adjusting program settings, and waiting a short time for the files to be pre-processed the dialog-window became: Date-Binaries (ENB software screen shots): : Date-Binaries (ENB software screen shots): After a few seconds, the results were quickly written out to an output file, readable by MS Excel. Pressing the button "VIEW" invoked Excel as an external application (via DDE), so that a summary of the most important "hot findings" could be viewed in MS Excel: R & Dat ENB Ltd:Seeking better and faster algorithmsbased on new theoretical resources : R & Dat ENB Ltd:Seeking better and faster algorithmsbased on new theoretical resources TOURIST SORT ALGORITHM IPHIGENIA An Expert System inference enginewith a bit-encoded Knowledge Baseand reduction rules operating on bits : An Expert System inference enginewith a bit-encoded Knowledge Baseand reduction rules operating on bits ALGORITHM IPHIGENIA In recent years, SAT methods (Boolean Satisfiability) and Zero Order Logic (simple propositional calculus) have helped to solve important optimisation problems, by translating such problems into Boolean Satisfiability. : In recent years, SAT methods (Boolean Satisfiability) and Zero Order Logic (simple propositional calculus) have helped to solve important optimisation problems, by translating such problems into Boolean Satisfiability. ALGORITHM IPHIGENIA ALGORITHM IPHIGENIA : ALGORITHM IPHIGENIA The “Iphigenia Algorithm” is not intended to replace the powerful backtracking and unification mechanisms of Visual Prolog 6, but to supplement them, proving logic theorems and answering logic queries in bit-encoded knowledge bases, such as: ( ( A & B & …) ? G or … ) &( ( D & E & …) ? Z or …)________________________ ? ? & …. ? Z or … Further information about “Iphigenia” is given in this presentation’s handouts, as well as in the sites: http://multiforms.netfirms.com http://www.omadeon.com/alc

Add a comment

Related presentations

Related pages

Assemblersprache – Wikipedia

... -Zeichen, das INT 21 h (s.u) als Ende der Zeichenkette erkennt DATA ENDS;-Ende des Datensegments ... Assembly Language Step-by-Step. Programming with ...
Read more

Why Use Prolog?

Why Use Prolog? Here are some notes ... In Prolog, you can treat data as programs. ... The fact that Prolog is a logic programming language only comes in ...
Read more

VIP-ALC'06 Program - Visual Prolog

VIP-ALC'06 Program. ... Assembly Language Extensions of Visual Prolog for data-mining applications with huge data.
Read more

Prolog and Epilog - msdn.microsoft.com

Intrinsics and Inline Assembly. ... Prolog and Epilog. ... The associated unwind data must describe the action of the prolog and must provide the ...
Read more

VIP-ALC 06 - Abstracts - Visual Prolog

The programming language Visual Prolog has gone through a ... mining Applications with Huge Data. Pure Assembly Language is the fastest way to ...
Read more

How to write assembly or C code generator in prolog ...

How to write assembly or C code generator in prolog ? ... and was one huge motivation to at least try to support partially instantiated data structures ...
Read more

Prolog Programming Language

Prolog is a rich collection of data structures in the language and human reasoning, and a powerful notation for ... Prolog is an ideal prototyping language.
Read more

Assembly Language Examples and Tutorials

Assembly Language Examples and Tutorials: Submit Article: Home » Articles » Assembly Language: ... Word Assembly Program. Assembly Language: Oct 17 ...
Read more

Levels of Programming Languages Gerald Penn CSC 324

Levels of Programming Language ... –Assembly language is a symbolic presentation of machine ... language for Java, ML or Prolog compilers.
Read more

x86 Disassembly/Calling Conventions - Wikibooks, open ...

x86 Disassembly/Calling Conventions. ... specifies how a function call in C or C++ is converted into assembly language. ... (the function prologue)
Read more