The need for Interoperability in Office and GIS formats


Published on March 24, 2009

Author: markusN

Source: slideshare.net

Description

Free GIS and Interoperability: The need for Interoperability in Office and GIS formats

GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione

[GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]

GFOSS'04, ITC-irst, 16 Nov 2004 (last revised 10/2005)

M. Neteler (neteler at itc it), http://mpa.itc.it, ITC-irst, Povo (Trento), Italy

The need for Interoperability

The problem:

nowadays data have to be exchanged across groups that are often very heterogeneous

the personal choice of application software or operating system should not affect the data exchange

data exchange standards are available

awareness of the need for interoperability is limited

interoperability is implemented only to a limited extent in processes and software

commonly used file formats lead users to believe in an interoperability that is not there: “false friends”

What are Standardization & Interoperability?

Standardization versus Interoperability

Standardization: a written and published document describing data formats, models etc.

Example Office standards: ASCII, HTML, XML, ...

Example GIS standards: GML, ISO 8211, ISO/IEC 15444-1, WMS etc.

Only published standards are acceptable.

Interoperability: more than the application of a standard; it also comprises the interpretation of the standard (definitions are sometimes incomplete).

Interoperability? The two dimensions of Interoperability

Longitudinal Interoperability: time, long-term storage. Data shall be readable over time (years, decades, ...). This is of particular interest for data of public administration and long-term projects.

Transversal Interoperability: sharing data between users. Data shall be readable across user communities, independent of the software or operating system used (freedom of software choice). Again, this is of particular interest for data of public administration and long-term projects.

Part I: Office Interoperability

Example: MS-Word .DOC format

Are WORD.doc files suitable for data exchange?

the format is undocumented; to some extent it was reverse-engineered -> does not support transversal interoperability

the format is regularly changed (Word 1, 2, 95, 97, NT, 2000, XP, ... also named WinWORD 6, 8, 10, ...) -> does not support longitudinal interoperability

prone to MS-Windows macro viruses

severe security/privacy issues (example on the next slide):
- DOC files contain sensitive information about the user (unrelated to the contents)
- deleted text may still be legible outside of MS-Word
-> contents cannot be completely verified

Example: MS-Word .DOC format - security/privacy issues

Descrambling a WORD.doc file. What you may find:

Your unique MS-Windows user ID (or similar): PID_GUIDäAN{714738E3-FF4C-11D3-ZD7C-00E0281D67A7}. This makes your (anonymous) document traceable.

Sometimes deleted text is still visible (think of re-using an existing WORD file).

A famous example: In February 2003, the British government of Tony Blair published a dossier on Iraq's security and intelligence organizations. This dossier was cited by Colin Powell in his address to the United Nations the same month. Dr. Glen Rangwala, a lecturer in politics at Cambridge University, quickly discovered that much of the material in the dossier was actually plagiarized from a U.S. researcher on Iraq. http://www.computerbytesman.com/privacy/blair.htm

Descrambling a WORD.doc file: The British Iraq dossier 2003 (http://nytimes.com)

[neteler@dandre2 gfoss04]$ tr -d '[:cntrl:]' < blair.doc

ÐÏࡱá>þÿz|þÿÿÿyÿ [...]

-xxxxí-o#o#{'?^,k6®äí-* RûuËÂG (É-$IRAQ ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONThis report draws upon a number of sources, including intelligence material, and shows how the Iraqi regime is constructed to have, and to keep, WMD, and is now engaged in a campaign of obstruction of the United Nations Weapons Inspectors. [...]

[`azbhh§h»h?h-i/isjÿÿ cic22 JC:DOCUME~1 phamill LOCALS~1TempAutoRecovery save of Iraq - security.asd cic22 JC:DOCUME~1 phamill LOCALS~1TempAutoRecovery save of Iraq - security.asd cic22 JC:DOCUME~1 phamill LOCALS~1TempAutoRecovery save of Iraq - security.asd JPratt C:TEMPIraq - security.doc JPratt A:Iraq - security.doc ablackshaw!C: ABlackshaw Iraq - security.docablackshaw#C: ABlackshaw A;Iraq - security.doc ablackshaw A:Iraq - security.doc MKhan C:TEMPIraq - security.doc MKhan (C:WINNTProfilesmkhanDesktopIraq.docþÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ PjÿzXVÿ*uzLl_ÿbêzLl_ [...]

jP@GTimes New Roman5SymbolG&ArialHelveticaA&Arial Narrow?&Arial Black"qÐh_r&Òr&aõq#JV,?RVW,º!¥À??20døi?fÿÿCIraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONdefaultMKhanþÿàòùOh«+'³Ù0? ìø 4DPlx?¬?äDIraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONraqdefaultefaefaNormal.dotN MKhan .d4ha Microsoft Word 8.0 C@ÒIk@n)§ÈÂ@"ZöfËÂ@døèuËÂ#JVþÿÕÍÕ [...]

Note: WMD stands for weapons of mass destruction.

Source: http://www.computerbytesman.com/privacy/blair.htm
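The shell command above simply deletes control characters. A similar inspection can be sketched in Python; this is only an illustration, assuming that hidden text is stored either as 8-bit ASCII or as UTF-16LE, and the function name `printable_strings` and the sample bytes are made up for this example.

```python
import re
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of readable text hidden in raw bytes (similar to Unix `strings`)."""
    found = []
    # 8-bit text: runs of printable ASCII characters.
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE text: each printable character is followed by a NUL byte.
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in ascii_run.finditer(data):
        found.append(match.group().decode("ascii"))
    for match in utf16_run.finditer(data):
        found.append(match.group().decode("utf-16-le"))
    return found


# Hypothetical bytes imitating a fragment of a .doc file (not the real dossier).
sample = (
    b"\xd0\xcf\x11\xe0\x00\x01"
    + "JPratt".encode("utf-16-le")
    + b"\x00\x02"
    + b"Microsoft Word 8.0\xff"
)
print(printable_strings(sample))
```

Run against a real file with `printable_strings(open("blair.doc", "rb").read())`, a sketch like this would be expected to surface user names and file paths of the kind shown above.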

Example: MS-Excel .XLS format

Are EXCEL.xls files suitable for data exchange?

the format is undocumented; to some extent it was reverse-engineered -> does not support transversal interoperability

the format is regularly changed (Excel 95, 97, NT, 2000, ...) -> does not support longitudinal interoperability

prone to MS-Windows viruses

limitation: max. 65536 rows in a table (2^16)

the auto-conversion feature is risky: some fields/columns are automatically changed to date-time format (see the example on the next slides) -> high risk of accidental data damage

Example: MS-Excel .XLS format - accidental data damage

The “Human Genome Project” case

In 2004 scientists discovered that some gene names were being changed inadvertently to non-gene names. Citation:

“A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] [3] was being converted to '1-DEC.'”

Cited after: B.R. Zeeberg, J. Riss, D.W. Kane, K.J. Bussey, E. Uchio, W.M. Linehan, J.C. Barrett and J.N. Weinstein, BMC Bioinformatics 2004, 5:80, http://dx.doi.org/10.1186/1471-2105-5-80
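One way to guard against this kind of damage is to screen identifiers before they are opened in a spreadsheet. The following is a minimal sketch, not the method used in the cited paper: the month list, the two patterns and the function name `conversion_risk` are assumptions chosen for illustration, and the digits-E-digits identifier in the usage lines is hypothetical.

```python
import re
from typing import Optional

# Assumed patterns: a month name or abbreviation followed by a day number,
# and digits-E-digits, which a spreadsheet may read as scientific notation.
DATE_LIKE = re.compile(
    r"(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}",
    re.IGNORECASE,
)
FLOAT_LIKE = re.compile(r"\d+E[+-]?\d+", re.IGNORECASE)


def conversion_risk(identifier: str) -> Optional[str]:
    """Return 'date' or 'float' if the identifier may be auto-converted, else None."""
    text = identifier.strip()
    if DATE_LIKE.fullmatch(text):
        return "date"
    if FLOAT_LIKE.fullmatch(text):
        return "float"
    return None


for name in ["DEC1", "TP53", "1234567E08"]:
    print(name, conversion_risk(name))
```

Identifiers flagged this way can then be exchanged in a plain-text format and imported explicitly as text, so that no automatic type guessing is applied.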

Suggestions for “Office” data interoperability

Text files: ASCII, HTML, RTF, XML, LaTeX; PostScript/PDF for read-only documents

Tables: CSV, xBase (dBase), XML

Databases: SQL92-ASCII

Bibliography: BibTeX
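As an illustration of the table and database suggestions, the sketch below writes a table as CSV and dumps a database as plain SQL text using only the Python standard library. It is a minimal example under stated assumptions: the helper names and the `genes` table are invented here, and SQLite's dump is plain-text SQL in SQLite's own dialect, not guaranteed to be strict SQL92.

```python
import csv
import io
import sqlite3


def table_to_csv(header, rows) -> str:
    """Serialize a table as CSV text (comma-separated, one record per line)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def database_to_sql(conn: sqlite3.Connection) -> str:
    """Dump a whole SQLite database as plain-text SQL statements."""
    return "\n".join(conn.iterdump())


# Hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (symbol TEXT, note TEXT)")
conn.execute("INSERT INTO genes VALUES ('DEC1', 'Deleted in Esophageal Cancer 1')")
conn.commit()

print(table_to_csv(["symbol", "note"], conn.execute("SELECT * FROM genes")))
print(database_to_sql(conn))
```

Both outputs are ordinary text files that can be read with any editor and re-imported by other software, which is the point of the suggestions above.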

Suggestions for “Office” data interoperability: converters (examples)

Automated conversion tools can be used to provide all formats.

Text files (ASCII, HTML, RTF, XML, PostScript/PDF). Converters: OpenOffice.org [1], wvWare [2]

Tables (CSV, xBase (dBase), XML). Converters: OpenOffice.org, xbase2pg [3]

Databases (SQL92-ASCII). Converters: ODBC, xbase2pg

Bibliography (BibTeX). Converters: Bibutils [4], Bibtex2html [5], (Endnote)

[1] http://OpenOffice.org (OpenOffice.org itself uses XML as its own standard format)
[2] http://wvware.sourceforge.net/
[3] http://www.klaban.torun.pl/prog/pg2xbase/
[4] http://www.scripps.edu/~cdputnam/software/bibutils/bibutils.html
[5] http://www.lri.fr/~filliatr/bibtex2html/

OASIS: “Office” data interoperability

Promotion of an open document exchange format

Proposed and implemented new open standard format: the OASIS OpenDocument XML format

The OASIS OpenDocument format [1] is a vendor- and implementation-independent file format, which guarantees freedom and independence

E.g., OpenOffice.org uses OpenDocument as its default format from version 2.0 onwards, as do KOffice, StarOffice and software from other vendors

The OASIS OpenDocument file format is one of the file formats recommended by the European Commission [2]

[1] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
[2] http://europa.eu.int/idabc/en/document/3439
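Because OpenDocument files are ZIP packages that contain XML, their text can be read with generic tools and no office suite. The sketch below illustrates this with the Python standard library. It builds a deliberately simplified demo package (no manifest, so not a complete valid document); the helper names are invented for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List
from xml.sax.saxutils import escape

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def make_demo_package(paragraphs: List[str]) -> bytes:
    """Build a simplified OpenDocument-like package in memory (demo only)."""
    body = "".join("<text:p>%s</text:p>" % escape(p) for p in paragraphs)
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>%s</office:text></office:body>"
        "</office:document-content>"
    ) % (OFFICE_NS, TEXT_NS, body)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content)
    return buf.getvalue()


def read_paragraphs(package: bytes) -> List[str]:
    """Extract paragraph texts from content.xml inside the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


print(read_paragraphs(make_demo_package(["Free GIS and Interoperability"])))
```

The same `read_paragraphs` function should also work on the bytes of a real `.odt` file, since it only relies on the documented package layout.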

 

GIS Standards and Organizations

GIS data sets are more than geometry:
- Metadata
- geographic reference
- colors, display attributes etc.
- history of data modifications

[Figure: timeline of GIS standards organizations, with the years 1990, 1992, 1994, 1997 and 2004]

http://www.opengeospatial.org

GIS Interoperability: GDAL and OGR libraries

Data abstraction with GDAL (http://www.gdal.org)

Abstraction layer over about 40 raster formats, among them: ENVI, GeoTIFF, SAR, GRASS, ECW, HDF4, JPEG2000, MrSID, ArcGRID

Metadata:
- Number of bands
- Color table
- ...
- Coordinate system
- Projection (EPSG codes, PROJ.4)

Data abstraction with OGR (http://www.gdal.org/ogr/)

Abstraction layer over about 20 vector formats, among them: ArcCover, MITAB, Oracle, SHAPE, PostGIS, Geodatabase, DGN

Metadata:
- Coordinate system
- Projection (EPSG codes)
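The idea of an abstraction layer can be illustrated with a toy driver registry: the application asks one function, and format-specific drivers do the recognition. This is only a sketch of the concept, not GDAL's or OGR's actual API; the registry, the probe functions and the driver labels are invented for this example, and each probe only checks the file signature.

```python
from typing import Callable, Dict, Optional


def _is_tiff(head: bytes) -> bool:
    # TIFF (and therefore GeoTIFF) files start with a byte-order mark plus 42.
    return head[:4] in (b"II*\x00", b"MM\x00*")


def _is_shapefile(head: bytes) -> bool:
    # The main .shp file starts with the big-endian file code 9994.
    return head[:4] == (9994).to_bytes(4, "big")


def _is_jp2(head: bytes) -> bool:
    # JPEG 2000 (JP2) signature box.
    return head[:12] == b"\x00\x00\x00\x0cjP  \r\n\x87\n"


# One registry for all formats: adding a format means adding one entry.
DRIVERS: Dict[str, Callable[[bytes], bool]] = {
    "TIFF/GeoTIFF": _is_tiff,
    "SHAPE": _is_shapefile,
    "JPEG2000": _is_jp2,
}


def identify(head: bytes) -> Optional[str]:
    """Return the name of the first driver that recognises the file header."""
    for name, probe in DRIVERS.items():
        if probe(head):
            return name
    return None


print(identify(b"II*\x00\x08\x00\x00\x00"))
```

The application code only calls `identify`; it does not need to know how many formats exist, which is the benefit the slides attribute to the abstraction layer.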

GIS Data formats and support question

GDAL Development: Raster formats

Direct funding:
- Atlantis (ENVISAT, MFF, HKV Blobs)
- eCognition Germany (FUJI BAS Format)
- Los Alamos Nat. Labs (FITS)
- OPeNDAP Inc. (OPeNDAP/DODS)
- PeopleSoft (ERDAS LAN)
- Safe Software (USGS SDTS, ISO8211 support)
- Yukon Department of Environment (USGS DEM)

Public formats/Open documents/Reverse engineered:
- ERDAS Imagine (IMG)
- ERMAPPER (ECW)
- ESRI formats (ArcGrid)
- GDAL Virtual Format
- JasPer (JPEG2000); Kakadu (GeoJP2 interface for JPEG2000 = ISO/IEC 15444-1)
- LizardTech (MrSID, JPEG2000)
- NOAA (AVHRR data)

OGR Development: Vector formats

Direct funding:
- DM Solutions Group and GoMOOS (SQLite RDBMS, Comma Sep. Values CSV)
- OPeNDAP Inc. (OPeNDAP/DODS)
- Safe Software (FMEObjects)
- SRC, LLC (Oracle Spatial)

Public formats/Open documents/Reverse engineered:
- ESRI (SHAPE, ArcCoverage)
- GML
- IHO S-57
- MapInfo (TAB and MIF/MID)
- Microsoft (ODBC OGR)
- Microstation (DGN)
- MySQL (non-spatial data) OGR
- OGDI Vectors (VMAP)
- OGR Virtual Format
- PostgreSQL/PostGIS
- SDTS
- UK Ordnance Survey (NTF)
- U.S. Census (TIGER)

GIS formats

Why so many formats? No big problem!

Application-specific requirements, which partially contradict each other:

high compression rate

small runtime storage requirements

coding without information loss

fast decoding

easy access to pixels

simple algorithm

hardware-/CPU-independence

“Good software” can handle numerous formats.

Software patents and rights of third parties: future traps?!

GIS formats and Software Patents

How software patents affect GIS users

LZW (Lempel Ziv Welch) compression:

used in many raster formats (e.g. GIF)

integrated into GRASS before it was patented, later replaced by Zlib Deflate

Unisys started to charge for usage after waiting some years

MrSID (Multi-resolution Seamless Image Database):

wavelet-based image file format

three patents covering both the image compression and the on-the-fly image decompression technology

GDAL supports MrSID but requires a MrSID SDK license

ECW (ERMAPPER Compressed Wavelets):

patent pending

GPL-released source code available (of patented code?)

JPEG 2000:

situation not very clear
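The Deflate algorithm mentioned above as the replacement for LZW is available in Python's standard `zlib` module. The sketch below shows "coding without information loss" on one raster row; it is an illustration only and not GRASS's actual raster storage code, and the function names and the sample row are assumptions.

```python
import zlib


def pack_row(pixels: bytes, level: int = 6) -> bytes:
    """Compress one raster row with Deflate (zlib container)."""
    return zlib.compress(pixels, level)


def unpack_row(blob: bytes) -> bytes:
    """Restore the original pixel bytes exactly."""
    return zlib.decompress(blob)


# Hypothetical 8-bit raster row: 200 background pixels, then 56 bright pixels.
row = bytes([0] * 200 + [255] * 56)
packed = pack_row(row)

print(len(row), len(packed))
print(unpack_row(packed) == row)
```

The round trip returns the identical bytes, and the compression level illustrates the trade-off from the requirements list: a higher level favors compression rate, a lower one favors speed.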

Summary

The personal choice of application software or operating system should not affect the data exchange

Longitudinal and transversal interoperability must be guaranteed

Only documented formats may be used

There is no excuse: start to use interoperable formats today

GIS interoperability is in a better state than Office document interoperability

Interoperability awareness needs to be promoted: today and in the future

License of this document

Document home: http://mpa.itc.it/gfoss04/neteler_gfoss04_interoperability2005.pdf

This work is licensed under a Creative Commons License: http://creativecommons.org/licenses/by-sa/2.0/deed.en

“Free GIS and Interoperability”, © 2004-2005 Markus Neteler [OpenOffice SXI file available upon request: neteler at itc it, neteler at osgeo org]

License details: Attribution-ShareAlike 2.0

You are free:

to copy, distribute, display, and perform the work

to make derivative works

to make commercial use of the work

Under the following conditions:

Attribution. You must give the original author credit.

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.

For any reuse or distribution, you must make clear to others the license terms of this work.

Any of these conditions can be waived if you get permission from the copyright holder.

Your fair use and other rights are in no way affected by the above.
