HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes


Published on February 19, 2014

Author: HDFEOS

Source: slideshare.net


In this tutorial we will discuss different storage methods for HDF5 files (split files, families of files, multi-files) and for datasets (compressed, external, compact), together with the related filters and properties. The tutorial introduces advanced features of HDF5, including:

o Property lists
o Compound datatypes
o Hyperslab selections
o Point selections
o References to objects and regions
o Extendible datasets
o Mounting files
o Group iterations

HDF5 Advanced Topics: Object's Properties, Storage Methods and Filters, Datatypes
HDF and HDF-EOS Workshop VIII, October 26, 2004

Topics
• General introduction to HDF5 properties
• HDF5 dataset properties
  – I/O and storage properties (filters)
• HDF5 file properties
  – I/O and storage properties (drivers)
• Datatypes
  – Compound
  – Variable length
  – Reference to object and dataset region

General Introduction to HDF5 Properties

Properties Definition
• Mechanism to control different features of HDF5 objects
  – Implemented via the H5P interface ("property lists")
  – The HDF5 Library sets objects' default features
  – HDF5 property lists modify the default features
    • at object creation time (creation properties)
    • at object access time (access or transfer properties)

Properties Definitions
• A property list is a list of name-value pairs
  – Values may be of any datatype
• A property list is passed as an optional parameter to the HDF5 APIs
• Each layer of the library uses or ignores property lists as needed
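To make the name-value idea concrete, here is a toy model in plain C. Everything in it is hypothetical: toy_plist, toy_set, toy_get and the "userblock" key only mimic the behavior described above. The real H5P interface keeps its properties internally and is reached through calls such as H5Pcreate and the H5Pset_* family. A lookup falls back to a default when a property was never set, which mirrors how the library's defaults apply until a property list overrides them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, NOT the HDF5 H5P implementation:
 * a property list as a small fixed-size array of name/value pairs. */
#define TOY_MAX_PROPS 8

typedef struct {
    const char *name;   /* caller keeps the string alive (e.g. a literal) */
    long        value;  /* real HDF5 property values may be of any datatype */
} toy_prop;

typedef struct {
    toy_prop props[TOY_MAX_PROPS];
    size_t   count;
} toy_plist;

/* Set or overwrite a named property. Returns 0 on success, -1 if the list is full. */
int toy_set(toy_plist *pl, const char *name, long value)
{
    assert(pl != NULL && name != NULL);
    for (size_t i = 0; i < pl->count; i++) {
        if (strcmp(pl->props[i].name, name) == 0) {
            pl->props[i].value = value;
            return 0;
        }
    }
    if (pl->count == TOY_MAX_PROPS)
        return -1;
    pl->props[pl->count].name = name;
    pl->props[pl->count].value = value;
    pl->count++;
    return 0;
}

/* Read a property, falling back to a library-style default when it was never set. */
long toy_get(const toy_plist *pl, const char *name, long default_value)
{
    assert(pl != NULL && name != NULL);
    for (size_t i = 0; i < pl->count; i++)
        if (strcmp(pl->props[i].name, name) == 0)
            return pl->props[i].value;
    return default_value;
}
```

Setting the same name twice overwrites the value instead of adding a second entry, so the list stays a set of distinct names.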

Types of Properties
• Predefined and user-defined property lists
• Predefined:
  – File creation
  – File access
  – Dataset creation
  – Dataset access
• We will cover each of these

Properties (Example): HDF5 File
• H5Fcreate(…, creation_prop_id, …)
• Creation properties (how is the file created?)
  – Library's defaults
    • no user block
    • predefined sizes of offsets and addresses of the objects in the file (64-bit for DEC Alpha, 32-bit on Windows)
  – User's settings
    • user block
    • 32-bit sizes on a 64-bit platform
    • control over the B-trees for chunked storage (split factor)

Properties (Example): HDF5 File
• H5Fcreate(…, access_prop_id)
• Access properties or drivers (how is the file accessed? what is the physical layout on disk?)
  – Library defaults
    • SEC2 driver (UNIX read, write)
  – User defined
    • MPI I/O for parallel access
    • Family of files (a 100 GB HDF5 file represented by 50 UNIX files of 2 GB each)
    • Size of the chunk cache

Properties (Example): HDF5 Dataset
• H5Dcreate(…, creation_prop_id)
• Creation properties (how is the dataset created?)
  – Library's defaults
    • Storage: contiguous
    • Compression: none
    • Space is allocated when data is first written
    • No fill value is written
  – User's settings
    • Storage: compact, chunked, or external
    • Compression
    • Fill value
    • Control over space allocation in the file for raw data
      – at creation time
      – at write time

Properties (Example): HDF5 Dataset
• H5Dwrite/H5Dread(…, access_prop_id)
• Access (transfer) properties
  – Library defaults
    • 1 MB datatype conversion buffer
    • Error detection on read (if it was enabled when the data was written)
    • MPI independent I/O for parallel access
  – User defined
    • MPI collective I/O for parallel access
    • Size of the datatype conversion buffer
    • Control over partial I/O to improve performance

Properties Programming Model
• Use a predefined property list class
  – H5P_FILE_CREATE
  – H5P_FILE_ACCESS
  – H5P_DATASET_CREATE
  – H5P_DATASET_ACCESS
• Create a new property list instance
  – H5Pcreate
  – H5Pcopy
  – H5*get_access_plist; H5*get_create_plist
• Modify properties (see the H5P APIs)
• Use the property list to modify an object's features
• Close the property list when done
  – H5Pclose

Properties Programming Model
• General model of usage: get a property list, set values, pass it to the library
    hid_t plist = H5Pcreate(predefined_plist);   /* or H5Pcopy(existing_plist) */
    /* OR */
    hid_t plist = H5Xget_create_plist(…);         /* or H5Xget_access_plist(…) */
    H5Pset_foo(plist, vals);
    H5Xdo_something(Xid, …, plist);
    H5Pclose(plist);

HDF5 Dataset Creation Properties and Predefined Filters

Dataset Creation Properties
• Storage
  – Contiguous (default)
  – Compact
  – Chunked
  – External
• Filters applied to raw data
  – Compression
  – Checksum
• Fill value
• Space allocation for raw data in the file

Dataset Creation Properties: Storage Layouts
• Storage layout is important for I/O performance and for the size of HDF5 files
• Contiguous (default)
  – Used when data will be written/read all at once
  – H5Dcreate(…, H5P_DEFAULT)
• Compact
  – Used for small datasets (on the order of bytes) for better I/O
  – Raw data is written/read when the dataset is opened
  – The file is less fragmented
  – To create a compact dataset, follow the "Properties programming model"

Creating a Compact Dataset
• Create a dataset creation property list
• Set the property list to use the compact storage layout
• Create the dataset with the above property list
    plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_layout(plist, H5D_COMPACT);
    dset_id = H5Dcreate(…, "Compact", …, plist);
    H5Pclose(plist);

Creating a Chunked Dataset
• Chunked layout is needed for
  – Extendible datasets
  – Compression and other filters
  – Improving partial I/O for big datasets
(Figure: chunked storage gives better subsetting access time and is extendible; for the selection shown, only two chunks will be written/read.)
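The "only two chunks" claim can be checked with a little arithmetic. Along each dimension, a selection covers the chunk indices from start/chunk to (start+count-1)/chunk, and the number of chunks touched is the product over all dimensions. The sketch below assumes a simple block hyperslab (start and count only, no stride). chunks_touched is a hypothetical helper, and unsigned long long stands in for hsize_t.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many chunks a block hyperslab (start/count per dimension)
 * intersects, given the chunk dimensions. Assumes count[d] >= 1. */
unsigned long long chunks_touched(int rank,
                                  const unsigned long long *start,
                                  const unsigned long long *count,
                                  const unsigned long long *chunk_dims)
{
    unsigned long long total = 1;
    for (int d = 0; d < rank; d++) {
        assert(count[d] > 0 && chunk_dims[d] > 0);
        /* first and last chunk index covered along dimension d */
        unsigned long long first = start[d] / chunk_dims[d];
        unsigned long long last  = (start[d] + count[d] - 1) / chunk_dims[d];
        total *= last - first + 1;
    }
    return total;
}
```

For example, take a 2-D dataset with 10x10 chunks and a 10x5 selection that starts at row 5, column 0. The selection spans rows 5-14 (chunk rows 0 and 1) and columns 0-4 (chunk column 0), so the helper returns 2.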

Creating a Chunked Dataset
• Create a dataset creation property list
• Set the property list to use the chunked storage layout
• Create the dataset with the above property list
    plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(plist, rank, ch_dims);
    dset_id = H5Dcreate(…, "Chunked", …, plist);
    H5Pclose(plist);

Dataset Creation Properties: Compression and Other I/O Pipeline Filters
• HDF5 provides a mechanism ("I/O filters") to manipulate data while transferring it between memory and disk
• H5Z and H5P interfaces
• HDF5 predefined filters (H5P interface)
  – Compression (gzip, szip)
  – Shuffling and checksum filters
• User-defined filters (H5Z and H5P interfaces)
  – Example: Bzip2 compression
    http://hdf.ncsa.uiuc.edu/HDF5/papers/bzip2

Compression and Other I/O Pipeline Filters (continued)
• Currently used only with chunked datasets
• Filters can be combined, for example
  – GZIP + shuffle + checksum filters
  – Checksum filter + user-defined encryption filter
• Filters are called in the order they are defined on writing and in the reverse order on reading
• The user is responsible for the "sanity" of the filter pipeline
  – GZIP + SZIP + shuffle doesn't make sense
  – Shuffle + SZIP does
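The ordering rule can be illustrated with two toy byte filters that do not commute: add 3, and XOR with 0x5A. The toy_filter struct and all function names are hypothetical and stand in for real filters such as shuffle and deflate. On write, the forward transforms run in the order they were added. On read, the inverses must run in the reverse order, or the data does not round-trip.

```c
#include <assert.h>
#include <stddef.h>

/* A toy filter: forward transform on write, inverse transform on read. */
typedef struct {
    void (*forward)(unsigned char *buf, size_t n);
    void (*inverse)(unsigned char *buf, size_t n);
} toy_filter;

void add3_fwd(unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = (unsigned char)(b[i] + 3);
}

void add3_inv(unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = (unsigned char)(b[i] - 3);
}

/* XOR with a constant is its own inverse. */
void xor5a(unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] ^= 0x5A;
}

/* Write path: filters run in the order they were added to the pipeline. */
void pipeline_write(const toy_filter *f, size_t nf, unsigned char *buf, size_t n)
{
    assert(f != NULL || nf == 0);
    for (size_t i = 0; i < nf; i++)
        f[i].forward(buf, n);
}

/* Read path: inverses run in reverse order, undoing the last filter first. */
void pipeline_read(const toy_filter *f, size_t nf, unsigned char *buf, size_t n)
{
    assert(f != NULL || nf == 0);
    for (size_t i = nf; i-- > 0; )
        f[i].inverse(buf, n);
}
```

With the pipeline {add3, xor}, a zero byte is written as 0x59. Reading it back with the inverses in the wrong order (subtract 3, then XOR) would give 0x0C instead of 0.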

Creating a Compressed Dataset
• Compression
  – Improves transmission speed
  – Improves storage efficiency
  – Requires chunking
  – May increase the CPU time needed for I/O
(Figure: data is compressed on its way from memory to the file.)

Creating Compressed Datasets
• Create a dataset creation property list
• Set chunking (and specify the chunk dimensions)
• Set the compression method
• Create the dataset with the above property list
    plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(plist, ndims, chkdims);
    H5Pset_deflate(plist, level);                  /* GZIP */
    /* OR */
    H5Pset_szip(plist, options_mask, numpixels);   /* SZIP */
    dset_id = H5Dcreate(file_id, "comp-data", H5T_NATIVE_FLOAT, space_id, plist);
    H5Pclose(plist);

Creating an External Dataset
• The dataset's raw data is stored in an external file
• Easy to include existing data in an HDF5 file
• Easy to export raw data if an application needs it
• Disadvantage: the user has to keep track of the additional files to preserve the integrity of the HDF5 file
(Figure: the HDF5 file holds the metadata for dataset "A", while the raw data for "A" is stored in an external file.)

Creating an External Dataset
• Create a dataset creation property list
• Set the property list to use external storage
• Create the dataset with the above property list
    plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_external(plist, "raw_data.ext", offset, size);
    dset_id = H5Dcreate(…, "External", …, plist);
    H5Pclose(plist);

Example of External Files
This example shows how a contiguous, one-dimensional dataset is partitioned into three parts, each stored in a segment of an external file.
    plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_external(plist, "raw.data", 3000, 1000);
    H5Pset_external(plist, "raw.data", 0, 2500);
    H5Pset_external(plist, "raw.data", 4500, 1500);
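The example above implies a mapping from dataset bytes to file offsets. This sketch assumes, as the slide says, that the three segments hold consecutive parts of the dataset in the order of the H5Pset_external calls. ext_segment and ext_locate are hypothetical names, and the sketch works in bytes without regard to the element type.

```c
#include <assert.h>
#include <stddef.h>

/* One external segment: a byte range inside an external file. */
typedef struct {
    const char   *file;
    unsigned long offset;  /* where the segment starts in the external file */
    unsigned long size;    /* number of dataset bytes stored in this segment */
} ext_segment;

/* Map a byte position in the dataset to (segment index, offset in that file).
 * Segments hold consecutive parts of the dataset in the order they were added.
 * Returns -1 if pos lies beyond the last segment. */
int ext_locate(const ext_segment *seg, size_t nseg, unsigned long pos,
               unsigned long *file_offset)
{
    assert(file_offset != NULL);
    for (size_t i = 0; i < nseg; i++) {
        if (pos < seg[i].size) {
            *file_offset = seg[i].offset + pos;
            return (int)i;
        }
        pos -= seg[i].size;  /* skip past this segment's share of the dataset */
    }
    return -1;
}
```

With the three segments above (5000 bytes in total), dataset byte 0 is at offset 3000 of raw.data, byte 1000 is at offset 0, and byte 3500 is at offset 4500.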

Checksum Filter
• HDF5 includes the Fletcher32 checksum algorithm for error detection
• It is built into the HDF5 library
• To use this filter, you must add it to the filter pipeline with H5Pset_filter
(Figure: a checksum value is computed for the data as it moves from memory to the file.)

Enabling the Checksum Filter
• Create a dataset creation property list
• Set chunking (and specify the chunk dimensions)
• Add the filter to the pipeline
• Create your dataset specifying this property list
• Close the property list
    plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(plist, ndims, chkdims);
    H5Pset_filter(plist, H5Z_FILTER_FLETCHER32, 0, 0, NULL);
    dset_id = H5Dcreate(…, "Checksum", …, plist);
    H5Pclose(plist);
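For intuition about what the checksum filter computes, here is a textbook Fletcher-32 in plain C: two running sums modulo 65535 over 16-bit words. It is a sketch that assumes big-endian byte pairs and zero-pads an odd trailing byte. It is not HDF5's source, and its output is not guaranteed to match the values stored by H5Z_FILTER_FLETCHER32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook Fletcher-32 over 16-bit words formed from big-endian byte pairs.
 * An odd trailing byte is treated as the high byte of a final word.
 * Sketch of the algorithm only, not HDF5's implementation. */
uint32_t fletcher32(const unsigned char *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;
    size_t i = 0;
    assert(data != NULL || len == 0);
    for (; i + 1 < len; i += 2) {
        uint32_t word = ((uint32_t)data[i] << 8) | data[i + 1];
        sum1 = (sum1 + word) % 65535u;   /* running sum of words */
        sum2 = (sum2 + sum1) % 65535u;   /* running sum of sums */
    }
    if (i < len) {                       /* odd length: pad with a zero low byte */
        uint32_t word = (uint32_t)data[i] << 8;
        sum1 = (sum1 + word) % 65535u;
        sum2 = (sum2 + sum1) % 65535u;
    }
    return (sum2 << 16) | sum1;
}
```

Because sum2 accumulates every intermediate sum1, the result depends on word order as well as word values. A plain additive checksum would not catch reordered data.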

Shuffling Filter
• Predefined HDF5 filter
• Not compression; a change of byte order in a stream of data
• Example
  – 1 23 43
• Hexadecimal form
  – 0x01 0x17 0x2B
• Big-endian machine (4-byte integers)
  – 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x2B
• Shuffling
  – 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x17 0x2B

(Figure: the byte stream 00 00 00 01 00 00 00 17 00 00 00 2B before shuffling, and 00 00 00 00 00 00 00 00 00 01 17 2B after shuffling.)
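The shuffle transform itself is a byte transpose. The sketch below reproduces the example above: three 4-byte big-endian integers 1, 23, 43 become nine zero bytes followed by 0x01 0x17 0x2B. byte_shuffle and byte_unshuffle are hypothetical names, not HDF5 functions. The long runs of identical bytes that shuffling produces are what make the following compression step more effective.

```c
#include <assert.h>
#include <stddef.h>

/* Byte shuffle: gather byte 0 of every element, then byte 1, and so on.
 * src and dst each hold nelem * elem_size bytes and must not overlap. */
void byte_shuffle(const unsigned char *src, unsigned char *dst,
                  size_t nelem, size_t elem_size)
{
    assert(src != dst);
    for (size_t i = 0; i < nelem; i++)
        for (size_t j = 0; j < elem_size; j++)
            dst[j * nelem + i] = src[i * elem_size + j];
}

/* Inverse transform, applied when reading the data back. */
void byte_unshuffle(const unsigned char *src, unsigned char *dst,
                    size_t nelem, size_t elem_size)
{
    assert(src != dst);
    for (size_t i = 0; i < nelem; i++)
        for (size_t j = 0; j < elem_size; j++)
            dst[i * elem_size + j] = src[j * nelem + i];
}
```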

Enabling the Shuffling Filter
• Create a dataset creation property list
• Set chunking (and specify the chunk dimensions)
• Add the shuffle filter to the pipeline
• Define the compression filter
• Create your dataset specifying this property list
• Close the property list
    plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(plist, ndims, chkdims);
    H5Pset_shuffle(plist);
    H5Pset_deflate(plist, level);
    dset_id = H5Dcreate(…, "BetterComp", …, plist);
    H5Pclose(plist);

Effect of Data Shuffling (H5Pset_shuffle + H5Pset_deflate)
• Write a 4-byte integer dataset of 256x256x1024 (256 MB)
• Using chunks of 256x16x1024 (16 MB)
• Values: random integers between 0 and 255

                  File size    Total time    Write time
    No shuffle    102.9 MB     671.049       629.45
    Shuffle        67.34 MB     83.353        78.268

Compression combined with shuffling provides
• a better compression ratio
• better I/O performance
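The measurements can be summarized as ratios. This worked arithmetic assumes the file sizes use the same MB unit as the 256 MB dataset size. It also assumes the larger of each pair of times is the total time and the smaller is the write time.

```latex
\begin{aligned}
\text{raw size} &= 256 \times 256 \times 1024 \times 4\ \text{B} = 2^{28}\ \text{B} = 256\ \text{MB} \\
\text{compression ratio (no shuffle)} &= 256 / 102.9 \approx 2.49 \\
\text{compression ratio (shuffle)} &= 256 / 67.34 \approx 3.80 \\
\text{total-time ratio} &= 671.049 / 83.353 \approx 8.05 \\
\text{write-time ratio} &= 629.45 / 78.268 \approx 8.04
\end{aligned}
```

In this test, shuffling raised the compression ratio by about 1.5 times (3.80 vs. 2.49) and cut both times by a factor of about 8.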

HDF5 Dataset Access (Transfer) Properties

Dataset Access/Transfer Properties
• Improve performance
• H5Pset_buffer
  – Sets the size of the datatype conversion buffer used during I/O
  – The size should be large enough to hold a slice along the slowest-changing dimension
  – Example: hyperslab 100x200x300, buffer 200x300
• H5Pset_hyper_vector_size
  – Sets the number of hyperslab offset and length pairs
  – Improves performance for partial I/O
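The sizing rule for H5Pset_buffer reduces to a product: every dimension except the slowest-changing (first) one, times the element size. slice_bytes is a hypothetical helper. Assuming 4-byte elements, the 100x200x300 example needs 200 × 300 × 4 = 240,000 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one slice along the slowest-changing (first) dimension:
 * the product of the remaining dimensions times the element size. */
size_t slice_bytes(int rank, const size_t *dims, size_t elem_size)
{
    assert(rank >= 1 && dims != NULL);
    size_t n = elem_size;
    for (int d = 1; d < rank; d++)
        n *= dims[d];
    return n;
}
```

The resulting value (or anything larger) would be passed as the size argument of H5Pset_buffer.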

Dataset Access/Transfer Properties
• H5Pset_edc_check
  – For datasets created with an error detection filter enabled
  – Enables error checking during read operations
  – H5Z_ENABLE_EDC (default)
  – H5Z_DISABLE_EDC
• H5Pset_dxpl_mpio
  – Sets the data transfer mode for parallel I/O
  – H5FD_MPIO_INDEPENDENT (default)
  – H5FD_MPIO_COLLECTIVE

User-defined Filters

Standard Interface for User-defined Filters
• H5Zregister: registers a filter so that HDF5 knows about it
• H5Zunregister: unregisters a filter
• H5Pset_filter: adds a filter to the filter pipeline
• H5Pget_filter: returns information about a filter in the pipeline
• H5Zfilter_avail: checks whether a filter is available

File Creation Properties

File Creation Properties
• H5Pset_userblock
  – The user block stores user-defined information (e.g., ASCII text describing the file) at the beginning of the file
  – cat my.txt hdf5.h5 > myhdf5.h5
  – Sets the size of the user block
  – 512 bytes, 1024 bytes, ..., 2^N
• H5Pset_sizes
  – Sets the byte size of the offsets and lengths used to address objects in the file
• H5Pset_sym_k
  – Controls the rank of the B-trees used for groups
  – Default is 16
• H5Pset_istore_k
  – Controls the rank of the B-trees used for chunked datasets
  – Default is 32
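The user block size rule can be checked in a few lines. The sketch assumes the constraint documented for H5Pset_userblock: the size is either 0 or a power of two no smaller than 512. userblock_size_ok and userblock_size_for are hypothetical helpers. For the cat approach above, my.txt would presumably have to be padded to one of these sizes so that the HDF5 data starts on a valid boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Valid user block sizes: 0 (no user block) or a power of two >= 512. */
bool userblock_size_ok(unsigned long long size)
{
    return size == 0 || (size >= 512 && (size & (size - 1)) == 0);
}

/* Smallest valid, non-empty user block that can hold text_len bytes. */
unsigned long long userblock_size_for(unsigned long long text_len)
{
    unsigned long long size = 512;
    while (size < text_len)
        size *= 2;
    assert(userblock_size_ok(size));
    return size;
}
```

For example, a 3000-byte description would need a 4096-byte user block.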

File Access Properties

File Access Properties (Performance)
• H5Pset_cache
  – Sets the metadata cache and raw data chunk cache parameters
  – An improper size will degrade performance
• H5Pset_meta_block_size
  – Reduces the number of small objects in the file
  – A block of metadata is written in a single I/O operation (default 2 KB)
  – The VFL driver has to set H5FD_FEAT_AGGREGATE_METADATA
• H5Pset_sieve_buf_size
  – Improves partial I/O

File Access Properties (Physical Storage and Usage of Low-level I/O Libraries)
• VFL layer: file drivers
• Define the physical storage of the HDF5 file
  – Memory driver (HDF5 file in the application's memory)
  – Stream driver (HDF5 file written to a socket)
  – Split (multi) files driver
  – Family driver
• Define the low-level I/O library
  – MPI I/O driver for parallel access
  – STDIO vs. SEC2

Files Needn't Be Files: the Virtual File Layer
• VFL: a public API for writing I/O drivers
(Figure: an hid_t "file" handle passes through the VFL (Virtual File I/O Layer) to I/O drivers such as stdio, mpio, split, family, SRB, memory, and network, which map it onto "storage": files, SRB, memory, a repository, or the network.)

Split Files
• Allow you to split metadata and raw data into separate files
• The files may reside on different file systems for better I/O
• Disadvantage: the user has to keep track of the files
(Figure: the metadata for datasets "A" and "B" goes to a metadata file, while Data A and Data B go to a raw data file.)

Creating Split Files
• Create a file access property list
• Set up the file access property list to use split files
• Create the file with this property list
• Close the property list
    plist = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_split(plist, ".met", H5P_DEFAULT, ".dat", H5P_DEFAULT);
    file = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist);
    H5Pclose(plist);

File Families
• Allow you to access files larger than 2 GB on file systems that don't support large files
• Any HDF5 file can be split into a family of files, and vice versa
• A family member size must be a power of two

Creating a File Family
• Create a file access property list
• Set up the file access property list to use a file family
• Create the file with this property list
    plist = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_family(plist, family_size, H5P_DEFAULT);
    file = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist);
    H5Pclose(plist);
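A family can be pictured as one logical address space cut into equal-sized members. Assuming every member except the last holds exactly memb_size bytes, integer division maps a logical address to a member index and an offset within that member. family_locate and family_member_count are hypothetical helpers. With 2 GB members (taking GB as 2^30 bytes), a 100 GB file needs 50 members, which matches the earlier example.

```c
#include <assert.h>

typedef struct {
    unsigned long long member;  /* index of the member file */
    unsigned long long offset;  /* byte offset inside that member */
} family_pos;

/* Map a logical address to (member, offset), assuming fixed-size members. */
family_pos family_locate(unsigned long long addr, unsigned long long memb_size)
{
    assert(memb_size > 0);
    family_pos p = { addr / memb_size, addr % memb_size };
    return p;
}

/* Number of members needed for a logical file of total_size bytes (rounded up). */
unsigned long long family_member_count(unsigned long long total_size,
                                       unsigned long long memb_size)
{
    assert(memb_size > 0);
    return (total_size + memb_size - 1) / memb_size;
}
```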

HDF5 Datatypes

Datatypes
• A datatype is
  – A classification specifying the interpretation of a data element
  – Specifies for a given data element
    • the set of possible values it can have
    • the operations that can be performed
    • how the values of that type are stored
  – May be shared between different datasets in one file

HDF5 Datatypes
• Atomic types
  – standard integer & float
  – user-definable scalars (e.g., 13-bit integer)
  – bitfields
  – variable length types (e.g., strings)
  – pointers: references to objects/dataset regions
  – enumeration: names mapped to integers

General Operations on HDF5 Datatypes
• Create
  – H5Tcreate creates a datatype of the H5T_COMPOUND, H5T_OPAQUE, and H5T_ENUM classes
• Copy
  – H5Tcopy creates another instance of a datatype; it can be applied to any datatype
• Commit
  – H5Tcommit creates a datatype object in the HDF5 file; a committed datatype can be shared between different datasets
• Open
  – H5Topen opens a datatype stored in the file
• Close
  – H5Tclose closes a datatype object

Programming Model for HDF5 Datatypes
• Use predefined HDF5 types
  – No need to close them
• OR
  – Create
    • Create a datatype (by copying an existing one, or by creating one of the H5T_COMPOUND, H5T_ENUM, or H5T_OPAQUE classes)
    • Create a datatype by querying the datatype of a dataset
  – Open a committed datatype from the file
• (Optional) Discover datatype properties (size, precision, members, etc.)
• Use the datatype to create a dataset/attribute, to write/read a dataset/attribute, or to set a fill value
• (Optional) Save the datatype in the file
• Close

HDF5 Compound Datatypes
• Compound types
  – Comparable to C structs
  – Members can be atomic or compound types
  – Members can be multidimensional
  – Can be written/read by a field or a set of fields
  – Not all data filters can be applied (shuffling, SZIP)
  – H5Tcreate(H5T_COMPOUND) and H5Tinsert calls create a compound datatype
  – See the H5Tget_member* functions for discovering properties of an HDF5 compound datatype
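In C, a compound datatype usually mirrors a struct, and each member is described by its byte offset. HDF5's HOFFSET macro is defined in terms of offsetof, so a member would be registered with a call like H5Tinsert(type, "temperature", HOFFSET(sensor_t, temperature), H5T_NATIVE_FLOAT). The plain-C sketch below does not call HDF5. It uses a hypothetical sensor_t record and extract_field helper to show what reading a single field amounts to: gathering one member from every record into a packed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory record that a compound datatype would describe. */
typedef struct {
    int    serial;
    float  temperature;
    double pressure;
} sensor_t;

/* Copy one member of each record into a packed output buffer, the way
 * reading a single field of a compound dataset yields only that member. */
void extract_field(const void *records, size_t nrec, size_t rec_size,
                   size_t field_offset, size_t field_size, void *out)
{
    const unsigned char *src = records;
    unsigned char *dst = out;
    assert(field_offset + field_size <= rec_size);
    for (size_t i = 0; i < nrec; i++)
        memcpy(dst + i * field_size, src + i * rec_size + field_offset, field_size);
}
```

Working from offsets instead of assumed positions keeps the code correct regardless of the padding the compiler inserts between struct members.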

HDF5 Fixed and Variable Length Array Storage
(Figure: data values stored at successive time steps, contrasting fixed-length and variable-length arrays.)

HDF5 Variable Length Datatypes: Programming Issues
• Each element is represented by a C struct
    typedef struct {
        size_t len;   /* number of base-type elements */
        void  *p;     /* pointer to the elements */
    } hvl_t;
• The base type can be any HDF5 type
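The hvl_t layout describes a ragged array: each element carries its own length and pointer. The sketch below builds such rows in plain C. It uses a local struct, demo_hvl_t, with the same two fields so that it compiles without hdf5.h. vl_build, vl_total_elements and vl_free are hypothetical helpers. When the HDF5 library itself allocates variable-length data during a read, that memory should be released with H5Dvlen_reclaim rather than with free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same two fields as HDF5's hvl_t, renamed so this compiles without hdf5.h. */
typedef struct {
    size_t len;  /* number of base-type elements in this element */
    void  *p;    /* pointer to those elements */
} demo_hvl_t;

/* Row i gets i + 1 ints with values 10*i, 10*i + 1, ...
 * Returns 0 on success, or -1 (with earlier rows freed) if malloc fails. */
int vl_build(demo_hvl_t *rows, size_t nrows)
{
    for (size_t i = 0; i < nrows; i++) {
        int *vals = malloc((i + 1) * sizeof *vals);
        if (vals == NULL) {
            while (i-- > 0)
                free(rows[i].p);
            return -1;
        }
        for (size_t k = 0; k <= i; k++)
            vals[k] = (int)(10 * i + k);
        rows[i].len = i + 1;
        rows[i].p = vals;
    }
    return 0;
}

/* Total number of base-type elements across all variable-length rows. */
size_t vl_total_elements(const demo_hvl_t *rows, size_t nrows)
{
    assert(rows != NULL || nrows == 0);
    size_t total = 0;
    for (size_t i = 0; i < nrows; i++)
        total += rows[i].len;
    return total;
}

/* Release each row's buffer (memory this program allocated itself). */
void vl_free(demo_hvl_t *rows, size_t nrows)
{
    for (size_t i = 0; i < nrows; i++) {
        free(rows[i].p);
        rows[i].p = NULL;
        rows[i].len = 0;
    }
}
```

Four rows built this way hold 1 + 2 + 3 + 4 = 10 integers in total, each row in its own allocation.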

HDF5 Variable Length Datatypes
(Figure: for a dataset with a variable-length datatype, the raw data refers to the global heap, where the variable-length data is stored.)

HDF Information
• HDF Information Center
  – http://hdf.ncsa.uiuc.edu/
• HDF Help email address
  – hdfhelp@ncsa.uiuc.edu
• HDF users mailing list
  – hdfnews@ncsa.uiuc.edu
