Published on March 12, 2016
1. How to monitor and analyze a data fileHow to monitor and analyze a data file before loading in Staging Areabefore loading in Staging Area Recipes of Data Warehouse and Business IntelligenceRecipes of Data Warehouse and Business Intelligence
2. • In this use case, that is part of the Micro ETL Foundation (MEF), we try to handle a preliminary analysis of the data files. • Configure and load the first test data file, is a activities which can be very time consuming. Very often the data file is full of special characters and other abnormalities that prevent its loading. • Sometimes the data file is loaded correctly in the Staging Area, but the presence of "dirty" and/or special characters creates problems later to the Business Intelligence interfaces or to the HTML reports. • Then, I will provide you a tool that can help identify problems immediately about data files. This will allow us to alert the feeding system indicating very precisely where are the anomalies. Analysis of the use case
3. • There are no changes to the repository MEF. Download from (https://drive.google.com/open?id=0B2dQ0EtjqAOTZk1Eb3J0UEVsMnc) The installation is very simple and it will act as a plug-in to MEF. They will only be added new structures. • If you haven’t MEF, take the latest version under (https://drive.google.com/open?id=0B2dQ0EtjqAOTNjZlUFR0NkIyQm8) and follow the readme file for the installation. • Very useful is the new MEF_ASCII_CFT. This table was loaded with the 256 encodings ascii with descriptions. We see a fragment in the next figure. • The table MEF_ANAFILE_LOT will contain the result of the analysis of the data file with only the lines that contained abnormal situations, on the "dirty" characters or on incorrect number of column separators (only for data file of csv type). Changes to MEF repository
4. Changes to MEF repository
5. To test, simply run the procedure mef_dfana.p_analyze. Care must be taken in the input parameters: p_io_cod – data file code - This parameter does not affect the procedure. It can be useful for a link to the configuration table of the data file. p_dir - file folder - Oracle directory for the path of the data file. For example DWH_DAT. p_file - file name - File name. We can try a data file of a previous use case, for example, regmar_20160205.csv p_t1 - decimal format of the 1st terminator - It may happen that to indicate the end of row are used more characters. Indicate the decimal code of the first character. For example, specify 10 for the carriage return <CR> p_t2 - decimal format of the 2nd terminator if exists - Indicate the decimal code of the second character. For example, specify 13 for the line feed <LF> p_t3 - decimal format of the 3rd terminator if exists - Indicate the decimal code of the third character if it exists p_sep - decimal format of the columns separator - Indicate the decimal code of the field separator if it exists.(59=";",44=",") Execute the use case
6. p_sep_cnt - counter of separators - Indicate the number of the expected field separators. Generally it coincides with the number of fields, but sometimes the last field is still followed by a separator. If you are unsure, it leaves null this parameter: the procedure will use the number of separators of the first row. p_from - analyze row from - Number of the line from which the analysis of the data file will start. The default is 1. p_to - analyze row to - Number of the last line to be analyzed. If null, it means the entire file. Could be useful, for very large data file, first try on a reduced number of rows. To control the end-of-row characters, I suggest to open the data file by an editor with the exadecimal/decimal visualization. On open, don't convert to others format. Usually they are 10 <LF> or 13+10 <CR><LF>. To test, We can use the data files of the use cases. SQL> exec mef_dfana.p_analyze('TEST','DWH_DAT','regmar_20160205.csv',10,null,null,44,null); Execute the use case
7. The message log table of MEF: Execute the use case The result on the view MEF_ANAFILE_LOV:
8. References http://www.slideshare.net/jackbim/recipe-14-of-data-warehouse-and-business-intelligence-build-a-staging-area-for-an-oracle-data- warehouse-1 http://massimocenci.blogspot.it/ Email: Massimo_cenci@yahoo.it
... introduce the term "business data warehouse". ... loading the data warehouse with data from ... are created from the data warehouse. 
Business Warehouse (BW) is the Data ... BW Monitor. BW Monitor displays data loading status and ... The Persistent Staging Area (PSA) is a data ...
A data warehouse is fed from the data staging area. The data warehouse ... Loading in the data warehouse ... data visualization (business intelligence) ...
A proposed model for data warehouse ETL ... being loaded to the data warehouse. Source, staging area, ... take place before loading data to the ...
Now Toad Data Point includes native ... How to query data in Oracle Business Intelligence Enterprise ... How to analyze joins in Toad Data ...
... events, data warehouse loading, ... Overview of the Oracle Data Dictionary. Each Oracle database has a ... Messages in a staging area are ...
All Programming Interview Questions And Answers for ... data warehousing and business intelligence ... the data warehouse. Data Staging Area a ...
4 ETL Implementation and Customization. ... populate an Oracle Communications Data Model warehouse. ... analyze the reference tables before loading the ...