advertisement

ETL for the masses with Power Query and M

54 %
46 %
advertisement
Information about ETL for the masses with Power Query and M
Technology

Published on March 7, 2014

Author: regisbaccaro

Source: slideshare.net

Description

This is the slide deck for my presentation given at SQL Saturday 280 in Vienna.on March 6th 2014.
advertisement

ETL for the Masses Régis Baccaro – IBM @regbac

Our Sponsors

Introduction Régis Baccaro @regbac http://Theblobfarm.wordpress.com http://Thelovefarm.wordpress.com regis@baccaro.com • • • • • Founder and lead organizer of SQL Saturday Denmark PASS Regional Mentor Works for IBM Passionate about the community .Net developer, BI dude, SharePoint fellow and accidental DBA

Agenda • Power Query and the M language • E and T and L with Power Query • Data refresh techniques with PQ • Next step

Introduction • Power Query • Get data experience • Filter and combine • Embedded M for repeatable mashup • Power Query Formula Language (aka M) • • • • • Mostly pure Higher-order Dynamically typed Partially lazy  Functional programming language

Elements of language • Expressions – central construct • Evaluated to a single vlaue • Values • • • • • Primitives List – ordered seq. Record – set of fields Table Function

Evaluation • Excel-like (surprise !) • Nested records • In Records • In Lists • Lazy evaluation • Lists and Records (and let) • Eager evaluation • Everything else

Functions and Standard Library • Mapping from a set of values to a single value • (named parameters) => function body • Common set of definitions

Operators • Meaning varies depending on kind of value • & = text or list concatenation and records merge

Metadata • Information about a value that is associated with a value • A record • Exists for every value • Unobtrusive way to add information • Accessed with Value.Metadata

Let .....in expression • So far only literal values • Let allows a set of value to be: • Computed • Named • Used in subsequent expressions that follows the in let in Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......), RowCount = Table.RowCount(Source) RowCount

IF expression • Select between 2 expression based on logical condition

Error expression • When an expression evaluation cannot yield a value • Raised with error • Handled with try • Produces an Error record • try...otherwise Used with default values

Keywords and Operators • and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time • , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...

The ”E” - Why is Power Query great for Extracting data • Multiple data sources Hey wait ! Where is PDW ?

Query folding - A step toward declarative ETL approach • Declarative vs Imperative • Query folding similar to predicate pushdown • Does Power Query have a Query Optimizer ? • Demo Query folding - the unofficial list: • SQL Databases • OData and OData based sources, such as the Windows Azure Marketplace and SharePoint Lists • Active Directory • HDFS.Files, Folder.Files, and Folder.Contents (for basic operations on paths) • • • • Column removal Renaming Joins Type conversions

Real life scenario – ETL for the masses • Seen a lot of demos • Build a lot of demos • They are always so clean !

Real life scenario

Transform • M is how the magic happens! • Data manipulation • Records • Lists • Tables • Merging • Function calls

What about our scenario? • Where should I get my data from? • Pure Excel • Excel and MDS/DQS/SSIS/SQL • Web, SQL, XML, ? • Let me show you ! Input • (cvr web)

Let’s go to homegrown data? • Bad web service • Bad HTML structure • Let’s go with local data that we can control Isolated DB • SQL Server • Excel • Let’s Query! Local storage

Clean up before you merge! • DQS Knowledge base with CVR + Cleansing project with LinkedIn input ________________________________________ = Demo2.1_AndreasStrandbyClean + • Hit ratio increased... Hit 250 Total 100% 90% 80% 200 70% 60% 150 50% = 40% 100 30% 20% 50 10% 0 0% Clean join Nested Merge join

Smarter Power Query • Expression.Evaluate() • Examples • Load query text from file • Load function from file • Passing parameters (as constants) • Demo

Refreshing Power Query data • Different solutions • All with flaws !

Refreshing Power Query data – with VB6 ! • Back from 2006 Plus Minus Can be scheduled VB6 – are you kidding ? More robust than the non-technical solution • From Kim GreenLee

Refreshing Power Query data – with PowerShell Plus Minus Robust Hard to troubleshoot Can not run in a task in windows task scheduler unless the user has checked that the user has to be logged on to run

Refreshing Power Query data – The non-technical way • Let me show you ! Plus Minus Very easy Not very corporate ! The spreadsheet needs to be open Excel file not saved Locked out when it refreshes

Refreshing Power Query data – The non-technical way part 2 • Let me show you ! Plus Minus Very easy Not very corporate ! Uses technique from previous The spreadsheet needs to be open

Refreshing Power Query data – with SSIS Plus Minus Robust Requires a SQL Server (wait, it’s a plus!) Needs a SSIS / C# developer

Refreshing Power Query data – with SSIS • Using DQS for cleansing input • Let me show you !

How is Power query going to be used? • Data store accumulating interesting data points • Hook into read only data for reporting purposes or data marts • One file to accumulate (Produce) • Multiple files or programs to report (Consume) • I don’t believe in “Data Steward” • I believe someone will be in charge of procuring and monitoring data stores of disparate data (such as IT or DBA’s).

Conclusion • A step toward declarative ETL approach • Still much work to do ! We have • A declarative data integration language • Only surfaced in Power Query • Can push data to an Excel spreadsheet Imagine..... • Connection to heterogenous data sources

THANK YOU! @REGBAC HTTP://THEBLOBFARM.WORDPRESS.COM REGIS@BACCARO.COM

#binary presentations

Add a comment

Related presentations

Presentación que realice en el Evento Nacional de Gobierno Abierto, realizado los ...

In this presentation we will describe our experience developing with a highly dyna...

Presentation to the LITA Forum 7th November 2014 Albuquerque, NM

Un recorrido por los cambios que nos generará el wearabletech en el futuro

Um paralelo entre as novidades & mercado em Wearable Computing e Tecnologias Assis...

Microsoft finally joins the smartwatch and fitness tracker game by introducing the...

Related pages

Power Query: Data Chemistry for the Masses | Mark V SQL

Power Query: Data Chemistry for the Masses. ... ETL Developers have being doing chemistry with ... — Using “”M”” to take Power Query WAY beyond ...
Read more

Power Query: Data Chemistry for the Masses - BIDN

Power Query: Data Chemistry for the Masses. 0 0. Flag ... ETL Developers have being doing chemistry with data for ... M Query Basics for Power Query ...
Read more

Power Query for Business - Pragmatic Works

Power Query for Business ... Combining Data with Power Query; M Query ... ETL Developers have being doing chemistry with data for years in tools ...
Read more

Power Query: Data Chemistry for the Masses - Pragmatic Works

Power Query: Data Chemistry for the Masses. ... Mark Vaillancourt. Watch Video. Overview. ETL ... informally known as ""M""-- Using ""M"" to take Power ...
Read more

Power Query | LinkedIn

View 314 Power Query ... and applying these types of Joins are not all possible through Power Query GUI. Power ... ETL for the masses with Power Query and M.
Read more

PASS Summit 2014 > Sessions > Details

Power Query: Data Chemistry for The Masses ... ETL Developers have being doing ... informally know as "M" -- Using "M" to take Power Query WAY beyond what ...
Read more

SQLSaturday #387 - Madison 2015 > Sessions > Details

Power Query: Data Chemistry for the Masses. ... ETL Developers have being doing ... informally known as "M" -- Using "M" to take Power Query WAY beyond ...
Read more

Microsoft Business Intelligence - Minnesota

Power Query Data Chem For the Masses Session Materials. ETL Developers have being doing ... informally known as "M"-- Using "M" to take Power Query WAY ...
Read more