Published on March 9, 2014
INTRODUCTION TO SQL BOOTHTECH JENNIFER BERK, MAY 2012
AGENDA • What's SQL and when would you use it • How to find out what's in your database • How to pull the information you need, with progressively more realistic exercises • How to learn more • Questions and more practice time • Note: specific syntax today is for MySQL, but any database will be similar
WHAT IS SQL? A N D WH Y I T ’ S U S E F U L T O A N M B A
RELATIONAL DATABASE Billing address has an Date placed Shipping method containing Item Contact name Order Client • Stores data in a series of tables • Also stores information about relationships between data in tables (hence relational) Item number Quantity Size Color
WHAT KIND OF DATA? • Numeric • Integers with various possible ranges (sometimes TRUE/FALSE Boolean values are encoded as 0/1 in an integer field) - INT • Decimals or floating-point numbers – FLOAT • Time and date - DATETIME • Text • With a specified length in characters – VARCHAR • Longer text – TEXT • Binary data – BLOB
WHY USE A RELATIONAL DATABASE • Think about a list of contacts in Excel – basically one database table – and adding a phone number. • Do you have enough phone number columns? • Does cell phone have its own field or is it just another phone number? • How do you mark the primary phone number? • A database lets you have as many phone numbers as you want associated with one person, because you can put phone numbers in a separate table.
STRUCTURED QUERY LANGUAGE • SQL is a way to pull the information you need out of a relational database • You’ll probably use it to fetch summary information from a central data warehouse and then do additional analysis in Excel/SAS/etc. • SQL is also used (by the people who set up the data warehouse) to create tables, insert data, and delete unneeded information.
SANDBOX 1. Take out your laptop 2. Go to http://coderzone.org/sqlsandbox/ 3. Click “Sign In”
WHAT’S IN YOUR DATABASE? SHOW TABLES; DESCRIBE <TABLENAME>;
SHOW TABLES; • Returns a list of tables in the database • May not need to use this if you have a graphical user interface (GUI) • Can limit which tables are returned, e.g. “SHOW TABLES LIKE ‘%c%’;” will return all the tables with a “c” in their name
DESCRIBE <TABLENAME>; • Returns information about each field in table <tablename> • Includes field name, type, and other information like if the field can be NULL or if it’s required to have a value • May not need to use this if you have a graphical user interface (GUI)
TRY IT OUT • In the sandbox, try looking at tables. • SHOW TABLES [LIKE “%c%”]; • DESCRIBE sb_css_colors; • Which tables contain the most types of data?
SQL STYLE • Capitalize SQL keywords (SELECT, JOIN, AS, etc.) • Lower-case table and field names Makes it easy to see what values are specific to the database you’re using.
BASIC DATA SELECTION SELECT <FIELDS> FROM <TABLENAME>;
SELECT <FIELDS> FROM <TABLENAME>; • Returns some data from a table • You’ll need the field names you found before from the DESCRIBE <tablename> statement • <fields> can be: • • • • One field name Multiple field names separated by commas Special value * (asterisk), which means “everything” Special value COUNT(*), which means “how many rows are there?” • This returns all the rows of the table – why might that be a problem?
TRY IT OUT • In the sandbox, try looking at some data in tables. • SELECT * FROM sb_css_colors; • SELECT COUNT(*) FROM sb_airports; • SELECT biz_name, state, phone FROM sb_airports;
LIMITING DATA SELECTION SELECT <FI ELDS> FROM < TABLENAME> WHERE <LI MITS> ;
WHERE • SELECT <fields> FROM <tablename> WHERE <limits> ; • This is how to start picking out only the data you’re interested in • WHERE clauses can be: • <fieldname> = <value> such as 2 or “Bob” • Similar but with LIKE, >, >=, <, <=, != (note that different SQL implementations require different representations for dates/times, so you may have to check documentation to figure out how to say purchase_date > “2012-01-01”, for example) • Things involving functions, like ABS(<fieldname>) > 20 • Several of those joined by AND, OR, etc.
TRY IT OUT • In the sandbox, try limiting what data you grab. • SELECT * FROM sb_css_colors WHERE hexcolor=“#800000” • SELECT * FROM sb_css_colors WHERE hexcolor LIKE “%FF%” • SELECT biz_name, state, zip, phone FROM sb_airports WHERE (state=“IL” OR state=“IN”) AND phone LIKE “%(800)%”;
MORE DATA SELECTION SELECT <FIELDS> FROM < TABLENAME> WHERE <LIMITS> ORDER BY <FIELD> <ORDER> GROUP BY <FIELD>;
ORDER BY • SELECT <fields> FROM <tablename> ORDER BY <field> <order>; • Like sorting in Excel • Order can be ASC (ascending) or DESC (descending) • Don’t sort and then assume you know what the possible values were – e.g. what’s at the top if you do SELECT * FROM sb_css_colors ORDER BY hexcolor DESC;
GROUP BY • SELECT <fields and/or functions of fields> FROM <tablename> GROUP BY <field>; • Lets you create summary statistics directly instead of in Excel later • Commonly used functions: COUNT(), COUNT(DISTINCT), SUM(), AVG(), MAX(), MIN(),
TRY IT OUT • In the sandbox, try more complicated SELECTs. • SELECT * FROM sb_css_colors ORDER BY hexcolor DESC; (what were the possible values?) • SELECT state, zip, COUNT(*) from sb_airports GROUP BY zip; • SELECT state, AVG(zip) FROM sb_airports GROUP BY state;
CONNECTING TABLES: JOINS SELECT <FIELDS> FROM <TABLENAME> JOIN <TABLENAME> ON <FIELD MATCH>;
JOIN • SELECT <fields> FROM <tablename> JOIN <tablename> ON <field match>; • Joins let you connect related data from multiple tables, e.g. a customer name from a customers table and an order date from an orders table • Normally you’ll join on an ID or “key” column • Often need to specify both table and field when writing the field match statement • customers.id = orders.customer_id
TRY IT OUT • In the sandbox, try joining tables. • SELECT comp_dept, first_name, last_name, phone FROM sb_departments JOIN sb_employees ON sb_departments.emp_id = sb_employees.emp_id WHERE comp_dept = “Payroll”;
HOW TO LEARN MORE USEFUL RESOURCES
ONLINE RESOURCES • MySQL manuals (MySQL is open source, so there are lots of resources freely available) • http://dev.mysql.com/doc/refman/5.6/en/index.html • Question sites: • http://www.quora.com/MySQL/ • http://programmers.stackexchange.com/questions/tagged /sql (pretty technical in general)
SANDBOXES • We’ve used http://coderzone.org/sqlsandbox/ (nice because it has multiple tables) • Also http://www.w3schools.com/sql/default.asp • You can make your own sandbox to try things out if you have web hosting (e.g. unlimited MySQL databases on DreamHost)
QUESTIONS? AND MORE PRACTICE TIME
Student-Led Groups. ... Booth Technology Group; Chicago Booth Operations Strategy Group; Corporate Finance Group; ... Chicago Booth Rugby Club;
... the first students to have begun their Executive MBA journey at Booth's Hong Kong campus reflect on their ... Chicago Booth School of Business ...
... the Chicago Booth Energy Group is for you. ... Alternative transporation technologies ... Chicago Booth Part Time Energy Club ...
Introduction to Data Science from University of Washington. ... including both SQL and NoSQL solutions for massive data management ... Introduction .
... the University of Chicago ... Booth School of Business MBA: 1832-02, Booth School ... school student must meet the English language requirements ...
At Illinois, we admit students who are ready to utilize the challenging curriculum of the Illinois MBA to become impactful leaders in their chosen career ...
College of Business at Illinois. ... Hoeft Technology & Management Program; Masters. ... Professional MBA; Executive MBA (Chicago)
Consulting Case Interview Preparation Guide ... during my MBA summer internship recruiting cycle) ... • Product introduction
... MBA & Masters programs. GMAT registration, ... Use the same technology as the official GMAT exam so you can simulate the actual test taking experience.