After reading Chapter 11a and Chapter 11b Lecture (see the attachments), “Advanced Analytics -Technology and Tools - Advanced SQL and MADlib”  • Describe how the following code will work: SELECT dt, r

Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved. Advanced Analytics - Technology and Tools 1 Module 5: Advanced Analytics -Technology and Tools 1 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics -Technology and Tools During this lesson the following topics are covered:

• SQL Essentials • SET Operations • Online analytical processing (OLAP) features • GROUPING SETS, ROLLUP,CUBE • GROUPING, GROUP_ID functions • Text processing, Pattern matching In -database Analytics SQL essentials 2 Module 5: Advanced Analytics -Technology and Tools These topics are covered in this lesson. 2 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations Greenplum supports the following set operations as part of a SELECT statement: • INTERSECT –Returns rows that appear in all answer sets • EXCEPT –Returns rows from the first answer set and excludes those from the second • UNION ALL –Returns a combination of rows from multiple SELECT statements with repeating rows • UNION –Returns a combination of rows from multiple SELECT statements with no repeating rows 3 Module 5: Advanced Analytics -Technology and Tools Set Operations Set operators: • Manipulate the results sets of two or more queries by combining the results of individual queries into a single results set. • Do not perform row level filtering. Set operations supported by Greenplum are: • INTERSECT which returns rows that appear in all answer sets generated by individual SELECT statements. • EXCEPT returns all rows from the first SELECT except for those which also selected by the second SELECT . This operation is the same as the MINUS operation. • UNION ALL combines all the results of two or more SELECT statements. There may be repeating rows. • UNION combines the results of two or more SELECT statements. There will be no repeating rows. 3 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – INTERSECT INTERSECT : • Returns only the rows that appear in both SQL queries • Removes duplicate rows Table A Table B Intersect SELECT t.transid , c.custname FROM facts.transaction t JOIN dimensions.customer c ON c.customerid = t.customerid INTERSECT SELECT t1.transid, c1.custname FROM facts.transaction t1 JOIN dimensions.customer c1 ON c1.customerid = t1.customerid WHERE t1.transdate BETWEEN ‘2008 -01 -01’ AND ‘2008 -01 -21’ 4 Module 5: Advanced Analytics -Technology and Tools Set Operations –INTERSECT A set operation takes the results of two queries and returns only the results that appear in both result sets. Duplicate rows are removed from the final set returned. 4 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – EXCEPT EXCEPT : • Returns all rows from the first SELECT statement • Omits all rows that appear in the second SELECT statement SELECT t.transid , c.custname FROM facts.transaction t JOIN dimensions.customer c ON c.customerid = t.customerid EXCEPT SELECT t1.transid, c1.custname FROM facts.transaction t1 JOIN dimensions.customer c1 ON c1.customerid = t1.customerid WHERE t1.transdate BETWEEN ‘2008 -01 -01’ AND ‘2008 -01 -21’ Table A Results of first query Table B Results of second query Table A minus Table B 5 Module 5: Advanced Analytics -Technology and Tools Set Operations –EXCEPT The EXCEPT set operation takes the distinct rows of the first query and returns all of the rows that do not appear in the result set of the second query. 5 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – UNION ALL UNION ALL : • Combines rows from the first query with rows from the second query • Does not remove duplicates rows SELECT t.transid , c.custname FROM facts.transaction t JOIN dimensions.customer c ON c.customerid = t.customerid WHERE t.transdate BETWEEN ‘2008 -01 -01’ AND ‘2008 -05 -17’ UNION ALL SELECT t1.transid, c1.custname FROM facts.transaction t1 JOIN dimensions.customer c1 ON c1.customerid = t1.customerid WHERE t1.transdate BETWEEN ‘2008 -01 -01’ AND ‘2008 -01 -21’ Table A Results of first query Table B Results of second query Table A plus Table B 6 Module 5: Advanced Analytics -Technology and Tools Set Operations –UNION ALL The UNION ALL set operation is like the UNION operation but it does not remove duplicate or repeating rows. 6 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – UNION UNION : • Combines rows from the first query with rows from the second query • Removes duplicates or repeating rows SELECT t.transid , c.custname FROM facts.transaction t JOIN dimensions.customer c ON c.customerid = t.customerid WHERE t.transdate BETWEEN ‘2008 -01 -01’ AND ‘2008 -05 -17’ UNION SELECT t1.transid, c1.custname FROM facts.transaction t1 JOIN dimensions.customer c1 ON c1.customerid = t1.customerid WHERE t1.transdate BETWEEN ‘2008 -01 -01’ AND ‘2008 -01 -21’ Table A Results of first query Table B Results of second query Table A plus Table B minus duplicates Duplicate 7 Module 5: Advanced Analytics -Technology and Tools Set Operations –UNION A union operation combines the results of the SELECT statement from the first table with the results from the query on the second table. The result set does not contain any repeating rows. 7 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

• Types of Join • Inner • Left outer • Right outer • Full outer • Cross SET Operations Inner Join Left Outer Join Right Outer Join Full Outer Join Cross Join 8 Module 5: Advanced Analytics -Technology and Tools The type of SET operations you can perform are: • Inner join –The inner join is possibly the most common type of join. The resulting data set is obtained by combining two tables on a common column. Each row of the left table is compared against each row of the right table. All matching rows are returned as part of the result set. An equijoin is an inner join that uses only equality comparisons in the join predicate. • Left outer join –Left outer join returns all of the rows from the left table even if there is no matching row in the right table. It also returns matching rows from the right table.

Rows in the right table that do not match are not included as part of the result set. • Right outer join –Right outer join returns all of the rows from the right table even if there is no matching row in the left table. It also returns the matching rows from the left table. • Full outer join –Full outer join returns all rows from both tables where there is a match and returns NULL for rows that do not have a match. • Cross join –Cross join returns the Cartesian product of rows from tables in the join. The resulting data set consists of a combination of each row in the left table with each row in the right table. Two tables, each with five rows, will produce a resulting data set that contains twenty -five rows. . 8 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Left Outer Join • Correlated sub -queries do not run efficiently in Greenplum though support has been introduced in Version 4.2  SELECT * FROM transaction t WHERE NOT EXISTS ( SELECT 1 FROM customer c WHERE c.customerid = t.customerid) • Use LEFT OUTER JOIN  SELECT t.* FROM transaction t LEFT OUTER JOIN customer c ON t.customerid =c.customerid WHERE c.customerid IS NULL 9 Module 5: Advanced Analytics -Technology and Tools Correlated sub queries are supported in a Greenplum environment 4.2 onwards. The first example shown in the slide is a correlated sub query in which we have a nested correlated sub query using the first variable from the first SELECT statement (on table “transaction” ) used for selection with table “Customer” in the query nested with a WHERE clause. In the context of MPP architecture of Greenplum supporting correlated sub queries are not efficient.

The code example shown at the bottom accomplishes the same with a LEFT OUTER JOIN. It is recommended that the multidimensional queries that are traditionally done with sub queries and correlated sub queries be optimally coded with the proper use of the MPP architecture of Greenplum. 9 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Sub -query vs. Inner Join • IN clause is fully supported …  SELECT * FROM transaction t WHERE t.customerid IN (SELECT customerid FROM customer) • However, generally better idea as long as c.customerid is unique:  SELECT t.* FROM transaction t INNER JOIN customer c ON c.customerid = t.customerid 10 Module 5: Advanced Analytics -Technology and Tools Sub queries and nested queries are commonly used for multi -dimensional queries in which we use IN clause with another SELECT statement. Sub -queries such as the one shown above are supported in a Greenplum environment.

They are supported and valid but it is generally a good idea to use a INNER JOIN to accomplish the same result. The performance query with INNER JOIN is far superior to the nested query in the first example. 10 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Greenplum SQL OLAP Grouping Extensions Greenplum supports the following grouping extensions:

• Standard GROUP BY • ROLLUP • GROUPING SETS • CUBE • grouping(column [, ...]) function • group_id() function 11 Module 5: Advanced Analytics -Technology and Tools Greenplum SQL OLAP Grouping Extensions Greenplum introduced support for extensions to the standard GROUP BY clause, which is fully supported. These clauses can simplify the expression of complex groupings: • ROLLUP –This extension provides hierarchical grouping. • CUBE –Complete cross -tabular grouping, or all possible grouping combinations, is provided with this extension. • GROUPING SETS –Generalized grouping is provided with the GROUPING SETS clause. • grouping function –This clause helps identify super -aggregated rows from regular grouped rows. • group_id function –This clause is used to identify duplicate rows in grouped output. 11 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Standard GROUP BY Example GROUP BY : • Group results based on one or more specified columns • Is used with aggregate statements The following example summarizes product sales by vendor:

SELECT pn, vn, sum(prc*qty) FROM sale GROUP BY pn, vn ORDER BY 1,2,3; pn | vn | sum ----- +---- +------- 100 | 20 | 0 100 | 40 | 2640000 200 | 10 | 0 200 | 40 | 0 300 | 30 | 0 400 | 50 | 0 500 | 30 | 120 600 | 30 | 60 700 | 40 | 1 800 | 40 | 1 (10 rows) 12 Module 5: Advanced Analytics -Technology and Tools Standard GROUP BY Example The standard GROUP BY clause groups results based on one or more columns specified. It is used in conjunction with aggregate statements, such as SUM , MIN , or MAX . This helps to make the resulting data set more readable.

The slide shows an example of a standard GROUP BY clause used to summarize product sales by vendor. 12 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Standard GROUP BY Example with UNION ALL This example extends the previous example by adding sub -totals and a grand total : SELECT pn, vn, sum(prc*qty) FROM sale GROUP BY pn, vn UNION ALL SELECT pn, null, sum(prc*qty) FROM sale GROUP BY pn UNION ALL SELECT null, null, sum(prc*qty) FROM SALE ORDER BY 1,2,3; pn | vn | sum ----- +---- +--------- 100 | 20 | 0 100 | 40 | 2640000 100 | | 2640000 200 | 10 | 0 200 | 40 | 0 200 | | 0 300 | 30 | 0 300 | | 0 400 | 50 | 0 400 | | 0 500 | 30 | 120 500 | | 120 600 | 30 | 60 600 | | 60 700 | 40 | 1 700 | | 1 800 | 40 | 1 800 | | 1 | | 2640182 (19 rows) 13 Module 5: Advanced Analytics -Technology and Tools Standard GROUP BY Example with UNION ALL In this follow -on example, the requirements for the query have been extended to include sub - totals and a grand total. You would need to use a UNION ALL to continue the grouping and provide for the additional requirements. 13 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

ROLLUP Example The following example meets the requirement where the sub -total and grand totals are to be included:

SELECT pn, vn, sum(prc*qty) FROM sale GROUP BY ROLLUP(pn, vn) ORDER BY 1,2,3; pn | vn | sum ----- +---- +--------- 100 | 20 | 0 100 | 40 | 2640000 100 | | 2640000 200 | 10 | 0 200 | 40 | 0 200 | | 0 300 | 30 | 0 300 | | 0 400 | 50 | 0 400 | | 0 500 | 30 | 120 500 | | 120 600 | 30 | 60 600 | | 60 700 | 40 | 1 700 | | 1 800 | 40 | 1 800 | | 1 | | 2640182 (19 rows) 14 Module 5: Advanced Analytics -Technology and Tools ROLLUP Example This slide meets the requirements provided in the previous slide, but uses the ROLLUP grouping extension. ROLLUP allows you to perform hierarchical grouping and helps to reduce the code. 14 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

GROUPING SETS Example The following example shows how to achieve the same results with the GROUPING SETS clause: SELECT pn, vn, sum(prc*qty) FROM sale GROUP BY GROUPING SETS ( (pn, vn), (pn), () ) ORDER BY 1,2,3; pn | vn | sum ----- +---- +--------- 100 | 20 | 0 100 | 40 | 2640000 100 | | 2640000 200 | 10 | 0 200 | 40 | 0 200 | | 0 300 | 30 | 0 300 | | 0 400 | 50 | 0 400 | | 0 500 | 30 | 120 500 | | 120 600 | 30 | 60 600 | | 60 700 | 40 | 1 700 | | 1 800 | 40 | 1 800 | | 1 | | 2640182 (19 rows) Summarize sales by product & vendor Subtotals for each product Grand total 15 Module 5: Advanced Analytics -Technology and Tools GROUPING SETS Example The GROUPING SETS extension allows you to specify grouping sets. If you use the GROUPING SETS clause to meet the earlier requirements so that it produced the same output as ROLLUP , it would use the following groups : • (pn,vn) –This grouping summarizes product sales by vendor. • (pn) –This grouping provides subtotal sales for each product. • () –This grouping provides the grand total for all sales for all vendors and products. 15 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

CUBE Example CUBE creates subtotals for all possible combinations of grouping columns.

The following example is the same as SELECT pn, vn, sum(prc*qty) FROM sale GROUP BY CUBE(pn, vn) ORDER BY 1,2,3; SELECT pn, vn, sum(prc*qty) FROM sale GROUP BY GROUPING SETS ( (pn, vn), (pn), (vn), () ) ORDER BY 1,2,3; pn | vn | sum----- +---- +--------- 100 | 20 | 0100 | 40 | 2640000100 | | 2640000200 | 10 | 0200 | 40 | 0200 | | 0300 | 30 | 0300 | | 0400 | 50 | 0400 | | 0500 | 30 | 120500 | | 120600 | 30 | 60600 | | 60700 | 40 | 1700 | | 1800 | 40 | 1800 | | 1 | 10 | 0| 20 | 0| 30 | 180| 40 | 2640002| 50 | 0| | 2640182 (24 rows) 16 Module 5: Advanced Analytics -Technology and Tools CUBE Example A CUBE grouping creates subtotals for all of the possible combinations of the given list of grouping columns, or expressions.

In terms of multidimensional analysis, CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.

In the example shown on the slide, the additional grouping set of (vn) -subtotaling the sales by vendor, is included as part of the cube.

Note that nelements of a CUBE translate to 2 ngrouping sets. Consider using CUBE in any situation requiring cross -tabular reports . CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension . For instance, a commonly requested cross -tabulation might need subtotals for all the combinations of month, state, and product. 16 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

GROUPING Function Example Grouping distinguishes NULL from summary markers. SELECT * FROM dsales_null; store | customer | product | price------- +---------- +--------- +------- s2 | c1 | p1 | 90s2 | c1 | p2 | 50s2 | | p1 | 44s1 | c2 | p2 | 70s1 | c3 | p1 | 40(5 rows) SELECT store,customer,product, sum(price), grouping(customer) FROM dsales_null GROUP BY ROLLUP(store,customer, product); store | customer | product | sum | grouping------- +---------- +--------- +---- +---------- s1 | c2 | p2 | 70 | 0s1 | c2 | | 70 | 0s1 | c3 | p1 | 40 | 0s1 | c3 | | 40 | 0s1 | | | 110 | 1s2 | c1 | p1 | 90 | 0s2 | c1 | p2 | 50 | 0s2 | c1 | | 140 | 0s2 | | p1 | 44 | 0s2 | | | 44 | 0s2 | | | 184 | 1 | | | 294 | 1 (12 rows) 17 Module 5: Advanced Analytics -Technology and Tools GROUPING Function Example When you use grouping extensions to calculate summary rows, such as sub -totals and grand totals, the output can become confusing if the data in a grouping column contains NULL values. It is hard to tell if a row is supposed to be a subtotal row or a regular row containing a NULL.

In the example shown on the slide, one of the rows shown where the customer field is NULL. Without the grouping id, you could misinterpret the sum of 44 as a subtotal row for store 2.

The GROUPING function returns a result for each output row, where: • 1 represents a summary row • 0 represents grouped rows 17 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

GROUP_ID Function GROUP_ID : • Returns 0 for each output row in a unique grouping set • Assigns a serial number >0 to each duplicate grouping set found • Can be used to filter output rows of duplicate grouping sets, such as in the following example:

SELECT a, b, c, sum(p*q), group_id() FROM sales GROUP BY ROLLUP(a,b), CUBE(b,c) HAVING group_id()<1 ORDER BY a,b,c; 18 Module 5: Advanced Analytics -Technology and Tools GROUP_ID Function Is useful when combining grouping extension clauses. In this example query, the combination of ROLLUP and CUBE produces: • 12 grouping sets • 8 DISTINCT grouping sets The group_id function can be used to filter out or identify duplicate grouping sets in the output. GROUP BY ROLLUP (a,b), CUBE (b,c) is the same as • GROUP BY GROUPING SETS ( (a,b), (a), () ), GROUPING SETS ( (b,c), (b), (c), () ) • GROUP BY GROUPING SETS ((a,b,b,c), (a,b,b), (a,b,c), (a,b), (a,b,c), (a,b), (a,c), (a), ( b,c ), (b), (c), () ) Where there are 12 total grouping sets but only 8 distinct grouping sets, where the groups are: • (a,b,b,c) = (a,b,c) = (a,b,c) • (a,b,b) = (a,b) = (a,b) • (a,c) • (b,c) • (a) • (b) • (c) • () 18 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

In -database Text Analysis • SQL features for  Text handling functions  Pattern matching with regular expressions • Example Use -cases  Filter emails with spam tag in subject  Extract domains from a URL  Extract all URLs from a HTML file  Check for Syntactically correct email addresses  Convert 10 digits into format “(123) 456 -7890” 19 Module 5: Advanced Analytics -Technology and Tools In Lesson 8 of Module 4 we described the techniques for text analysis. It is common practice to store the parsed data from an unstructured source in a database for down stream analysis.

With the advent of Hadoop and its ecosystem products, unstructured data is also stored in external tables and accessed by traditional relational databases. We present a brief introduction to text processing in -database. The topics covered in the next few slides are the in -database methods for handling text and the use of expressions in SQL.

We also briefly discussed Regular expressions in Module 4 lesson 8. A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set ). A string is said to match a regular expression if it is a member of the regular set described by the regular expression.

Regular expressions provide the means for matching strings of text and building the functionalities for string manipulation in SQL. The theoretical basis for regular expressions can be explained with Finite State Machines (out of scope for this course). It should be noted that regular expression cannot match subscripts and superscripts or well formed nested parentheses.

We will illustrate some of the SQL syntax with regular expressions. 19 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Pattern Matching -Regular Expressions (Regex) Regular Expression match Operators SQL Functions substring( string, from, pattern [for escape ]) regexp_matches( string , pattern , [flags ]) regexp_replace( string , pattern , repl , [flags ]) regexp_split_to_{array|table} Operator Description Example ~ Case sensitive substring ‘Greenplum’ ~ ‘^Green’ ~* Case -insensitive substring ‘Greenplum’ ~*’ee+’ 20 Module 5: Advanced Analytics -Technology and Tools The “substring” function is primarily used for string pattern matching. The operator ~ is specified for case sensitive match of the substring and ~* for case insensitive match.

In the first example we are trying to find records with substring “Green” (case sensitive) starting at the beginning specified with character ^ In the second example we are finding match for a pattern “ee” (case in -sensitive) as a preceding term one or more times (Specified with a +) Refer to http://www.postgresql.org/docs/8.3/static/functions -matching.html for details of the syntax. 20 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Regular Expression Quantifiers • Quantifier Expression Matches . Arbitrary character ^ And $ Virtual characters for beginning and end * Preceding item zero or more times + Preceding item one or more times ? Preceding item is optional {n} Preceding item ntimes a|b Item aor b … … 21 Module 5: Advanced Analytics -Technology and Tools Quantifiers specify how often the preceding Regular Expression should match.

•Try to match the preceding regular expression zero or more times. • Example: "(ab)c*" matches "ab" followed by zero or more "c"s, i.e., "ab", "abc", "abcc", "abccc" ... +Try to match the preceding regular expression one or more times.

• Example: "(ab)c+" matches "ab" followed by one or more "c"s, i.e., "abc", "abcc", "abccc" ... Examples:

1) All mail with at least two + in x_spam_level : SELECT * FROM mail WHERE x_spam_level ~ ' \\+\\+*‘ 2) All top -level domains of sender’s addresses: SELECT substring("from" FROM '\\.[[: alnum :]]+$') FROM mail 3) Remove [Spam] tag at beginning of subjects: SELECT regex_replace (subject, '^((?:Re:[[:space:]]* )*)\\[Spam \\]' || '(.*)', ' \\1\\2') FROM mail 21 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Check Your Knowledge 1. How would you use GROUPING SETS to produce the same results as the following GROUP BY CUBE?

SELECT state, productID, SUM(volume) FROM sales GROUP BY CUBE (state, productID) ORDER BY state, productID 2. How would you show the sub -totals for each week, for each state, and for each product? (No other totals or grand totals are required.) Suppose the table structure is TABLE sales (productID VARCHAR, state CHAR(2), week DATE, volume INT) 3. Discuss the utility of grouping and group_id functions Your Thoughts? 22 Module 5: Advanced Analytics -Technology and Tools Note your answer/references below: 1. SELECT state, productID , SUM(volume) FROM sales GROUP BY GROUPING SETS ((state, productID ), (state), ( productID ), ()) ORDER BY state, productID 2. SELECT state, productID , week, SUM(volume) FROM sales GROUP BY GROUPING SETS ((state), ( productID ), (week)) ORDER BY state, productID , week 22 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved.

Check Your Knowledge (Continued) 4. Give regular expressions for the following:

 A regex that, given a URL, captures the domain name  A regex that captures PostgreSQL Dollar -quoted String literals Examples:

 $test$This is a string$test$ Your Thoughts? 23 Module 5: Advanced Analytics -Technology and Tools Note your answer/references below: SELECT substring(c9 from '(?:(?:(?:[^/]|/[^/])*//)|(?=^([^/]|/[^/]|/$)*$))([^/]*)') AS domain SELECT regexp_matches ( '$ test$This is a string$test $', E'( \\$[a -z]* \\$).*? \\1'); 23 Module 5: Advanced Analytics -Technology and Tools Copyright © 2014 EMC Corporation. All rights reserved. Copyright © 2014 EMC Corporation. All Rights Reserved. Advanced Analytics - Technology and Tools During this lesson the following SQL Essentials topics were covered:

• Online analytical processing (OLAP) features • GROUPING SETS, ROLLUP,CUBE • GROUPING, GROUP_ID functions • Text processing, Pattern matching Summary 24 Module 5: Advanced Analytics -Technology and Tools SQL essentials were covered in this lesson. 24 Module 5: Advanced Analytics -Technology and Tools