Ira Sharenow's Data Analytics

SQL

SQL and California Academic Performance Index (API)

 I have been using R to analyze the California Academic Performance Index. Please see my Projects page. Now I demonstrate how to use SQL. All code was tested on SQL Server 2012.

 I am using the 2002 and 2009 data for elementary, middle, and high schools. There are over 6,000 rows in each database.

 Each ‘ALL’ table contains all the data for the indicated year. However, I broke up the data into a schools table and a scores table. The common column is the school ID column, CDS.

Column Meaning
CDS School ID
STYPE School Type
SNAME School Name
DNAME District Name
CNAME County Name
VALID09 Number Valid Scores
API09 API Score, outcome variable
AVGED09 Average Parental Education
MEALS09 Perc. Stud. on Subsidized Meals Plan
EL09 Perc. Stud. English Learners
SCI09 A Predictor Used by CA Dept of Ed

 I have uploaded the comma separated files.

APIBaseAll2002.csv
0.5 MB

APISchools2002.csv
448.0 KB

APIScores2002.csv
248.4 KB

APIBaseAll2009.csv
0.6 MB

APISchools2009.csv
0.4 MB

APIScores2009.csv
288.1 KB

 

For each query (or group of similar queries) I indicate the objective and then show the code and the output. In some case, such as when I do a SELECT * and have a large amount of output, the output is truncated.

As I live in the East Bay, I sometimes limited output to some Bay Area Counties or make comparisons amongst those counties.

The project start date is March 21, 2014. The project will be updated several times.

Version 1: March 21, 2014

Current version: May 10, 2014

Table of Contents 

-- Query 1
-- OBJECTIVE: Basic information -- view all data and count the number of rows in each table
-- TECHNIQUES: summary function; IS NULL


-- Query 1A
-- OBJECTIVE: Create views that recombine the data
-- TECHNIQUES: CREATE VIEW, FULL OUTER JOIN

-- Query 2
-- OBJECTIVE: API scores for school appearing in tables for both years
-- TECHNIQUES: INNER JOIN, ORDER BY, utilize a view

-- Query 3
-- OBJECTIVE: API scores for school appearing in tables for both years and also 2002 data even if no match
-- TECHNIQUES: LEFT OUTER JOIN, multi-join


 -- Query 4
-- OBJECTIVE: API scores for school appearing in tables for both years and also for either year even if no match
-- TECHNIQUES: FULL OUTER JOIN, multi-join

-- Query 5
-- Objective: Determine which school names are in the 2002 database that are not in the 2009 database. Assume names did not change
-- TECHNIQUES: EXCEPT, IN, subquery


-- Query 6
-- Objective: Determine which schools  are in the 2002 database that are not in the 2009 database. Assume names did not change
-- TECHNIQUES: EXCEPT, subquery, CREATE VIEW


-- Query 7
-- Objective: Compute average score statewide
-- TECHNIQUES: AVG summary function


-- Query 8
-- Objective: Compute average score statewide by type of school
-- TECHNIQUES: AVG summary function, GROUP BY, join, subquery

-- Query 9
-- Objective: Create a view that will be useful for analyzing Bay Area data
-- TECHNIQUES: CREATE VIEW, join

-- Query 10
-- Objective: Summarize some Bay Area statistics
-- TECHNIQUES: Summary statistics, GROUP BY


-- Query 11
-- Objective: Obtain some data data relating to the Albany school district
-- TECHNIQUES: LIKE, IN, local variables, use a view

-- Query 12
-- Objective: Obtain data and summary statistics for Bay Area Schools
-- TECHNIQUES: subquery, NULL, GROUP BY, HAVING, common table expressions (CTE)

-- Query 13
-- Objective: Slice and dice the data
-- TECHNIQUES: GROUP BY

-- Query 14
-- Objective: Summary statistics using WITH CUBE
-- TECHNIQUE: WITH CUBE


-- Query 15
-- Objective: Create a cube table and then query the cube
-- TECHNIQUES: WITH CUBE
==========================
==========================

USE API2009DB;


-- Query 1 
-- OBJECTIVE: Basic information -- view all data and count the number of rows in each table
-- TECHNIQUES: summary function; IS NULL

-- 2002 data

SELECT *
FROM dbo.APIBaseAll2002;


--         CDS    STYPE    SNAME    DNAME    CNAME    VALID02    API02    AVGED02    MEALS02    EL02    SCI02
-- 1    41690706045157    E    Sunshine Gardens Elementary    South San Francisco Unified    San Mateo    227    687    2.83    42    20    165.283928
-- 2    41690706059976    M    Alta Loma Middle    South San Francisco Unified    San Mateo    714    696    3    19    11    163.492159
-- 3    41690706059984    M    Parkway Heights Middle    South San Francisco Unified    San Mateo    742    627    2.33    42    32    152.457148

SELECT *
FROM dbo.APISchools2002;


-- CDS    STYPE    SNAME    DNAME    CNAME    VALID02
-- 1611190130229    H    Alameda High    Alameda City Unified    Alameda    1166
-- 1611190132878    H    Encinal High    Alameda City Unified    Alameda    792
-- 1611196000004    M    Chipman Middle    Alameda City Unified    Alameda    532


SELECT *
FROM dbo.APIScores2002;


-- CDS    API02    AVGED02    MEALS02    EL02    SCI02
-- 1611190130229    733    3.55    13    14    169.261828
-- 1611190132878    606    2.94    34    22    152.229525
-- 1611196000004    634    2.92    55    24    154.525044


SELECT COUNT(*) AS Count2002
FROM dbo.APIBaseAll2002;


-- Count2002
-- 6173

SELECT COUNT(*) AS CountSchools2002
FROM dbo.APISchools2002;


-- CountSchools2002
-- 6173

SELECT COUNT(*) AS CountScores2002
FROM dbo.APIScores2002;


-- CountScores2002
-- 6173

-- 2009 data

SELECT COUNT(*) AS Count2009
FROM dbo.APISchools2009;


-- Count2009
-- 6962

-- 2002 data

SELECT COUNT(*) AS CountNull2002
FROM dbo.APISchools2002
WHERE SNAME IS NULL;


-- CountNull2002
-- 0

Back to Table of Contents
==========================
==========================
-- Query 1A 
-- OBJECTIVE: Create a view that recombines the 2002 data and a view that recombines the 2009 data
-- TECHNIQUES: CREATE VIEW, FULL OUTER JOIN


CREATE VIEW Vall2002 AS
SELECT sch02.CDS AS CDS, STYPE, SNAME, DNAME, CNAME, VALID02, API02, AVGED02, MEALS02, EL02, SCI02
FROM APISchools2002 AS sch02 , APIScores2002 AS sc02
WHERE sch02.CDS = sc02.CDS;

CREATE VIEW Vall2009 AS
SELECT sch09.CDS AS CDS, STYPE, SNAME, DNAME, CNAME, VALID09, API09, AVGED09, MEALS09, EL09, SCI09
FROM APISchools2009 AS sch09 , APIScores2009 AS sc09
WHERE sch09.CDS = sc09.CDS;

-- Now combine the two views

CREATE VIEW Vall0209 AS
SELECT V02.CDS AS CDS, V02.STYPE AS STYPE, V02.SNAME AS SNAME, V02.DNAME AS DNAME, V02.CNAME AS CNAME, VALID02, API02, AVGED02, MEALS02, EL02, SCI02, VALID09, API09, AVGED09, MEALS09, EL09, SCI09
FROM Vall2002 AS V02
FULL OUTER JOIN
Vall2009 AS V09
ON V02.CDS = V09.CDS;

Back to Table of Contents
==========================
==========================

-- Query 2 
-- OBJECTIVE: API scores for school appearing in tables for both years
-- TECHNIQUES: INNER JOIN, ORDER BY

SELECT sch02.SNAME AS SNAME, sc02.API02 AS API2002, sc09.API09 AS API2009
FROM APISchools2002 AS sch02, APISchools2009 AS sch09, APIScores2002 AS sc02, APIScores2009 AS sc09
WHERE sch02.CDS = sch09.CDS  AND sch02.CDS = sc02.CDS AND sch09.CDS = sc09.CDS
ORDER BY SNAME;

-- SNAME    API2002    API2009
-- "King (Martin Luther    702    726
-- "Olita Elementary (Lh    862    859
-- A. P. Giannini Middle    766    862


-- Method 2
-- Use a view
SELECT SNAME, API02, API09
FROM Vall0209
WHERE API02 IS NOT NULL and API09 IS NOT NULL
ORDER BY SNAME;


Back to Table of Contents
==========================
==========================

-- Query 3  
-- OBJECTIVE: API scores for school appearing in tables for both years and also 2002 data even if no match
-- TECHNIQUES: LEFT OUTER JOIN, multi-join

SELECT sch02.SNAME AS SNAME02, sch09.SNAME AS SNAME09, sc02.API02 AS API2002, sc09.API09 AS API2009
FROM APISchools2002 AS sch02
  LEFT OUTER JOIN APISchools2009 AS sch09
    ON sch02.CDS = sch09.CDS
  JOIN APIScores2002 AS sc02
    ON sch02.CDS = sc02.CDS
  LEFT JOIN APIScores2009 AS sc09
    ON sch09.CDS = sc09.CDS
ORDER BY SNAME02; 

-- SNAME02    SNAME09    API2002    API2009
-- "King (Martin Luther    Martin Luther King Jr. Elementary    702    726
-- "Olita Elementary (Lh    Olita Elementary    862    859
-- A Street Elementary    NULL    590    NULL

Back to Table of Contents
==========================
==========================

-- Query 4 

-- OBJECTIVE: API scores for school appearing in tables for both years and also for either year even if no match
-- TECHNIQUES: full outer join, multi-join

SELECT sch02.SNAME AS SNAME02, sch09.SNAME AS SNAME09, sc02.API02 AS API2002, sc09.API09 AS API2009
FROM APISchools2002 AS sch02
  FULL OUTER JOIN APISchools2009 AS sch09
    ON sch02.CDS = sch09.CDS
  FULL OUTER JOIN APIScores2002 AS sc02
    ON sch02.CDS = sc02.CDS
  FULL OUTER JOIN APIScores2009 AS sc09
    ON sch09.CDS = sc09.CDS
ORDER BY SNAME02; 


SNAME02    SNAME09    API2002    API2009
NULL    FAME Public Charter    NULL    743
NULL    Ruby Bridges Elementary    NULL    802
NULL    Alameda Community Learning Center    NULL    825

Back to Table of Contents
==========================
==========================
-- Query 5 
-- Objective: Determine which school names are in the 2002 database that are not in the 2009 database. Assume names did not change
-- TECHNIQUES: EXCEPT, IN, subquery

SELECT sch02.SNAME FROM APISchools2002 AS sch02
EXCEPT
SELECT sch09.SNAME FROM APISchools2009 AS sch09;


-- Method 2
SELECT DISTINCT sch02.SNAME
FROM APISchools2002 AS sch02
WHERE SNAME IN (SELECT sch02.SNAME FROM APISchools2002 AS sch02)
AND SNAME NOT IN (SELECT sch09.SNAME FROM APISchools2009 AS sch09);


-- SNAME
-- "King (Martin Luther
-- "Olita Elementary (Lh
-- A Street Elementary

Back to Table of Contents
==========================
==========================

-- Query 6 
-- Objective: Determine which schools  are in the 2002 database that are not in the 2009 database. Assume names did not change
-- TECHNIQUES: EXCEPT, subquery, CREATE VIEW

SELECT sch02.CDS AS CDS02, sch02.SNAME AS SNAME02
FROM APISchools2002 AS sch02
WHERE sch02.CDS IN
(SELECT sch02.CDS FROM APISchools2002 AS sch02
 EXCEPT
 SELECT sch09.CDS FROM APISchools2009 AS sch09)
ORDER BY SNAME02;


-- CDS02    SNAME02
-- 33671996113955    A Street Elementary
-- 36750773630837    Academy Academic Exc Char
-- 19753096022735    Acton Elementary

CREATE VIEW CDS02Not09
AS
SELECT sch02.CDS FROM APISchools2002 AS sch02
EXCEPT
SELECT sch09.CDS FROM APISchools2009 AS sch09;
GO

SELECT sch02.CDS AS CDS02, sch02.SNAME AS SNAME02
FROM APISchools2002 AS sch02, CDS02Not09 AS c
where sch02.CDS = c.CDS;


-- Method 3
SELECT sch02.CDS AS CDS02, sch02.SNAME AS SNAME02
FROM APISchools2002 AS sch02
WHERE sch02.CDS IN
   (SELECT sch02.CDS FROM APISchools2002 AS sch02
    EXCEPT
    SELECT sch09.CDS FROM APISchools2009 AS sch09);


Back to Table of Contents
==========================
==========================

-- Query 7 
-- Objective: Compute average score statewide
-- TECHNIQUES: AVG summary function

SELECT AVG(API02) AS avgAPI02
FROM APIScores2002 AS sc02;


avgAPI02
688.71

Back to Table of Contents

==========================
==========================

-- Query 8 
-- Objective: Compute average score statewide by type of school
-- TECHNIQUES: AVG summary function, GROUP BY, join, subquery

-- Step 1
SELECT API02, STYPE
FROM APISchools2002 AS sch02, APIScores2002 AS sc02
WHERE sch02.CDS = sc02.CDS

API02    STYPE
733    H
606    H
634    M

-- Solution
SELECT STYPE, AVG(API02) AS avgAPI02
FROM (SELECT API02, STYPE
      FROM APISchools2002 AS sch02, APIScores2002 AS sc02
      WHERE sch02.CDS = sc02.CDS) AS a
GROUP BY STYPE;


STYPE    avgAPI02
E    701.84
H    643.55
M    670.27


Back to Table of Contents
==========================
==========================

-- Query 9 
-- Objective: Create a view that will be useful for analyzing Bay Area data
-- TECHNIQUES: CREATE VIEW, join

CREATE VIEW BayArea
AS
SELECT sch02.CDS AS CDS, sch02.STYPE AS STYPE, sch02.SNAME AS SNAME, sch02.DNAME AS DNAME, sch02.CNAME AS CNAME,
       API02, API09, AVGED02, AVGED09, VALID02, VALID09
FROM APISchools2002 AS sch02
  JOIN APISchools2009 AS sch09
    ON sch02.CDS = sch09.CDS
  JOIN APIScores2002 AS sc02
    ON sch02.CDS = sc02.CDS
 JOIN APIScores2009 AS sc09
    ON sch09.CDS = sc09.CDS
WHERE sch02.CNAME IN ('Alameda', 'Contra Costa', 'Marin', 'San Francisco', 'San Mateo')


SELECT *
FROM BayArea
ORDER BY CNAME;


Back to Table of Contents
==========================
==========================
-- Query 10 
-- Objective: Summarize some Bay Area statistics
-- TECHNIQUES: Summary statistics, GROUP BY


SELECT CNAME, STYPE, AVG(API02) AS API2, AVG(API09) AS API9, AVG(AVGED02) AS AVGED2, AVG(AVGED09) AS AVG9
FROM BayArea
GROUP BY CNAME, STYPE
ORDER BY CNAME, STYPE;

Back to Table of Contents
==========================
==========================

-- Query 11 
-- Objective: Obtain some data data relating to the Albany school district
-- TECHNIQUES: LIKE, IN, local variables, use a view


-- Determine the AHS CDS
SELECT *
FROM Vall0209
WHERE SNAME LIKE '%Albany%';

-- Find differences for Albany schools
SELECT SNAME, API02, API09, API09 - API02 AS APIDiff
FROM Vall0209
WHERE CDS IN (1611270130450, 1611276090161);

-- Method 2
-- Use local variables

DECLARE @CDShs AS BIGINT = 1611270130450;
DECLARE @CDSms AS BIGINT = 1611276090161;

SELECT SNAME, API02, API09, API09 - API02 AS APIDiff
FROM Vall0209
WHERE CDS = @CDShs OR CDS = @CDSms;

Back to Table of Contents
==========================
==========================

-- Query 12 
-- Objective: Obtain data and summary statistics for Bay Area Schools and other groups
-- TECHNIQUES: subquery, NULL, GROUP BY, HAVING, common table expressions (CTE)

-- Find differences for all Bay Area schools
SELECT SNAME, API02, API09, API09 - API02 AS APIDiff
FROM Vall0209
WHERE CNAME IN (SELECT CNAME FROM BayArea)
  AND API02 IS NOT NULL AND API09 IS NOT NULL
ORDER BY APIDiff;


-- Find difference for Alameda County
SELECT CNAME, AVG(API02) AS mean02, AVG(API09) AS mean09, AVG(API09) - AVG(API02) AS APIDiff
FROM Vall0209
WHERE CNAME = 'Alameda'
GROUP BY CNAME;


-- Find difference for all counties in CA
SELECT CNAME, AVG(API02) AS mean02, AVG(API09) AS mean09, AVG(API09) - AVG(API02) AS APIDiff
FROM Vall0209
GROUP BY CNAME
ORDER BY APIDiff DESC;


-- Find difference for all counties in CA that improved by at least 100 points
SELECT CNAME, AVG(API02) AS mean02, AVG(API09) AS mean09, AVG(API09) - AVG(API02) AS APIDiff
FROM Vall0209
GROUP BY CNAME
HAVING AVG(API09) - AVG(API02) > 100
ORDER BY APIDiff DESC;


-- Find difference for all counties in Bay Area
-- Use BayArea view
SELECT CNAME, AVG(API02) AS mean02, AVG(API09) AS mean09, AVG(API09) - AVG(API02) AS APIDiff
FROM Vall0209
WHERE CNAME IN (SELECT CNAME FROM BayArea)
GROUP BY CNAME
ORDER BY APIDiff DESC;


-- Method 2
-- Use a common table expression
-- Underlying schools differ from those in previous queries
-- as the view was created using a full outer join

WITH BA1 AS
(
SELECT sch02.CDS AS CDS, sch02.STYPE AS STYPE, sch02.SNAME AS SNAME, sch02.DNAME AS DNAME, sch02.CNAME AS CNAME,
       API02, API09, AVGED02, AVGED09, VALID02, VALID09
FROM APISchools2002 AS sch02
  JOIN APISchools2009 AS sch09
    ON sch02.CDS = sch09.CDS
  JOIN APIScores2002 AS sc02
    ON sch02.CDS = sc02.CDS
 JOIN APIScores2009 AS sc09
    ON sch09.CDS = sc09.CDS
WHERE sch02.CNAME IN ('Alameda', 'Contra Costa', 'Marin', 'San Francisco', 'San Mateo')
)
SELECT CNAME, AVG(API02) AS mean02, AVG(API09) AS mean09, AVG(API09) - AVG(API02) AS APIDiff
FROM BA1
GROUP BY CNAME
ORDER BY APIDiff DESC;


Back to Table of Contents
==========================
==========================

-- Query 13 
-- Objective: Slice and dice the data
-- TECHNIQUES: GROUP BY

-- Group Bay Area data: CNAME, STYPE
SELECT CNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
GROUP BY CNAME, STYPE
ORDER BY CNAME;


-- Group Bay Area data: CNAME, DNAME, STYPE
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
GROUP BY CNAME, DNAME, STYPE
ORDER BY CNAME, DNAME;


-- take a slice of only schools that had more than 500 valid scores in 2002
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
WHERE VALID02 > 500
GROUP BY CNAME, DNAME, STYPE
ORDER BY CNAME, DNAME;


-- take a slice of only schools that had more than 500 valid scores in 2009
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
WHERE VALID09 > 500
GROUP BY CNAME, DNAME, STYPE
ORDER BY CNAME, DNAME;


-- dice: only schools that had more than 500 valid scores each year
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
WHERE VALID02 > 500 AND VALID09 > 500
GROUP BY CNAME, DNAME, STYPE
ORDER BY CNAME, DNAME;

-- dice: only schools that had less than or equal to 500 valid scores each year
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
WHERE VALID02 <= 500 AND VALID09 <= 500
GROUP BY CNAME, DNAME, STYPE
ORDER BY CNAME, DNAME;

Back to Table of Contents
==========================
==========================

-- Query 14 
-- Objective: Summary statistics using WITH CUBE
-- TECHNIQUE: WITH CUBE

-- with cube
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
GROUP BY CNAME, DNAME, STYPE WITH CUBE
ORDER BY CNAME, DNAME;


-- with cube
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
FROM BayArea
WHERE VALID02 > 500
GROUP BY CNAME, DNAME, STYPE WITH CUBE
ORDER BY CNAME, DNAME;


Back to Table of Contents
==========================
==========================

-- Query 15 
-- Objective: Create a cube table and then query the cube
-- TECHNIQUES: WITH CUBE

-- Create a cube table
SELECT CNAME, DNAME, STYPE, AVG(API02) AS avgAPI02, AVG(API09) AS avgAPI09, AVG(AVGED02) AS avgAVGED02, AVG(AVGED09) AS avgAVGED09
INTO CubeAPI
FROM BayArea
GROUP BY CNAME, DNAME, STYPE WITH CUBE;


SELECT *
FROM CubeAPI;


-- Query the cube to get some summary statistics for Alameda County And Contra Costa County
SELECT CNAME, avgAPI02, avgAPI09
FROM CubeAPI
WHERE CNAME IN ('ALAMEDA', 'Contra Costa')  AND DNAME IS NULL AND STYPE IS NULL;


Back to Table of Contents


Website Builder