Define and execute a process to evaluate your data warehouse data for incompleteness, nulls, and the ability to provide consistent query data.Create a summary for your project sponsors to inform them
Introduction
There are certain discrepancies that may arise such as duplication of data. We will identify each table with a primary key which ought to be unique. A primary key is the candidate key that is selected to identify things uniquely within the relation. We will also employ the use of foreign keys. A foreign key is an attribute within one relation that matches the primary key of some relation. In some instances, it could be the same relation.
We will also use foreign key with cascade delete to enforce data quality. This means that, if a record in the parent table is deleted then the corresponding record in the child table should also be deleted automatically.
We will also use data types whereby a user cannot enter something that is contrary to the data type. For example, the attribute AmountOfLoans from table FFEL ought to be a currency, so we need to use a data type that supports currency such as DECIMAL(15,2).
DLGradPlus
PK DLGradPlusId
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
AmountOfDisbursement
FK DLId
DLParentPlus
PK DLParentPlusId
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
AmountOfDisbursement
FK DLId
FFEL
PK FFELId
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
FFELLoanType
FK LoandId
DirectLoanUnsubsidized
PK DLUnsubsidizedId
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
AmountOfDisbursement
FK DLId
DirectLoanSubsidized
PK DLSubsidizedId
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
AmountOfDisbursement
FK DLId
Grants
PK GrantId
GrantProgram
FK SchoolCode
FederalPellGrant
PK FederalPellfrantId
SumOfRecepients
SumOfDisbursement
FK GrantId
TeachGrant
PK TeachGrantId
SumOfRecipents
SumOfDisbursements
FK GrantId
IraqAfghanGrant
PK IraqAfghanGrantId
SumOfRecepients
SumOfDisbursements
FK GrantId
GainfulEmployment
PK GEId
CIPCode
CIPName
CredentialLevel
RepaymentRate
RepaymentNumerator
RepaymentDenominator
DirectLoans
PK DLId
LoanType
FK LoanId
Loans
PK LoanId
LoanType
FK SchoolCode
School
PK SchoolCode
SchoolName
Address
City
State
Zip
Province
Country
PostCode
SchoolType
Physical database
School (SchoolCode PK, SchoolName,Address, City, State, Zip, Province, Country, PostCode, SchoolType)
Loans (LoanId PK, LoanType, SchoolCode FK)
DirectLoans (DLId PK, LoanType, LoanId FK)
DirectLoanSubsidized (DLSubsidizedId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, AmountOfDisbursement, DLId FK)
GainfulEmployement (GEId PK, CIPCode, CIPName, CredentialLevel, RepaymentRate, RepaymentNumerator, RepaymentDenominator, SchoolCode FK)
DirectLoanUnsubsidized (DLUnsubsidizedId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, AmountOfDisbursement, DLId FK)
DLParentPlus (DLParentPlusId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, DLId FK)
DLGradPlus (DLGradPlusId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, AmountOfDisbursement, DLId FK)
Grants (GrantId PK, GrantProgram, SchoolCode FK)
FFEL (FFELId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, FFELLoanType, LoanId FK)
FederalPellGrant (FederalPellGrantId PK, SumOfRecipients, SumOfDisbursement, GrantId FK)
TeachGrant (TeachGrantId PK, SumOfRecipients, SumOfDisbursements, GrantId FK)
IraqAfghanGrant (IraqAfghanGrantId PK, SumOfRecipients, SumOfDisbursements, GrantId FK)
The bold attributes with PK means it is the Primary Key and the attributes with FK means it is the Foreign Key attribute.
SQL Queries
1. Creating the School table
CREATE TABLE `School` (
`SchoolCode` VARCHAR (10) NOT NULL,
`SchoolName` VARCHAR (255) NOT NULL,
`Address` VARCHAR (255) NOT NULL,
`City` VARCHAR (255) NOT NULL,
`State` VARCHAR (5) NOT NULL,
`Zip` VARCHAR (10) NOT NULL,
`Province` VARCHAR (50) NOT NULL,
`Country` VARCHAR (100) NOT NULL,
`PostCode` VARCHAR (20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
2. Showing School table data
SELECT * FROM School;
3. Which schools generate the highest amount of debt for the average student who attends?
SELECT s.SchoolName, MAX(l.NoOfDisbursements) AS Debt FROM school s, DirectLoanUnsubsidized l GROUP BY s.SchoolName;
4. Employment rate for the students who graduate from these schools?
SELECT s.SchoolName, AVG(g.RepaymentRate) AS EmploymentRate FROM School s, GainfulEmployment g WHERE s.SchoolCode = g.SchoolCode GROUP BY s.SchoolName;
Best Practices
I have used primary keys to ensure that there are no duplicate records. In this project, I have also used foreign keys so as to ensure relationship among the tables. The use of aliases in creating calculated fields has been performed. I have ensured that you cannot put an empty data by using the SQL NOT NULL key word in attributes.
Before we look at the integration plans, let us first define data warehouse and look at the process of transforming our data schema to a data warehouse schema.
Data warehouse
A data warehouse is a subject-oriented, integrate, time-variant, and nonvolatile collection of data in support of management’s decision-making process. Data warehouse is seen as the storage of huge volumes of data that supports the organization in decision making and complex analysis of the data.
Star schema
We will use the star schema to transform our data schema to a data warehouse schema. A star schema is a data model that has a fact table in the center, surrounded by denormalized tables. From our data warehouse schema we can visualize that we have one fact table known as “USAidFact_Table” and 13 dimension tables. The dimension tables include: “School”, “Loans”, “DirectLoans”, “DirectLoanSubsidized”, “FederalPellGrant”, “TeachGrant”, “IraqAfghanGrant”, “FFEL”, “Grants”, “DLGradPlus”, “DLParentPlus”, “DirectLoanUnsubsidized”, “GainfulEmployment”.
Integration plans
In order for us to integrate our data warehouse with the internet application, we need to incorporate certain aspects such as:
Data warehouse database management system
Data warehouse metadata
Business intelligence tools such as OLAP and Data mining
Network infrastructure
Data warehouse database management
We need to integrate a data warehouse DBMS in our data warehouse. The DBMS will support load performance, load processing, scalability when data volume increases and data quality management.
Data warehouse metadata
We will have to integrate a data warehouse metadata. The metadata shows the history of any item searched since it gives the pathway back to where the data began. We also need to integrate data mart.
OLAP and Data mining
We will integrate OLAP and Data Mining in our data warehouse. Online analytical processing (OLAP) is the dynamic analysis and consolidation of large volumes of data. Data mining is the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make business decisions. We will use OLAP and data mining for analysis of the data and use it for decision making within the organization.
Network infrastructure
The data warehouse cannot run without the network infrastructure. Therefore, we need to have these devices such as data mart etc. interconnected to one another so that they can communicate to each other.
USAidFact_Table
SchoolId
LoanId
DLId
DLSubsidizedId
GEId
DLUnsubsidizedId
DLParentPlusId
GradPlusId
GrantId
FFELId
IraqAfghanGrantId
TeachGrantId
FederalPellGrantId
Grants
GrantId {PK}
GrantProgram
FFEL
FFELId {PK}
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
FFELLoanType
Loans
LoanId {PK}
LoanType
DirectLoans
DLId {PK}
LoanType
DirectLoanSubsidized
DLSubsidizedId {PK}
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
AmountOfDisbursement
IraqAfghanGrant
IraqAfghanGrantId {PK}
SumOfRecepients
SumOfDisbursement
TeachGrant
TeachGrantId {PK}
SumOfRecepients
SumOfDisbursement
FederalPellGrant
FederalPellGrantId {PK}
SumOfRecepients
SumOfDisbursement
DLGradPlus
DLGradPlusId {PK}
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
AmountOfDisbursement
DLParentPlus
DLParentPlusId {PK}
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
AmountOfDisbursement
DirectLoanUnsubsidized
DLUnsubsidizedId {PK}
Recepients
NoOfLoans
AmountOfLoans
NoOfDisbursement
GainfulEmployment
GEId {PK}
CIPCode
CIPName
CredentialLevel
RepaymentRate
RepaymentNumerator
RepaymentDenominator
School
SchoolId {PK}
SchoolCode
Address
City
State
Zip
Province
Country
PostCode
SchoolType