(PART 1)Use what you learned in the Template to create a final summary for your project sponsors.Revise your process and approach, as appropriate.Include the following in your summary:The consideratio

Introduction

There are certain discrepancies that may arise such as duplication of data. We will identify each table with a primary key which ought to be unique. A primary key is the candidate key that is selected to identify things uniquely within the relation. We will also employ the use of foreign keys. A foreign key is an attribute within one relation that matches the primary key of some relation. In some instances, it could be the same relation.

We will also use foreign key with cascade delete to enforce data quality. This means that, if a record in the parent table is deleted then the corresponding record in the child table should also be deleted automatically.

We will also use data types whereby a user cannot enter something that is contrary to the data type. For example, the attribute AmountOfLoans from table FFEL ought to be a currency, so we need to use a data type that supports currency such as DECIMAL(15,2).

DLGradPlus

PK DLGradPlusId

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

AmountOfDisbursement

FK DLId

DLParentPlus

PK DLParentPlusId

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

AmountOfDisbursement

FK DLId

FFEL

PK FFELId

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

FFELLoanType

FK LoandId

DirectLoanUnsubsidized

PK DLUnsubsidizedId

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

AmountOfDisbursement

FK DLId

DirectLoanSubsidized

PK DLSubsidizedId

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

AmountOfDisbursement

FK DLId

Grants

PK GrantId

GrantProgram

FK SchoolCode

FederalPellGrant

PK FederalPellfrantId

SumOfRecepients

SumOfDisbursement

FK GrantId

TeachGrant

PK TeachGrantId

SumOfRecipents

SumOfDisbursements

FK GrantId

IraqAfghanGrant

PK IraqAfghanGrantId

SumOfRecepients

SumOfDisbursements

FK GrantId

GainfulEmployment

PK GEId

CIPCode

CIPName

CredentialLevel

RepaymentRate

RepaymentNumerator

RepaymentDenominator

DirectLoans

PK DLId

LoanType

FK LoanId

Loans

PK LoanId

LoanType

FK SchoolCode

School

PK SchoolCode

SchoolName

Address

City

State

Zip

Province

Country

PostCode

SchoolType


Physical database

School (SchoolCode PK, SchoolName,Address, City, State, Zip, Province, Country, PostCode, SchoolType)

Loans (LoanId PK, LoanType, SchoolCode FK)

DirectLoans (DLId PK, LoanType, LoanId FK)

DirectLoanSubsidized (DLSubsidizedId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, AmountOfDisbursement, DLId FK)

GainfulEmployement (GEId PK, CIPCode, CIPName, CredentialLevel, RepaymentRate, RepaymentNumerator, RepaymentDenominator, SchoolCode FK)

DirectLoanUnsubsidized (DLUnsubsidizedId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, AmountOfDisbursement, DLId FK)

DLParentPlus (DLParentPlusId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, DLId FK)

DLGradPlus (DLGradPlusId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, AmountOfDisbursement, DLId FK)

Grants (GrantId PK, GrantProgram, SchoolCode FK)

FFEL (FFELId PK, Recipients, NoOfLoans, AmountOfLoans, NoOfDisbursement, FFELLoanType, LoanId FK)

FederalPellGrant (FederalPellGrantId PK, SumOfRecipients, SumOfDisbursement, GrantId FK)

TeachGrant (TeachGrantId PK, SumOfRecipients, SumOfDisbursements, GrantId FK)

IraqAfghanGrant (IraqAfghanGrantId PK, SumOfRecipients, SumOfDisbursements, GrantId FK)

The bold attributes with PK means it is the Primary Key and the attributes with FK means it is the Foreign Key attribute.

SQL Queries

1. Creating the School table

CREATE TABLE `School` (

`SchoolCode` VARCHAR (10) NOT NULL,

`SchoolName` VARCHAR (255) NOT NULL,

`Address` VARCHAR (255) NOT NULL,

`City` VARCHAR (255) NOT NULL,

`State` VARCHAR (5) NOT NULL,

`Zip` VARCHAR (10) NOT NULL,

`Province` VARCHAR (50) NOT NULL,

`Country` VARCHAR (100) NOT NULL,

`PostCode` VARCHAR (20) NOT NULL

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

2. Showing School table data

SELECT * FROM School;

3. Which schools generate the highest amount of debt for the average student who attends?

SELECT s.SchoolName, MAX(l.NoOfDisbursements) AS Debt FROM school s, DirectLoanUnsubsidized l GROUP BY s.SchoolName;

4. Employment rate for the students who graduate from these schools?

SELECT s.SchoolName, AVG(g.RepaymentRate) AS EmploymentRate FROM School s, GainfulEmployment g WHERE s.SchoolCode = g.SchoolCode GROUP BY s.SchoolName;

Best Practices

I have used primary keys to ensure that there are no duplicate records. In this project, I have also used foreign keys so as to ensure relationship among the tables. The use of aliases in creating calculated fields has been performed. I have ensured that you cannot put an empty data by using the SQL NOT NULL key word in attributes.

Before we look at the integration plans, let us first define data warehouse and look at the process of transforming our data schema to a data warehouse schema.

Data warehouse

A data warehouse is a subject-oriented, integrate, time-variant, and nonvolatile collection of data in support of management’s decision-making process. Data warehouse is seen as the storage of huge volumes of data that supports the organization in decision making and complex analysis of the data.

Star schema

We will use the star schema to transform our data schema to a data warehouse schema. A star schema is a data model that has a fact table in the center, surrounded by denormalized tables. From our data warehouse schema we can visualize that we have one fact table known as “USAidFact_Table” and 13 dimension tables. The dimension tables include: “School”, “Loans”, “DirectLoans”, “DirectLoanSubsidized”, “FederalPellGrant”, “TeachGrant”, “IraqAfghanGrant”, “FFEL”, “Grants”, “DLGradPlus”, “DLParentPlus”, “DirectLoanUnsubsidized”, “GainfulEmployment”.

Integration plans

In order for us to integrate our data warehouse with the internet application, we need to incorporate certain aspects such as:

Data warehouse database management system

Data warehouse metadata

Business intelligence tools such as OLAP and Data mining

Network infrastructure

Data warehouse database management

We need to integrate a data warehouse DBMS in our data warehouse. The DBMS will support load performance, load processing, scalability when data volume increases and data quality management.

Data warehouse metadata

We will have to integrate a data warehouse metadata. The metadata shows the history of any item searched since it gives the pathway back to where the data began. We also need to integrate data mart.

OLAP and Data mining

We will integrate OLAP and Data Mining in our data warehouse. Online analytical processing (OLAP) is the dynamic analysis and consolidation of large volumes of data. Data mining is the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make business decisions. We will use OLAP and data mining for analysis of the data and use it for decision making within the organization.

Network infrastructure

The data warehouse cannot run without the network infrastructure. Therefore, we need to have these devices such as data mart etc. interconnected to one another so that they can communicate to each other.

USAidFact_Table

SchoolId

LoanId

DLId

DLSubsidizedId

GEId

DLUnsubsidizedId

DLParentPlusId

GradPlusId

GrantId

FFELId

IraqAfghanGrantId

TeachGrantId

FederalPellGrantId

Grants

GrantId {PK}

GrantProgram

FFEL

FFELId {PK}

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

FFELLoanType

Loans

LoanId {PK}

LoanType

DirectLoans

DLId {PK}

LoanType

DirectLoanSubsidized

DLSubsidizedId {PK}

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

AmountOfDisbursement

IraqAfghanGrant

IraqAfghanGrantId {PK}

SumOfRecepients

SumOfDisbursement

TeachGrant

TeachGrantId {PK}

SumOfRecepients

SumOfDisbursement

FederalPellGrant

FederalPellGrantId {PK}

SumOfRecepients

SumOfDisbursement

DLGradPlus

DLGradPlusId {PK}

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

AmountOfDisbursement

DLParentPlus

DLParentPlusId {PK}

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

AmountOfDisbursement

DirectLoanUnsubsidized

DLUnsubsidizedId {PK}

Recepients

NoOfLoans

AmountOfLoans

NoOfDisbursement

GainfulEmployment

GEId {PK}

CIPCode

CIPName

CredentialLevel

RepaymentRate

RepaymentNumerator

RepaymentDenominator

School

SchoolId {PK}

SchoolCode

Address

City

State

Zip

Province

Country

PostCode

SchoolType


This paper looks at the process of evaluation of a data warehouse for data incompleteness, nulls and the ability to provide consistent query data. It will also look at the data evaluation strategy, data evaluation queries and results per query. This paper will finish by looking at the summary of findings.

Evaluation strategies

Before looking at the evaluation strategies of a data warehouse, let us first define data warehouse. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data. A data warehouse can be described in the following areas: capacity, loading and indexing performance, operational integrity, reliability and manageability, client/ server connectivity and query processing performance.

Capacity

Capacity is all about space. Since we will be saving huge volumes of data, we need to have a memory that can support that. We intend to have a memory capacity of half a petabyte.

Loading and indexing performance

We will evaluate our data warehouse to see the loading and indexing performance. A data warehouse deals with frequent loading of data within smaller than usual time windows. The loading of data needs to have a high performance this way it doesn’t disrupt the system. We will also evaluate the indexing performance of our data warehouse and this deals with how well the data is indexed and how quickly record is found.

Operational integrity, reliability, and manageability

This section deals with the availability of the data. The data needs to be available all the time (24 * 7). We will also evaluate whether the data is reliable. Reliability is the transaction atomicity and durability. A transaction should either commit or rollback in case of a failure. The data warehouse should also recover from failure. The reliability, operational integrity and recovering from failure is provided by the database management system. The database management system also ensures that you cannot put null values and provides format of data e.g. the date data type has the format YYYY-MM-DD therefore ensuring that you cannot put incomplete data.

Client / server connectivity

This section deals with the evaluation of the network connectivity and speed since our data will be accessed over the network.

Query processing performance

This section looks at how the data warehouse DBMS handles complex queries. We will evaluate our data warehouse and see how fast it returns results of complex queries.

SQL Queries

SELECT schoolid, institutionname, SUM (loans) FROM school, gainfulemployment WHERE school.schoolid = gainfulemployment.schoolid GROUP BY ROLLUP(schoolid, institutionname);

(PART 1)Use what you learned in the Template to create a final summary for your project sponsors.Revise your process and approach, as appropriate.Include the following in your summary:The consideratio 1

Summary of findings

These are the findings that we came up with after evaluating our data warehouse:

We need to increase the storage capacity of our data warehouse to half a petabyte to facilitate increase in volume of data.

We need to install a high-speed network that will facilitate fast client / server communication.

Sometimes the data is not available due to slow network.

The data warehouse database management system is responsible for other things such as query processing, reliability, recovery and security of the data warehouse.