MIS770 Foundational Skills in Data Analysis

Assignment 2 Trimester 3 2017

Report and Discussion

Analysis of House Prices in Kingfisher Bay

Fitria Mandaraira Nurman

Student ID: 217091233

  1. Introduction

The main purpose of this report is to respond to queries about Kingfisher Bay's housing and rental data. The data provided has been tested and analyzed to answer several questions about claims regarding house prices. The sample consists of 120 houses randomly selected from a population of 60,000 houses. The dataset also includes numerical and categorical variables that are considered to influence house prices, including each house's physical condition, weekly rent and distance to particular places.

The queries are analyzed using Excel as the key tool for testing and for presenting results in diagrams and tables. The report is structured around the queries received. First, it summarizes the house prices and their relation to house condition and suburb. Second, it elaborates on the factors that influence house prices. Third, it presents the analysis of the claims about weekly rent and development in the city. Finally, it advises on the sample size for future surveys.

It is hoped that the analysis in this report answers the concerns about house prices in Kingfisher Bay. Further, it is expected that the observations and responses to the queries will help the Kingfisher Bay council decide on further action and policy regarding the claims.

  2. Discussion and Data Analysis

Review of house prices in Kingfisher Bay

The descriptive analysis of house prices in Kingfisher Bay is provided in the attached Excel file and focuses on the central tendency and dispersion of the price data. Based on the data summary, the distribution is right-skewed: the values are concentrated at the low end of the scale, between the minimum and Q3 (the third quartile). This indicates that only a small portion of the sampled houses have very high prices. The skew is also supported by the fact that the mean house price ($886,580) is higher than the median ($852,000). Because the distribution is skewed, the median is a more useful measure than the mean. In addition, the median barely changes in the presence of outliers, so it better represents the typical house price for the entire population.
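The robustness of the median can be illustrated with a minimal Python sketch. The prices below are hypothetical values in $ thousands, not the Kingfisher Bay sample; they only show how a single very expensive house pulls the mean upward while the median moves very little.

```python
from statistics import mean, median

# Hypothetical house prices in $ thousands (illustration only, not the report's data).
prices = [620, 700, 780, 850, 900, 950, 1020]

# The same list with one very expensive house added as an outlier.
prices_with_outlier = prices + [3500]

# Compare how far each measure moves once the outlier is included.
mean_shift = mean(prices_with_outlier) - mean(prices)
median_shift = median(prices_with_outlier) - median(prices)

print("Mean shift:  ", round(mean_shift, 2))
print("Median shift:", round(median_shift, 2))
```

In this made-up example the mean shifts by several hundred thousand dollars, while the median shifts by only a small amount.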

In addition, the descriptive analysis includes the sample variance and standard deviation to describe the variation in the data set. The standard deviation of the house prices is $324,950, which signifies that the majority of the prices lie between $561,630 and $1,211,520. This range is obtained by subtracting the standard deviation from, and adding it to, the mean of the data set.

To estimate the average price of all houses in Kingfisher Bay, a confidence interval was computed in Excel. The result shows, with 95% confidence, that the mean house price lies between $880,650 and $892,500. Likewise, the proportion of all houses priced at $1 million or more was estimated using a confidence interval for a proportion. The result shows, with 95% confidence, that between 27.25% and 44.41% of all houses in the population are priced at $1 million or more. Detailed computations for these results are presented in the attached Excel file.
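The interval for the proportion can be sketched in Python using the normal approximation. The count of 43 houses is an assumption: it is not stated in this report, but is inferred from the midpoint of the reported interval (about 35.83% of 120 houses). The function name is illustrative only, and the Excel workbook may compute the interval differently.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Assumption: 43 of the 120 sampled houses cost $1 million or more (inferred, not stated).
lower, upper = proportion_confidence_interval(43, 120)
print(f"{lower:.2%} to {upper:.2%}")
```

Under this assumption the sketch gives an interval of 27.25% to 44.41%, which agrees with the figures reported above.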

Relationship of house prices and condition/suburb

Referring to the histogram in the Excel sheet, the data are concentrated in the poor and good conditions. It can be concluded that most of the sampled houses are in an average state, with 82 houses in either poor or good condition. Meanwhile, only 15 houses are categorized as being in very poor condition.

For categorical variables, a contingency table is advisable because it shows the joint responses of two variables. In constructing the table, the house prices are grouped into four classes, each with a width of $392,000. In general, the houses in suburb C are in better condition than those in the other suburbs: more than half of the houses in suburb C are in good or excellent condition. Meanwhile, houses in the other two suburbs are generally in poor or good condition.
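The construction of such a table can be sketched in Python as a count of joint responses. The records below are hypothetical and far fewer than the 120 houses in the sample; the suburb and condition labels follow those used in this report, while the counts are invented for illustration.

```python
from collections import Counter

# Hypothetical (suburb, condition) records; illustration only, not the report's data.
houses = [
    ("A", "Poor"), ("A", "Good"), ("A", "Poor"),
    ("B", "Good"), ("B", "Poor"), ("B", "Very poor"),
    ("C", "Excellent"), ("C", "Good"), ("C", "Excellent"),
]

# Count each (suburb, condition) pair; these counts are the cells of the table.
counts = Counter(houses)

suburbs = sorted({suburb for suburb, _ in houses})
conditions = ["Very poor", "Poor", "Good", "Excellent"]

# Print one row per suburb and one column per condition.
print("Suburb".ljust(8) + "".join(c.ljust(11) for c in conditions))
for suburb in suburbs:
    row = "".join(str(counts[(suburb, c)]).ljust(11) for c in conditions)
    print(suburb.ljust(8) + row)
```

A pair that never occurs, such as an excellent house in suburb A in this made-up data, simply appears as a zero cell.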

As for prices by house condition, it can be concluded that condition has only a weak influence on house prices in Kingfisher Bay. Prices are concentrated in the $586,000-$1,370,000 range regardless of the condition of the houses. In some cases, a house in poor condition has a lower price than houses in very poor condition. In other cases, houses in poor condition fall in the same price range as houses in excellent condition.

As for prices by suburb, the contingency table shows that the range of house prices differs from suburb to suburb. House prices in suburb A are more affordable than those in the other suburbs. The relatively high-priced houses are located in suburb C, while the majority of suburb B house prices fall in the middle range of $586,000-$1,370,000. Compared with the condition of the houses, it can be inferred that the suburb has more impact on house prices in Kingfisher Bay.

House Prices and Factors Influencing House Prices

It is believed that house prices in Kingfisher Bay are strongly influenced by good rental investments (weekly rent and rental return). A scatter plot is one visual method for identifying a possible relationship between two variables. In addition, the strength and direction of the relationship are measured using the correlation coefficient. A detailed elaboration is included in the Q3 sheet of the attached Excel file.

In the first scatter plot, the observations show an upward pattern as the points move from left to right. This indicates a positive relationship between the X variable (weekly rent) and the Y variable (house price): as the weekly rent increases, the house price tends to increase as well. Further, the correlation coefficient of 0.666 implies that the relationship between the two variables is relatively strong.

In contrast, the relationship between house prices (Y variable) and rental return (X variable) is negative, meaning that house prices tend to rise as the rental return decreases. This is based on the scatter plot of the two variables, which shows a downward pattern as the observations move from left to right. In addition, the correlation coefficient is -0.40, which indicates that the relationship between the two variables is quite weak.

Further observation of the relationship between house prices and the other numerical variables indicates that number of bedrooms, number of main rooms and area are the variables with a significant influence on house prices. The evidence is the correlation coefficient computed for each variable: 0.54 for number of bedrooms, 0.51 for number of main rooms and 0.57 for area. Based on these results, all three variables have a positive and quite strong relationship with house prices, which implies that house prices increase as the values of these variables increase.
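The correlation coefficient used throughout this section can be sketched in Python as follows. This is a minimal illustration of the standard Pearson formula, not the Excel computation itself; the function name and the data values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly rents ($) and house prices ($ thousands); illustration only.
weekly_rent = [450, 520, 580, 610, 700, 760]
house_price = [640, 700, 820, 790, 980, 1050]

r = pearson_r(weekly_rent, house_price)
print(round(r, 3))
```

A value close to +1 indicates a strong positive relationship, a value close to -1 a strong negative relationship, and a value near 0 little or no linear relationship.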

Hypothesis Tests

In connection with the claim that Kingfisher Bay is the most unaffordable area in Melbourne, the weekly rent of the houses is said to be at least $600. To assess the validity of this statement, a hypothesis test is performed for each suburb using the sample data.

Based on the hypothesis tests in the Excel file, only suburb A rejects the null hypothesis, while the other suburbs fail to reject it. It can therefore be concluded that, apart from suburb A, the sample data do not contradict the claim regarding the weekly rent. This conclusion is supported by a further hypothesis test on the overall Kingfisher Bay sample, which also fails to reject the null hypothesis.

With regard to the claim that 75% of the houses are ten years old or older, the hypothesis test on the categorical data gives a p-value larger than the 5% significance level. The test therefore fails to reject the null hypothesis, and the statement that 75% of the area lacks development cannot be disproved.
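A test of this kind can be sketched in Python as a one-sample z-test for a proportion. The count of 84 houses is hypothetical, and the two-tailed form is an assumption, since this report does not state the count or the alternative hypothesis used in the Excel file.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-tailed one-sample z-test of H0: the population proportion equals p0."""
    p_hat = successes / n
    # Standard error is computed under the null hypothesis.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical count: 84 of the 120 sampled houses are ten years old or older.
z, p_value = proportion_z_test(84, 120, 0.75)
decision = "reject H0" if p_value < 0.05 else "fail to reject H0"
print(round(z, 3), round(p_value, 3), decision)
```

With this hypothetical count the p-value exceeds 5%, so the sketch, like the test described above, fails to reject the null hypothesis.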

Sample Size Estimation

To justify the sample size for future surveys comparing the house prices and rental rates in Kingfisher Bay, a sample size determination is carried out. The analysis uses a 95% confidence level, with the 90% and 99% confidence levels for comparison. The formula for determining the minimum sample size for a proportion is as follows:
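Assuming the standard normal-approximation formula found in introductory statistics texts is the one intended, the minimum sample size for a proportion can be written as below. The symbols are assumptions rather than notation taken from the Excel file: Z is the critical value for the chosen confidence level, \pi is the anticipated population proportion, and e is the acceptable sampling error.

```latex
n = \frac{Z^{2}\,\pi\,(1-\pi)}{e^{2}}
```

The result is rounded up to the next whole number.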

The above equation is applied in the Q5 sheet of the attached Excel file to determine the sample size for the vacant houses in Kingfisher Bay. For the 95% confidence level, the minimum sample size is 125, which is slightly higher than the current sample size of 120. The 90% confidence level requires a sample of 88, which is much lower than the current one. Meanwhile, the 99% confidence level requires a sample of 216, which is much larger than the current sample size.

Meanwhile, the formula for determining the minimum sample size for estimating the mean house price to within $50,000 is as follows:
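Assuming the standard formula for estimating a mean is the one intended, the minimum sample size can be written as below. The symbols are assumptions rather than notation taken from the Excel file: Z is the critical value for the chosen confidence level, \sigma is the standard deviation of the house prices, and e is the acceptable sampling error, here $50,000.

```latex
n = \frac{Z^{2}\,\sigma^{2}}{e^{2}}
```

As with the proportion, the result is rounded up to the next whole number.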

Based on the above formula, the sample size at the 95% confidence level is 163, which is considerably higher than the current sample size. For the 90% and 99% confidence levels, the sample sizes for future surveys are estimated at 115 and 282 respectively.
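The computation for the mean can be sketched in Python. It assumes the standard deviation of 324.95 from the descriptive analysis is expressed in $ thousands, so that the $50,000 sampling error becomes 50; the function name is illustrative only.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size_for_mean(sigma, error, confidence):
    """Minimum n needed to estimate a mean to within `error` at the given confidence."""
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up, since a fractional house cannot be sampled.
    return ceil((z * sigma / error) ** 2)

SIGMA = 324.95  # reported standard deviation, read as $ thousands (assumption)
ERROR = 50      # acceptable sampling error in $ thousands ($50,000)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, min_sample_size_for_mean(SIGMA, ERROR, confidence))
```

With exact critical values the sketch gives 115 at 90% and 163 at 95%, matching the figures above. At 99% it gives 281, one fewer than the reported 282, a difference that would be consistent with a critical value rounded to 2.58.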

In theory, a larger sample size yields greater accuracy. Therefore, it is advisable to use the 99% confidence level, which gives the largest sample size for both house prices and rental availability. Doing so keeps the sampling error as small as possible, so the sample represents the population more closely. Because the population is quite large, there is always a possibility of drawing a wrong conclusion about the null hypothesis.

  3. Conclusion

Overall, it can be concluded that the house prices in the sample are indeed influenced by several indicators relating to the surroundings and condition of the houses. Several variables, such as weekly rent, number of main rooms and number of bedrooms, have a significant positive relationship with house prices. These relationships are relatively strong, as shown by the scatter plots and the correlation coefficients. In comparison, the distance to the shops is the only factor that has a negative yet quite strong association with house prices.

With regard to the claims about house prices, the overall hypothesis test results demonstrate that prices are relatively high, particularly in suburbs B and C. As for the claim about the development of the area, the hypothesis test results point the same way, indicating that most of the houses are 10 years old or older.

It is recommended to use a larger sample size for future surveys of house prices and rental availability, because the larger the sample, the more accurate the observations can be expected to be. Future analysis should therefore include more sample data to represent the entire population more accurately, since the current sample is relatively small compared with the total of 60,000 houses in Kingfisher Bay.