Project 2: Hypothesis Testing

For this project, we had to conduct four hypothesis tests for data contained in the following SQL database:

This required accessing the appropriate data in SQL, designing hypothesis tests, and then conducting the tests using t-tests and ANOVA.

The first hypothesis test was given to us: Do customers buy more discounted products? At what level of discount?

To test this, we needed to define the null hypothesis: discounts have no effect on the quantity purchased, so the mean quantity ordered is the same with and without a discount. Mathematically, H0: mean quantity (discounted) = mean quantity (not discounted). Our alternative hypothesis is that customers buy more of discounted products, or Ha: mean quantity (discounted) > mean quantity (not discounted). The plan was to get the data from the “Order Detail” table, sample it, and then run a two-tailed, two-sample t-test. Our alpha, for this and all other tests, was set at .05.

This was simple enough. The data was all contained in the Order Detail table, so no SQL joins were needed. I simply pulled the Quantity and Discount attributes into two separate tables: the control, where “discount = 0”, and the experimental, where “discount != 0”. My final tables had the average quantity purchased for each product in each of the two groups: discounted and not discounted.
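A minimal sketch of this kind of pull, using Python's built-in sqlite3 module on a toy in-memory table. The table name `OrderDetail`, the `ProductId` column, and the rows themselves are assumptions made for illustration; only the Quantity and Discount attributes come from the write-up.

```python
import sqlite3

# Toy stand-in for the real database. The table name, the ProductId column,
# and every row below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderDetail (ProductId INTEGER, Quantity INTEGER, Discount REAL)"
)
conn.executemany(
    "INSERT INTO OrderDetail VALUES (?, ?, ?)",
    [
        (1, 10, 0.0), (1, 20, 0.0), (1, 30, 0.1),
        (2, 5, 0.0), (2, 15, 0.05), (2, 25, 0.2),
    ],
)


def avg_quantity_per_product(conn, where_clause):
    """Return {ProductId: average Quantity} for rows matching where_clause."""
    query = (
        "SELECT ProductId, AVG(Quantity) FROM OrderDetail "
        f"WHERE {where_clause} GROUP BY ProductId ORDER BY ProductId"
    )
    return dict(conn.execute(query).fetchall())


# Control group: no discount. Experimental group: any discount.
control = avg_quantity_per_product(conn, "Discount = 0")
experimental = avg_quantity_per_product(conn, "Discount != 0")
print(control, experimental)
```

Because everything lives in one table, a single GROUP BY per group is enough; no joins are involved.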

I then sampled the two groups so they were normally distributed, ran some cursory stats on them, checked the effect size using Cohen's d, and performed a t-test. Just looking at the histograms and the means of the two groups showed there was a difference in average quantity bought. The effect size was greater than .8, which is conventionally considered a large effect, so the difference is meaningful in practical terms and not just statistically detectable. The t statistic was used together with the degrees of freedom to obtain a p-value. That p-value was less than our alpha of .05, so we rejected the null hypothesis. Discounts do affect the quantity of products purchased.
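The effect-size and t-test steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual code: it uses Welch's unequal-variance form of the two-sample t-test (the write-up does not say which variant was run), the p-value is a normal approximation that is only reasonable for large samples, and the data are randomly generated rather than taken from the database. In practice `scipy.stats.ttest_ind` would give the exact t-distribution p-value.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two samples."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


def approx_two_tailed_p(t):
    """Normal approximation to the two-tailed p-value (large samples only)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))


# Made-up quantities, not the project's data.
rng = random.Random(0)
no_discount = [rng.gauss(20, 5) for _ in range(200)]
discounted = [rng.gauss(26, 5) for _ in range(200)]

d = cohens_d(discounted, no_discount)
t, df = welch_t(discounted, no_discount)
p = approx_two_tailed_p(t)
print(f"d={d:.2f}, t={t:.2f}, df={df:.1f}, p~{p:.4f}")
```

Reporting d alongside the p-value matters because, with enough rows, even a tiny difference in means can come out statistically significant.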

To see at what level, I split the discounts into three levels based on the size of the discount: small, medium and large. I performed a t-test on each group to see if it had a significant effect on the quantity of products purchased. All three had p-values less than .05, so every level of discount has a significant effect.
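The split into levels might look something like the sketch below. The cutoffs (5% and 15%) are hypothetical, since the write-up does not say where the levels were divided, and the rows are made up. Each level is compared against the no-discount group with a Welch t statistic, again an assumed variant of the t-test.

```python
import math
import statistics
from collections import defaultdict


def discount_level(discount):
    """Bucket a discount rate; the 0.05 and 0.15 cutoffs are hypothetical."""
    if discount == 0:
        return "none"
    if discount <= 0.05:
        return "small"
    if discount <= 0.15:
        return "medium"
    return "large"


def welch_t_stat(a, b):
    """Welch's t statistic for the difference in means of a and b."""
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_a + se_b)


# Made-up (quantity, discount) rows, not the project's data.
rows = [
    (10, 0.0), (12, 0.0), (14, 0.0), (16, 0.0),
    (18, 0.05), (20, 0.05), (22, 0.05),
    (24, 0.10), (26, 0.15), (28, 0.10),
    (30, 0.20), (34, 0.25), (38, 0.20),
]

# Group quantities by discount level.
groups = defaultdict(list)
for quantity, discount in rows:
    groups[discount_level(discount)].append(quantity)

# Compare each discount level against the no-discount control group.
t_by_level = {
    level: welch_t_stat(groups[level], groups["none"])
    for level in ("small", "medium", "large")
}
for level, t in t_by_level.items():
    print(level, round(t, 2))
```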

That concluded the first hypothesis test. I then designed three more: 1) Does price affect whether a product is reordered? 2) Does the season affect the quantity of products purchased? 3) What is the effect of quantity and price on whether a product is discontinued?

For each of these, I used SQL to get the data I needed, sampled the data, and then performed hypothesis tests to see if the null hypothesis could be rejected.

Findings:

  1. More expensive products are not reordered.
  2. People buy more products in winter.
  3. Quantity has a significant effect on whether a product is discontinued; price also has a significant effect.
