Chi-Square Tests: A Simple Test and Tool We Can All Use

June 24, 2019


One of the first operational audits I performed when I started auditing in higher education was a review of the Grade Change Authorization process. Even though I was at a fairly small school, almost 200,000 grades were issued over the five-year audit period, with almost 5,000 grade changes. I knew data analysis would be both a key success factor and a key challenge.

Traditional simple percentage analysis would give me some help in determining where to concentrate my work, but I wondered if there might be a better way of separating data “noise” from significant data insights. Drawing on my Six Sigma training, I decided to review my data analysis tools and see if one of them could be helpful. I followed a simple logic flow to decide upon a Chi-Square test. The decision path included only a couple of questions.

1. Is my input data continuous or discrete? Discrete (continuous data can be measured on a finer and finer scale; discrete data falls into distinct categories or counts, such as ‘yes’ or ‘no’, ‘1’ or ‘2’, ‘on’ or ‘off’).

2. Is my output data continuous? No, it is discrete.

3. Then the use of a Chi-Square Test may provide a relevant data analysis.

Graphically this can be shown as:

Winter-18-Chi-test.PNG
Chi-Square Tests:
Winter-18-chi-fig-1.PNG

If you are like me and see something like Figure 1, which is the Chi-squared test as illustrated on Wikipedia, your eyes automatically glaze over and you look for your statistical guidebook. Chi-Square in simple terms is a proportions test that compares individual proportions against the population proportion to highlight differences.

I wanted to identify academic programs that had a significantly different number of grade changes versus all programs and all grade changes. With 45 academic programs, 198,480 grades and 4,649 changes, this sounded pretty complicated, but with Chi-Square testing it really wasn’t. One word of caution here: you have to make sure you start with clean, scrubbed data.
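To make the "population proportion" concrete, here is a minimal Python sketch built only from the totals quoted above. It assumes the test compares changed grades against unchanged grades for each program, so the expected number of changes for a program is its grade count times the overall change rate. The function name and the program size of 1,600 grades are hypothetical and used purely for illustration.

```python
# Totals reported in the article.
TOTAL_GRADES = 198_480
TOTAL_CHANGES = 4_649

# Overall (population) proportion of grades that were changed, about 2.3%.
OVERALL_CHANGE_RATE = TOTAL_CHANGES / TOTAL_GRADES


def expected_changes(program_grades: int) -> float:
    """Expected grade changes for a program if it behaved like the population."""
    return program_grades * OVERALL_CHANGE_RATE


if __name__ == "__main__":
    # Hypothetical program size, not taken from the audit data.
    hypothetical_program_grades = 1_600
    print(f"Overall change rate: {OVERALL_CHANGE_RATE:.4%}")
    print(
        f"Expected changes for {hypothetical_program_grades} grades: "
        f"{expected_changes(hypothetical_program_grades):.1f}"
    )
```

A program whose actual count of changes sits far from this expected value is a candidate for closer review; the Chi-Square test is what tells you whether "far" is more than noise.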

There are many software packages and free websites that offer a Chi-Square utility; I use Minitab® as it is fairly friendly and has a great discount for educational institutions. In order for Minitab to recognize my data input, I had to adjust the format. The “Help” and “Assistant” facilities were satisfactory and aided me in the process.

As with any testing methodology, the tester has to understand some basic concepts about the test and its results, have an expectation for the outcome and know what question they want to answer (a hypothesis).

As stated before, Chi-Square is a proportions test that compares individual proportions against the population proportion to highlight differences. The test results (see Figure 2) will show you actual numbers, expected numbers, Chi-Square contribution and P-Value.
 

winter-18-chi-gig-2.PNG

I want to know two things: first, is the proportion of changes in each subset, e.g., “AGRI”, significantly different than the proportion of changes across all data (“ALL”); and second, if there are significant differences, in which subsets do they occur?

Are there significant differences? Using the test summary in Figure 3, I can look at the “P-Value” (highlighted in yellow) to see how much I can rely upon my null hypothesis, which is that there is no significant difference between the subsets and the total data. In this case, the “P-Value” is less than 0.001, so I reject the null hypothesis and accept the alternate hypothesis that there are subsets with statistically significant differences from the total data. A good rule of thumb is that a P-Value less than 0.05 indicates rejection of the null hypothesis.
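For readers curious where a P-Value that small comes from, the following Python sketch is one way to check it by hand. It rests on two assumptions that are not stated in the article: that the test table has 45 program rows and 2 columns, giving 44 degrees of freedom, and that the total Chi-Square score is at least the sum of the three largest contributions quoted in this article (roughly 143 + 19 + 15). The function name is hypothetical, and the closed-form formula it uses is only valid for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(statistic: float, df: int) -> float:
    """Upper-tail probability (P-Value) of a Chi-Square score.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive, even df")
    half = statistic / 2.0
    term = 1.0   # the i = 0 term of the series
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Assumed shape of the table: 45 programs by 2 columns.
ASSUMED_DF = (45 - 1) * (2 - 1)

# Lower bound on the total score from the three contributions cited in the article.
LOWER_BOUND_SCORE = 143 + 19 + 15

p_value_upper_bound = chi2_sf_even_df(LOWER_BOUND_SCORE, ASSUMED_DF)

if __name__ == "__main__":
    print(f"P-Value is at most {p_value_upper_bound:.3g}")
    print("Below 0.001:", p_value_upper_bound < 0.001)
```

Because every contribution is non-negative, the true score can only be larger than this lower bound, and the true P-Value can only be smaller. Under these assumptions the result is consistent with the "less than 0.001" shown in the test summary.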

Where do these differences occur? Chi-Square tests provide a Chi-Square score, and by observing which subsets contribute the most to the score I can identify the subsets where I need to focus. In Figure 3, I look at the Chi-Square contribution (highlighted in green) for the largest contributors. Three items jump out: first, ~143 in AGRI (I expected ~38 changes in AGRI, and had 112); second, ~19 in ECON (I expected ~81 changes in ECON, but only had 41); and finally, ~15 in ACCT (I expected ~122 changes in ACCT, but only had 78). These contributors all happen to be in the “# of grade changes” section, but they can occur in either section. Now I know where to look and what to ask.
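The contribution of each cell follows the standard Chi-Square formula, (actual - expected)² / expected. The short Python sketch below applies it to the three subsets named above, using the rounded expected values from the text. Because those expected values are rounded, the results land close to, but not exactly on, the ~143, ~19 and ~15 reported by Minitab. The function and variable names are my own.

```python
def chi_square_contribution(observed: float, expected: float) -> float:
    """Contribution of one cell to the Chi-Square score."""
    return (observed - expected) ** 2 / expected


# (actual changes, rounded expected changes) as quoted in the article.
cells = {
    "AGRI": (112, 38),
    "ECON": (41, 81),
    "ACCT": (78, 122),
}

contributions = {
    program: chi_square_contribution(observed, expected)
    for program, (observed, expected) in cells.items()
}

# Largest contributors first: these are the subsets to look at first.
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for program, value in ranked:
        observed, expected = cells[program]
        direction = "more" if observed > expected else "fewer"
        print(f"{program}: contribution {value:.1f} ({direction} changes than expected)")
```

Note that the contribution is always positive, so it tells you how unusual a subset is but not in which direction. Comparing actual against expected, as the last loop does, shows that AGRI had more changes than expected while ECON and ACCT had fewer.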

Winter-18-chi-fig-3.PNG
When I approached a data management / computer science colleague about the use of Chi-Square in auditing and fraud detection, he was amazed at the application of this tool to these purposes. He was excited to try it.

The purpose of this article is not to convert you into a statistical guru or Minitab® expert; however, I hope it has introduced you to an uncommon audit use for a common statistical tool. Using Chi-Square simplified the process of sifting through a large data set and calculating percentages in order to identify the significant differences.

About the Author

Chad Muhlestein

Chad Muhlestein is the leader of Internal Audit at Delaware State University; he also has responsibilities for Enterprise Risk Management and Advisory Services. Chad received a Master of Accountancy degree from Brigham Young University with an emphasis in Internal Audit. Prior to joining Delaware State University, Chad held many audit and internal control roles at DuPont.
