One way to design a spam filter is to look at the words in the email. In particular, there are some words that occur more commonly in spam emails than in non-spam emails. For instance, suppose that we know the following probabilities:
- 30% of emails are spam.
- 2% of spam emails contain the word "refinance".
- 0.1% of non-spam emails contain the word "refinance".
We would like to know the probability that an email containing the word "refinance" is spam. To do this, we will simulate LOTS of emails, some spam and some non-spam, some with and some without the word "refinance", and see what fraction of the emails containing "refinance" is spam.
Design Specifications
1. Use Excel to simulate a large number of emails. The number of emails should be large enough that you expect there to be (on average) about 30 occurrences of non-spam email containing the word "refinance". However, the number of emails should also be small enough that the workbook is not noticeably slow in processing. (In other words, you should not have 1 million emails in your simulation.) If you can't predict how many emails this is, you can determine this number dynamically through trial and error.
2. Your workbook should be well organized and easy to understand. Include your name, a context (e.g., ECE 214, Exercise C1), the date you started, and a description. Add background color to make it easier to navigate, e.g., by highlighting what's input by the user and what your final answers are. Use cells as labels where needed.
3. Your workbook should be robust and efficient. For instance, the probability of an email being spam should be a parameter in a cell, which the user can change, and the value should be located above the list where it is used. All three probabilities given above should be input by the user.