Question

[20pts] One way to design a spam filter is to look at the words in the email. In particular, there are some words that occur more commonly in spam emails than in non-spam emails. For instance, suppose that we know the following probabilities: * 30% of emails are spam. * 2% of spam emails contain the word "refinance". * 0.1% of non-spam emails contain the word "refinance". We would like to know the probability that an email containing the word "refinance" is spam. To do this, we will simulate LOTS of emails, some spam and some non-spam, some with and some without the word "refinance", and see what fraction of the emails containing "refinance" is spam. Design Specifications 1. Use Excel to simulate a large number of emails. The number of emails should be large enough that you expect there to be (on average) about 30 occurrences of non-spam email containing the word "refinance". However, the number of emails should also be small enough that the workbook is not noticeably slow in processing. (In other words, you should not have 1 million emails in your simulation.) If you can't predict how many emails this is, you can determine this number dynamically through trial and error. [2] 2. Your workbook should be well organized and easy to understand. Include your name, a context (e.g., ECE 214, Exercise C1), the date you started, and a description. Add background color to make it easier to navigate, e.g., by highlighting what's input by the user and what your final answers are. Use cells as labels where needed. [2] 3. Your workbook should be robust and efficient. For instance, the probability of an email being spam should be a parameter in a cell, which the user can change, and the value should be located above the list where it is used. All three probabilities given above should be input by the user. [3] 4. Use a list of random numbers to coordinate the emails. For instance, assuming the random numbers are between 0 and 1, you can use values below a certain cutoff to represent "spam" and values above the cutoff to be "non-spam". [2] 5. Use another column of random numbers (or two) to determine whether or not an email contains the word "refinance". This might be more complicated than it seems at first, so some thought is needed here. In fact, the essence of this problem lies in how you interpret and program this part. [3] 6. Check the totals at each stage. For instance, if you expect 30% of emails to be spam, count the number of spam emails, compute the probability that an email is spam, and verify that the value is about 30%. Check all three given probabilities. [3] 7. Compute the probability that an email containing the word "refinance" is spam. [3] 8. Write a general expression to compute this probability, as a function of the three given probabilities. You might need to try different values of these givens before you understand the pattern of what's going on. Compute this value in a cell, label it, and highlight it. [2]

[20pts] One way to design a spam filter is to look at the words in the email. In particular, there are some words that occur more commonly in spam emails than in non-spam emails. For instance, suppose that we know the following probabilities:
* 30% of emails are spam.
* 2% of spam emails contain the word "refinance".
* 0.1% of non-spam emails contain the word "refinance".
We would like to know the probability that an email containing the word "refinance" is spam.
To do this, we will simulate LOTS of emails, some spam and some non-spam, some with and some without the word "refinance", and see what fraction of the emails containing "refinance" is spam.

Design Specifications
1. Use Excel to simulate a large number of emails. The number of emails should be large enough that you expect there to be (on average) about 30 occurrences of non-spam email containing the word "refinance". However, the number of emails should also be small enough that the workbook is not noticeably slow in processing. (In other words, you should not have 1 million emails in your simulation.) If you can't predict how many emails this is, you can determine this number dynamically through trial and error. [2]
2. Your workbook should be well organized and easy to understand. Include your name, a context (e.g., ECE 214, Exercise C1), the date you started, and a description. Add background color to make it easier to navigate, e.g., by highlighting what's input by the user and what your final answers are. Use cells as labels where needed. [2]
3. Your workbook should be robust and efficient. For instance, the probability of an email being spam should be a parameter in a cell, which the user can change, and the value should be located above the list where it is used. All three probabilities given above should be input by the user. [3]
4. Use a list of random numbers to coordinate the emails. For instance, assuming the random numbers are between 0 and 1, you can use values below a certain cutoff to represent "spam" and values above the cutoff to be "non-spam". [2]
5. Use another column of random numbers (or two) to determine whether or not an email contains the word "refinance". This might be more complicated than it seems at first, so some thought is needed here. In fact, the essence of this problem lies in how you interpret and program this part. [3]
6. Check the totals at each stage. For instance, if you expect 30% of emails to be spam, count the number of spam emails, compute the probability that an email is spam, and verify that the value is about 30%. Check all three given probabilities. [3]
7. Compute the probability that an email containing the word "refinance" is spam. [3]
8. Write a general expression to compute this probability, as a function of the three given probabilities. You might need to try different values of these givens before you understand the pattern of what's going on. Compute this value in a cell, label it, and highlight it. [2]

$[20pts] One way to design a spam filter is to look at the words in the email. In particular, there are some words that occur more commonly in spam emails than in non-spam emails. For instance, suppose that we know the following probabilities: * 30% of emails are spam. * 2% of spam emails contain the word "refinance". * 0.1% of non-spam emails contain the word "refinance". We would like to know the probability that an email containing the word "refinance" is spam. To do this, we will simulate LOTS of emails, some spam and some non-spam, some with and some without the word "refinance", and see what fraction of the emails containing "refinance" is spam. Design Specifications 1. Use Excel to simulate a large number of emails. The number of emails should be large enough that you expect there to be (on average) about 30 occurrences of non-spam email containing the word "refinance". However, the number of emails should also be small enough that the workbook is not noticeably slow in processing. (In other words, you should not have 1 million emails in your simulation.) If you can't predict how many emails this is, you can determine this number dynamically through trial and error. [2] 2. Your workbook should be well organized and easy to understand. Include your name, a context (e.g., ECE 214, Exercise C1), the date you started, and a description. Add background color to make it easier to navigate, e.g., by highlighting what's input by the user and what your final answers are. Use cells as labels where needed. [2] 3. Your workbook should be robust and efficient. For instance, the probability of an email being spam should be a parameter in a cell, which the user can change, and the value should be located above the list where it is used. All three probabilities given above should be input by the user. [3] 4. Use a list of random numbers to coordinate the emails. For instance, assuming the random numbers are between 0 and 1, you can use values below a certain cutoff to represent "spam" and values above the cutoff to be "non-spam". [2] 5. Use another column of random numbers (or two) to determine whether or not an email contains the word "refinance". This might be more complicated than it seems at first, so some thought is needed here. In fact, the essence of this problem lies in how you interpret and program this part. [3] 6. Check the totals at each stage. For instance, if you expect 30% of emails to be spam, count the number of spam emails, compute the probability that an email is spam, and verify that the value is about 30%. Check all three given probabilities. [3] 7. Compute the probability that an email containing the word "refinance" is spam. [3] 8. Write a general expression to compute this probability, as a function of the three given probabilities. You might need to try different values of these givens before you understand the pattern of what's going on. Compute this value in a cell, label it, and highlight it. [2]$

Added by Linda G.

Question

Please give Ace some feedback