1. The following table consists of training data from an employee database. The data have been generalized. For example, "31...35" for age represents the age range of 31 to 35. For a given row entry, count represents the number of data tuples having the values for department, status, age, and salary given in that row.
department  status  age      salary     count
sales       senior  31...35  46K...50K  30
sales       junior  26...30  26K...30K  40
sales       junior  31...35  31K...35K  40
systems     junior  21...25  46K...50K  20
systems     senior  31...35  66K...70K  5
systems     junior  26...30  46K...50K  3
systems     senior  41...45  66K...70K  3
marketing   senior  36...40  46K...50K  10
marketing   junior  31...35  41K...45K  4
secretary   senior  46...50  36K...40K  4
secretary   junior  26...30  26K...30K  6
Let status be the class label attribute.
(a) Use a decision tree induction algorithm, modified to take the count of each generalized data tuple into account, to construct a decision tree from the given data.
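A minimal sketch of one way to take the count column into account, assuming ID3-style information gain in which each row is weighted by its count (so a row with count 30 contributes as 30 identical tuples). The exercise does not fix the attribute selection measure, so this choice is an assumption; `to_records`, `info_gain`, `build_tree`, and the other helpers are hypothetical names.

```python
from collections import defaultdict
from math import log2

# Rows copied from the table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31...35", "46K...50K", 30),
    ("sales", "junior", "26...30", "26K...30K", 40),
    ("sales", "junior", "31...35", "31K...35K", 40),
    ("systems", "junior", "21...25", "46K...50K", 20),
    ("systems", "senior", "31...35", "66K...70K", 5),
    ("systems", "junior", "26...30", "46K...50K", 3),
    ("systems", "senior", "41...45", "66K...70K", 3),
    ("marketing", "senior", "36...40", "46K...50K", 10),
    ("marketing", "junior", "31...35", "41K...45K", 4),
    ("secretary", "senior", "46...50", "36K...40K", 4),
    ("secretary", "junior", "26...30", "26K...30K", 6),
]
ATTRS = ["department", "age", "salary"]  # status is the class label


def to_records(rows):
    """Turn each table row into (attribute dict, class label, count)."""
    return [({"department": d, "age": a, "salary": s}, status, n)
            for d, status, a, s, n in rows]


def class_totals(records):
    """Sum counts per class label, so one row stands for `count` tuples."""
    totals = defaultdict(int)
    for _, label, n in records:
        totals[label] += n
    return totals


def entropy(records):
    """Count-weighted class entropy Info(D)."""
    totals = class_totals(records)
    total = sum(totals.values())
    return -sum(c / total * log2(c / total) for c in totals.values())


def partition(records, attr):
    """Group records by their value of `attr`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0][attr]].append(rec)
    return groups


def info_gain(records, attr):
    """Info(D) minus the count-weighted entropy after splitting on attr."""
    total = sum(n for _, _, n in records)
    remainder = sum(
        sum(n for _, _, n in group) / total * entropy(group)
        for group in partition(records, attr).values()
    )
    return entropy(records) - remainder


def build_tree(records, attrs):
    """Leaf = class label; internal node = (attribute, {value: subtree})."""
    totals = class_totals(records)
    if len(totals) == 1 or not attrs:
        return max(totals, key=totals.get)  # pure node, or majority by count
    best = max(attrs, key=lambda a: info_gain(records, a))  # ties: first in attrs
    rest = [a for a in attrs if a != best]
    return (best, {value: build_tree(group, rest)
                   for value, group in partition(records, best).items()})


if __name__ == "__main__":
    records = to_records(ROWS)
    for attr in ATTRS:
        print(f"Gain({attr}) = {info_gain(records, attr):.4f}")
    print(build_tree(records, ATTRS))
```

With count weighting, salary has the largest gain (about 0.54, versus about 0.42 for age and 0.05 for department), so it becomes the root. Only the 46K...50K branch is impure; there department and age tie, and this sketch simply takes the first listed attribute.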
(b) Given a data tuple having the values "systems," "26...30," and "46K...50K" for the attributes department, age, and salary, respectively, what would a naive Bayesian classification of the status for the tuple be?
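A minimal sketch of the count-weighted naive Bayes computation, assuming no Laplace smoothing (the exercise does not ask for it). Each conditional P(x_k | C) is the summed count of class-C rows having that value divided by the total count of class C. `FIELDS` and `naive_bayes_scores` are hypothetical names.

```python
from collections import Counter

FIELDS = ("department", "status", "age", "salary")  # column order of ROWS
# Rows copied from the table; the last element of each tuple is the count.
ROWS = [
    ("sales", "senior", "31...35", "46K...50K", 30),
    ("sales", "junior", "26...30", "26K...30K", 40),
    ("sales", "junior", "31...35", "31K...35K", 40),
    ("systems", "junior", "21...25", "46K...50K", 20),
    ("systems", "senior", "31...35", "66K...70K", 5),
    ("systems", "junior", "26...30", "46K...50K", 3),
    ("systems", "senior", "41...45", "66K...70K", 3),
    ("marketing", "senior", "36...40", "46K...50K", 10),
    ("marketing", "junior", "31...35", "41K...45K", 4),
    ("secretary", "senior", "46...50", "36K...40K", 4),
    ("secretary", "junior", "26...30", "26K...30K", 6),
]


def naive_bayes_scores(rows, query, label_field="status"):
    """Return P(C) * prod_k P(x_k | C) for every class C, using counts as weights.

    No smoothing is applied, so an attribute value never seen with a class
    drives that class's score to zero.
    """
    label_idx = FIELDS.index(label_field)
    class_counts = Counter()
    for row in rows:
        class_counts[row[label_idx]] += row[-1]
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(C)
        for attr, value in query.items():
            i = FIELDS.index(attr)
            matching = sum(row[-1] for row in rows
                           if row[label_idx] == label and row[i] == value)
            score *= matching / n_label  # P(attr = value | C)
        scores[label] = score
    return scores


if __name__ == "__main__":
    query = {"department": "systems", "age": "26...30", "salary": "46K...50K"}
    scores = naive_bayes_scores(ROWS, query)
    print(scores)
    print("predicted status:", max(scores, key=scores.get))
```

Here the junior score is (113/165)(23/113)(49/113)(23/113), roughly 0.0123, while no senior tuple has age 26...30, so the senior score is 0 and the tuple is classified as junior.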
2. (a) Build a decision tree from the given tennis dataset to predict PlayTennis from the other attributes (but do not use the Day attribute in your tree). Show all of your work, calculations, and decisions as you build the tree.
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
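A minimal ID3 sketch for checking the hand calculations, assuming entropy-based information gain as the splitting measure (the exercise does not name one, so this is an assumption). The Day column is kept in the rows but left out of `ATTRS`, as the exercise requires. `entropy`, `info_gain`, and `id3` are hypothetical helper names.

```python
from collections import Counter
from math import log2

COLUMNS = ["Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
TABLE = """\
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in TABLE.splitlines()]
TARGET = "PlayTennis"
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]  # Day deliberately excluded


def entropy(rows):
    """Entropy of the PlayTennis labels in `rows`."""
    counts = Counter(r[TARGET] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def info_gain(rows, attr):
    """Entropy(S) minus the size-weighted entropy of each subset S_v."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Leaf = label string; internal node = (attribute, {value: subtree})."""
    labels = Counter(r[TARGET] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # pure node, or majority label
    best = max(attrs, key=lambda a: info_gain(rows, a))
    rest = [a for a in attrs if a != best]
    return (best, {value: id3([r for r in rows if r[best] == value], rest)
                   for value in sorted({r[best] for r in rows})})


if __name__ == "__main__":
    for attr in ATTRS:
        print(f"Gain({attr}) = {info_gain(ROWS, attr):.3f}")
    print(id3(ROWS, ATTRS))
```

At the root, Outlook has the largest gain (about 0.247, versus about 0.152 for Humidity, 0.048 for Wind, and 0.029 for Temperature). The Overcast branch is already pure, the Sunny branch splits on Humidity, and the Rain branch splits on Wind.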