Question

You have been given access to a large movie rating dataset containing about 5M records with fields like Movie Name, Average Movie Rating, Genre, Number of Reviewers, Date of Release and few other numeric columns. You plan to build a Data mining model that predicts the average review based on the other columns. Which is the best approach you would adopt to build the model. Randomly sample a few 1000's of records and explore whether you can predict the rating with reasonable accuracy dropping features that don't aid in improving the predictive accuracy Use the entire dataset to build a regression model to predict the average movie rating by regressing against the remaining columns dropping features that don't aid in improving the predictive accuracy None of the above Drop the movie titles and Genres because they are unstructured data and only use the numeric columns to build a regression model using the rest of the entire data set.

          You have been given access to a large movie rating dataset
containing about 5M records with fields like Movie Name, Average
Movie Rating, Genre, Number of Reviewers, Date of Release and few
other numeric columns. You plan to build a Data mining model that
predicts the average review based on the other columns. Which is
the best approach you would adopt to build the model. 
    Randomly sample a few 1000's of records and
explore whether you can predict the rating with reasonable accuracy
dropping features that don't aid in improving the predictive
accuracy
    Use the entire dataset to build a regression model
to predict the average movie rating by regressing against the
remaining columns dropping features that don't aid in improving the
predictive accuracy
    None of the above
   Drop the movie titles and Genres because they are
unstructured data and only use the numeric columns to build a
regression model using the rest of the entire data set.

Added by Cindy S.

Question

Please give Ace some feedback