Help using Python (please):
Your task: Complete the function calc_top_genres(labeled_metadata, top_labels) below. It takes two objects as input:
1. labeled_metadata: a pandas dataframe formatted like the labeled_metadata df above.
2. top_labels: A Python set of labels, like the top ebook_labels above.
It should then do the following:
For each label in top_labels, it should determine the two most frequently occurring genres among the books with that label. It should then return a single dataframe with two columns: label and genre. Each row should correspond to one (label, genre) pair. And as per the preceding bullet, you expect to see two rows per label.
Regarding the number of rows per label, there are two exceptions:
1. First, if a given label only has books from one genre, then there will only be one row.
2. Second, if there are ties, then you should retain all pairs in the same way you would have done in Exercise.
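The tie rule can be illustrated on the genre counts for a single label. This is only a sketch: the counts are made up, and it assumes that "retain all pairs" means keeping every genre whose count ties with one of the top two, which is what pandas' nlargest does with keep="all". The referenced exercise may define ties differently.

```python
import pandas as pd

# Hypothetical genre counts for the books under one label (made-up numbers).
genre_counts = pd.Series({"Romance": 5, "Mystery": 3, "Poetry": 3, "Travel": 1})

# keep="all" retains every entry tied with the last selected value,
# so the result can hold more than two genres.
top_two_with_ties = genre_counts.nlargest(2, keep="all")

print(sorted(top_two_with_ties.index))  # ['Mystery', 'Poetry', 'Romance']
```

Here Mystery and Poetry tie for second place, so three genres survive instead of two.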
Note: Your function must not modify the input arguments. The test cell will check for that and may fail with strange errors if you do so.
Note: The order of rows does not matter, as the test cell will use tibble comparison functions.
Example: A correct implementation will produce, for the call calc_top_genres(labeled_metadata_df, top_dataset), the following result:
label genre
Literature Fiction Romance
Children's eBooks Literature Fiction
Health, Fitness Dieting Literature Fiction
Science Fiction & Fantasy Literature Fiction
Literature Fiction Romance
Though not definitive, this result does suggest that the clustering captures distinct groups of books. Here, for instance, the cluster having Romantic novels (label 0) is distinct from the cluster with Children's eBooks (label 15) and from another with Health, Fitness Dieting (label 25).
def calc_top_genres(labeled_metadata, top_labels):
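One possible way to fill in the stub, offered as a minimal sketch rather than the official solution. It assumes labeled_metadata has plain (non-categorical) columns named label and genre, which the post does not show, and that ties are kept whenever a genre's count matches one of the top two counts for its label. The toy dataframe at the end is hypothetical and exists only to exercise the function.

```python
import pandas as pd

def calc_top_genres(labeled_metadata, top_labels):
    # Boolean-mask filtering returns a new DataFrame, so the input is not modified.
    # Assumption: the columns are named "label" and "genre".
    subset = labeled_metadata[labeled_metadata["label"].isin(top_labels)]

    # Count the books for each (label, genre) pair.
    counts = (subset.groupby(["label", "genre"])
                    .size()
                    .reset_index(name="count"))

    # Rank genres within each label by count, highest first.
    # method="min" gives tied counts the same rank, so ties are all retained.
    counts["rank"] = (counts.groupby("label")["count"]
                            .rank(method="min", ascending=False))

    top = counts[counts["rank"] <= 2]
    return top[["label", "genre"]].reset_index(drop=True)

# Hypothetical toy data: label 0 has a clear top two, label 1 has a single
# genre, label 2 has a three-way tie, and label 3 is not in top_labels.
toy = pd.DataFrame({
    "label": [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 3],
    "genre": ["Romance", "Romance", "Romance", "Fiction", "Fiction", "Poetry",
              "Kids", "Kids", "A", "B", "C", "X"],
})
result = calc_top_genres(toy, {0, 1, 2})
print(result)
```

On the toy data this yields two rows for label 0, one row for label 1, and three rows for label 2 because of the tie. If the referenced exercise breaks ties another way, only the ranking step needs to change.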