The NumPy array you created in task 1 is unstructured because we let NumPy decide the datatype of each value. It also still contains the header row, which is not needed for the analysis. Most columns hold float values, but a few are descriptive text columns such as created_at. So, we are going to remove the header row and explicitly tell NumPy to convert every column to type float (i.e., "float"), except the columns specified by indexes, which should be Unicode strings of up to 30 characters (i.e., "<U30"). Finally, every row must be converted to a tuple (e.g., tuple(i) for i in data), because NumPy builds each record of a structured array from a tuple.
Write a function unstructured_to_structured(data, indexes) that achieves the above goal.
For example:

Test 1:
    data = load_metrics("mini_covid_sentiment_metrics.csv")
    data = unstructured_to_structured(data, [0, 1, 7, 8])  # 0, 1, 7, 8 are the indices of created_at, tweet_ID, sentiment_category, and emotion_category
    print(data[0])
Result: (not shown)

Test 2:
    data = load_metrics("mini_covid_sentiment_metrics.csv")
    data = unstructured_to_structured(data, [0, 1, 7, 8])
    print(data[5][0].dtype)
Result:
    <U30

Test 3:
    data = load_metrics("mini_covid_sentiment_metrics.csv")
    data = unstructured_to_structured(data, [0, 1, 7, 8])
    print(data[5][3].dtype)
Result:
    float64
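Below is a minimal sketch of one way to write unstructured_to_structured. It assumes that load_metrics (from task 1, not shown here) returns a 2-D NumPy string array whose first row is the header. Taking the field names from that header row is also an assumption, because the task does not say how the fields should be named. The small array at the bottom, including its "polarity" column, is made-up sample data that stands in for the real CSV.

```python
import numpy as np


def unstructured_to_structured(data, indexes):
    """Convert an unstructured 2-D string array (header row first) into a
    structured array: columns listed in `indexes` become "<U30" strings,
    every other column becomes "float"."""
    header = data[0]
    # One (field name, type) pair per column. Using the header entries as
    # field names is an assumption; the task does not specify field names.
    dtype = [
        (str(name), "<U30" if col in indexes else "float")
        for col, name in enumerate(header)
    ]
    # Skip the header row and turn every remaining row into a tuple, since
    # NumPy builds each record of a structured array from a tuple.
    rows = [tuple(row) for row in data[1:]]
    # NumPy parses the numeric strings into floats as it fills each field.
    return np.array(rows, dtype=dtype)


# Hypothetical sample standing in for load_metrics(...) output.
raw = np.array([
    ["created_at", "tweet_ID", "polarity", "sentiment_category"],
    ["2020-03-01", "123", "0.25", "positive"],
    ["2020-03-02", "456", "-0.5", "negative"],
])
data = unstructured_to_structured(raw, [0, 1, 3])
print(data[0])
print(data.dtype)
```

Note that "<U30" silently truncates any string longer than 30 characters, so the text columns passed in indexes should fit within that limit.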