A study published earlier this week by Surge AI appears to lay bare one of the biggest problems plaguing the AI industry: bullshit, exploitative data-labeling practices.

Last year, Google built a dataset called “GoEmotions.” It was billed as a “fine-grained emotion dataset,” basically a ready-to-train-on dataset for building AI that can recognize emotional sentiment in text.

Per a Google blog post:

In “GoEmotions: A Dataset of Fine-Grained Emotions”, we describe GoEmotions, a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories. As the largest fully annotated English language fine-grained emotion dataset…
This story continues at The Next Web