Bluesky, a rapidly growing social platform, has faced scrutiny over its open API, which allows third parties to collect and use public data for various purposes, including AI training.
Bluesky itself does not use user content to train AI models, but the platform’s Firehose API gives anyone unrestricted access to public posts. A report by 404 Media showed what that means in practice: a machine learning librarian at AI firm Hugging Face used the API to extract one million public posts from Bluesky for research purposes. The dataset was initially made publicly available and was later removed following public backlash.
The incident serves as a reminder that content posted publicly on Bluesky remains accessible to anyone. Bluesky has acknowledged this challenge and is exploring ways to let users express consent preferences for their data. However, the platform admits it cannot enforce these preferences beyond its own systems.
In a statement, Bluesky noted:
“Bluesky won’t be able to enforce this consent outside of our systems. It will be up to outside developers to respect these settings. We’re having ongoing conversations with engineers & lawyers and we hope to have more updates to share on this shortly!”
As Bluesky grows in popularity, it is drawing the same kind of scrutiny that other major social networks face. The incident shows that consent preferences on an open network only work if outside developers choose to honor them, which makes transparency and accountability in the handling of public data all the more important.