Even staunch fans are calling out Apple's less-than-transparent AI training data harvesting

Cal Jeffrey

Bottom line: Apple has been slow and careful about jumping on the AI bandwagon. It even avoided calling its models AI at WWDC, introducing Apple Intelligence instead to set itself apart. No matter what you call it, though, developers must still feed their models hundreds of millions, or even billions, of data samples to remain competitive.

So far, we haven't had a fair look at the feature, since Apple Intelligence doesn't make its public debut until later this year. We have only seen what Apple showed us at WWDC, which is hardly an unbiased view. Like any other company, Apple will only present the best of what it has to offer and hash out the fine print later. With the rapid growth of commercial AI, though, that's not good enough.

The company could easily have released some information or an FAQ page on how it trains its generative AI models, but it has so far remained as quiet as it was before officially announcing its AI tech. The only thing it has said on the subject is that it collects data like everybody else, using a crawler it calls Applebot, which is supposed to be more privacy-friendly. However, minding privacy and minding IP rights are two different things.

Now, some of Apple's most passionate supporters are calling out its lack of transparency on the hows and whats of Apple Intelligence data gathering.

"I wish Apple would have explained to the public in a more transparent way how they collected their training data," video games artist and creators' rights activist Jon Lam told Engadget. "I think their announcement could not have come at a worse time."

One would think that, given Apple's slow roll on AI, it would have learned that the climate around information harvesting for generative model training has been and remains chilly. More than a few artists have filed IP infringement lawsuits against AI developers for using their work without permission or payment – over a dozen by Engadget's count. Those suits have come from plaintiffs ranging from prominent industry players like The New York Times and Universal to the smallest independent artists.

"That's why I wanted to give them a slight benefit of the doubt," said Lam. "I thought they would approach the ethics conversation differently."

The misstep looks even worse considering Cupertino's vocal stance on privacy and the fact that Apple has long positioned its products as the artist's best tools. The company charges a premium for high-end production platforms that millions of creative users swear by. Tarnishing that reputation with unscrupulous data collection is the last thing it needs.

John Giannandrea, Apple's senior vice president of machine learning and AI strategy, downplayed the company's sample collection by saying that Apple trained its models mostly on in-house data. However, Giannandrea didn't get into specifics, such as how much "mostly" covers or where the company obtained the rest of its training samples.

Inc. reports that Apple has entered licensing deals with extensive image databases like Shutterstock and Photobucket, but the company has not publicly confirmed these reports. Maintaining the status quo has never been a pillar of Apple's business. While the Cupertino powerhouse hasn't commented on the negative feedback yet, it would be surprising if it didn't address the issue before launching Apple Intelligence this fall.

Good, no doubt Apple will be as repugnant as Adobe and others, but it will get away with it because they are Apple, and Cook will pontificate about his holier-than-thou observance of consumer rights.
 
Let's be serious: you cannot be competitive in the AI market without using enough data to train the models. Apple isn't doing anything special compared to everybody else.
 
I think it's probably best to wait until they actually finish the product and see what disclosures they do or don't make when it launches. 👍
 