
Teach What You Know, Learn What You Don’t

Image courtesy of World Economic Forum

Say you want to start a second career as a teacher. You might make a little money teaching adult education courses. You might even work as a substitute teacher or volunteer as an instructor at a senior center. But would you volunteer content you’ve published or posted online to train some company’s new chatbot for no pay or recognition? If so, you’re well on your way. If not, I suggest you keep reading.

X, the social media platform formerly known as Twitter, recently updated its terms of service to enable its AIs to train on user content. As of now, users cannot opt out, and even if you delete your tweets or close your account, the company retains copies of them. The relevant portion of its endless Terms of Service agreement reads (emphasis mine):

You agree that this license includes the right for us to (i) provide, promote, and improve the Services, including, for example, for use with and training of our machine learning and artificial intelligence models, whether generative or another type; and (ii) to make Content submitted to or through the Services available to other companies, organizations or individuals, including, for example, for improving the Services and the syndication, broadcast, distribution, repost, promotion or publication of such Content on other media and services, subject to our terms and conditions for such Content use. Such additional uses by us, or other companies, organizations or individuals, is made with no compensation paid to you with respect to the Content that you submit, post, transmit or otherwise make available through the Services as the use of the Services by you is hereby agreed as being sufficient compensation for the Content and grant of rights herein.

And so, whatever is in your profile can be fed into AI knowledge bases for any purpose Elon Musk chooses. For example, someone could ask the AI, “What do you know about Geoff Dutton a.k.a. @PerfidyPress?” to compile a dossier on me that includes my email address, my location, my websites, and the sites and accounts I’ve connected with on Twitter, along with my posted images and opinions.

Given the proclivities of the platform’s owner, should I become a nuisance to the Trump regime or anyone else in power, they will know what I think, what I’ve said, and where to find me. And soon I might start to receive malicious messages and software from various troublemakers connected to them. That would be a strong incentive to keep my mouth shut online.

And of course, users of Facebook and WhatsApp can expect similar treatment. (See more specifics here.) Meta has been vague about how it uses scraped data, other than to claim that private communications are exempt. Thanks to the EU’s GDPR, users there are allowed to opt out, but as the previous link describes, the process is convoluted and Meta doesn’t promise to honor their requests.

The professional social network LinkedIn also trains AI applications on users’ content (exempting residents of China, the EU, and several other countries from that indignity). Unlike Meta, it doesn’t exclude direct messages from training data. It does, however, let users opt out, so don’t delay; do that now.

As for Google, in July 2023 it changed its privacy policy to say:

“Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

The policy seems to exclude your Gmail messages and Google Docs files (unless you post them publicly), but it includes every search you’ve ever made on Google and YouTube and which results and ads you viewed. What its Chrome browser reports about users is anybody’s guess.

For these and other reasons I have long used anonymous search engines, as I said in my August newsletter. Now maybe you don’t care how much Google, Meta, and X know and can infer about you, and anyway, who wants to deal with this stuff?

I don’t either, but for 25 years or so I’ve felt an obligation to warn people, starting by criticizing tech innovators’ mad dash to fame and fortune. (I, too, was complicit, having developed and documented software.) Somehow, accelerating the pace of change in science and technology is supposed to be good for us. What it’s actually been good at is binding us to devices, code, and services that now control a good part of our lives and are currently being subsumed by AI infiltrators.

People ask experts for cures for diseases, but whoever asked for AI? Why is it being shoved down our throats at every turn, eroding our agency? Computers already mediate many of our interactions. Now they are toying with our affections. In his recent book Nexus: A Brief History of Information Networks from the Stone Age to AI, historian Yuval Noah Harari says he expects AIs’ cultivation of false intimacy to further loosen ties (p. 210):

In any political battle for minds and hearts, intimacy is a powerful weapon, and chatbots like Google’s LaMDA and OpenAI’s GPT-4 are gaining the ability to mass-produce intimate relationships with millions of people. In the 2010s social media was a battleground for controlling human attention. In the 2020s the battle is likely to shift from attention to intimacy.

But I suppose in a country where millions of people are convinced that Donald Trump loves them, AI operatives will have easy pickings once they get to know them. To get up to speed on how computer networks are usurping our autonomy, I highly recommend Harari’s lucid, plain-spoken, and well-documented book, written from a broad historical perspective.*

Now, there’s a reason why Meta, Google, and the company formerly known as Twitter are so eager to get their paws on user content: their large language models (LLMs) are running out of fodder. Throughout the AI industry, bigger and better LLMs have not yielded much better results than GPT-4, because they’ve already read all the “high quality” texts out there and still need more training data to outpace the competition. See this NYT explainer (gift link).

So they turn to social media posts, YouTube video transcripts (which are full of errors), and, worst of all, texts other LLMs cough up. By training on LLMs’ outputs, they learn to reproduce previously distilled inaccuracies. By training on social media, they amplify biases, disinformation, and conspiracy theories.
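Why don’t those recycled errors simply average out? Here’s a toy illustration in Python, just a sketch of my own and nothing like a real training pipeline: it stands in “training on another model’s output” with repeatedly re-fitting a simple statistical model (a Gaussian) to samples drawn from the previous fit.

```python
import numpy as np

# Toy stand-in for "training on another model's output": fit a Gaussian
# to data, then repeatedly re-fit to samples drawn from the previous fit
# instead of from the original "human" data.
rng = np.random.default_rng(0)
human_data = rng.normal(loc=0.0, scale=1.0, size=200)  # ground truth: mean 0, std 1

mu, sigma = human_data.mean(), human_data.std()  # generation 1 learns from humans
for gen in range(2, 52):
    synthetic = rng.normal(mu, sigma, size=200)    # the last model "writes text"
    mu, sigma = synthetic.mean(), synthetic.std()  # the next model trains on it
    if gen % 10 == 1:
        print(f"generation {gen:2d}: mean {mu:+.3f}, std {sigma:.3f}")

# Each generation inherits the previous one's sampling error and adds its
# own, so the fitted mean and spread drift ever farther from the human
# originals instead of staying anchored to them.
```

Real pipelines are vastly more elaborate, but the feedback loop is the same: once machine-written text re-enters the training set, its errors stop washing out and start accumulating, a failure mode researchers have dubbed “model collapse.”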

Don’t like it? Stay abreast of AI’s encroachments by reading or subscribing to The Technoskeptic, or sign up for alerts from the Electronic Privacy Information Center at epic.org. Click Issues there to see which of their causes matter to you.

And once you’ve got the drift, please speak out about it. You can start by forwarding this article to people who need to know. There’s not much we can do to keep AI from metastasizing, but at least we can try to inform and protect ourselves.

______

*  If you’re curious about how LLMs do what they do, I recommend this SSRN journal article by D. Gervais et al. called The Heart of the Matter: Copyright, AI Training, and LLMs. It goes deep into some fascinating weeds.


Visit Perfidy Press Bookstore

You can find this and previous Perfidy Press Provocations in our newsletter archive. Should you see any you like, please consider forwarding this issue, or links to others, to people who might like to subscribe, and thanks.

Visit Perfidy Press

And, if you must, you can unsubscribe here.


