@ClamDrinker - klein.ruhr

ClamDrinker@lemmy.world · 3 days ago

If you think that’s depressing, wait until you find out that it’s basically nothing in the grand scheme of things.

spoiler

Most sources agree that we use about 4 trillion cubic meters of water every year worldwide (Although, this stat is from 2015 most likely, and so it will be bigger now). In 2022, using the stats here Microsoft used 1.7 billion gallons per year, and Google 5.56 billion gallons per year. In cubic meters that’s only 23.69 million cubic meters. That’s only 0.00059% of the worldwide water usage. Meanwhile agriculture uses on average 70% of a country’s daily fresh water.

Even if we just look at the US, since that’s where Google and Microsoft are based, they use 322 billion gallons of water every day, resulting in about 445 billion cubic meters per year, that’s still 0.00532%. So you can have 187 more Googles and Microsofts before you even top a single percentage.

_

And as others have pointed out the water isn’t gone, there’s some cyclicality in how the water is used.

ClamDrinker@lemmy.world · edit-2 12 days ago

There is so much wrong with this…

AI is a range of technologies. So yes, you can make surveillance with it, just like you can with a computer program like a virus. But obviously not all computer programs are viruses nor exist for surveillance. What a weird generalization. AI is used extensively in medical research, so your life might literally be saved by it one day.

You’re most likely talking about “Chat Control”, which is a controversial EU proposal to scan either on people’s devices or from provider’s ends for dangerous and illegal content like CSAM. This is obviously a dystopian way to achieve that as it sacrifices literally everyone’s privacy to do it, and there is plenty to be said about that without randomly dragging AI into that. You can do this scanning without AI as well, and it doesn’t change anything about how dystopian it would be.

You should be using end to end regardless, and a VPN is a good investment for making your traffic harder to discern, but if Chat Control is passed to operate on the device level you are kind of boned without circumventing this software, which would potentially be outlawed or made very difficult. It’s clear on it’s own that Chat Control is a bad thing, you don’t need some kind of conspiracy theory about ‘the true purpose of AI’ to see that.

ClamDrinker@lemmy.world · 2 months ago

I never anthropomorphized the technology, unfortunately due to how language works it’s easy to misinterpret it as such. I was indeed trying to explain overfitting. You are forgetting the fact that current AI technology (artificial neural networks) are based on biological neural networks. There is a range of quirks that it exhibits that biological neural networks do as well. But it is not human, nor anything close. But that does not mean that there are no similarities that can be rightfully pointed out.

Overfitting isn’t just what you describe though. It also occurs if the prompt guides the AI towards a very specific part of it’s training data. To the point where the calculations it will perform are extremely certain about what words come next. Overfitting here isn’t caused by an abundance of data, but rather a lack of it. The training data isn’t being produced from within the model, but as a statistical inevitability of the mathematical version of your prompt. Which is why it’s tricking the AI, because an AI doesn’t understand copyright - it just performs the calculations. But you do. And so using that as an example is like saying “Ha, stupid gun. I pulled the trigger and you shot this man in front of me, don’t you know murder is illegal buddy?”

Nobody should be expecting a machine to use itself ethically. Ethics is a human thing.

People that use AI have an ethical obligation to avoid overfitting. People that produce AI also have an ethical obligation to reduce overfitting. But a prompt quite literally has infinite combinations (within the token limits) to consider, so overfitting will happen in fringe situations. That’s not because that data is actually present in the model, but because the combination of the prompt with the model pushes the calculation towards a very specific prediction which can heavily resemble or be verbatim the original text. (Note: I do really dislike companies that try to hide the existence of overfitting to users though, and you can rightfully criticize them for claiming it doesn’t exist)

This isn’t akin to anything human, people can’t repeat pages of text verbatim like this and no toddler can be tricked into repeating a random page from a random book as you say.

This is incorrect. A toddler can and will verbatim repeat nursery rhymes that it hears. It’s literally one of their defining features, to the dismay of parents and grandparents around the world. I can also whistle pretty much my entire music collection exactly as it was produced because I’ve listened to each song hundreds if not thousands of times. And I’m quite certain you too have a situation like that. An AI’s mind does not decay or degrade (Nor does it change for the better like humans) and the data encoded in it is far greater, so it will present more of these situations in it’s fringes.

but it isn’t crafting its own sentences, it’s using everyone else’s.

How do you think toddlers learn to make their first own sentences? It’s why parents spend so much time saying “Papa” or “Mama” to their toddler. Exactly because they want them to copy them verbatim. Eventually the corpus of their knowledge grows big enough to the point where they start to experiment and eventually develop their own style of talking. But it’s still heavily based on the information they take it. It’s why we have dialects and languages. Take a look at what happens when children don’t learn from others: https://en.wikipedia.org/wiki/Feral_child So yes, the AI is using it’s training data, nobody’s arguing it doesn’t. But it’s trivial to see how it’s crafting it’s own sentences from that data for the vast majority of situations. It’s also why you can ask it to talk like a pirate, and then it will suddenly know how to mix in the essence of talking like a pirate into it’s responses. Or how it can remember names and mix those into sentences.

Therefore it is factually wrong to state that it doesn’t keep the training data in a usable format

If your arguments is that it can produce something that happens to align with it’s training data with the right prompt, well yeah that’s not incorrect. But it is so heavily misguided and borders bad faith to suggest that this tiny minority of cases where overfitting occurs is indicative of the rest of it. LLMs are a prediction machines, so if you know how to guide it towards what you want it to predict, and that is in the training data, it’s going to predict that most likely. Under normal circumstances where the prompt you give it is neutral and unique, you will basically never encounter overfitting. You really have to try for most AI models.

But then again, you might be arguing this based on a specific AI model that is very prone to overfitting, while I am arguing this out of the technology as a whole.

This isn’t originality, creativity or anything that it is marketed as. It is storing, encoding and copying information to reproduce in a slightly different format.

It is originality, as these AI can easily produce material never seen before in the vast, vast majority of situations. Which is also what we often refer to as creativity, because it has to be able to mix information and still retain legibility. Humans also constantly reuse phrases, ideas, visions, ideals of other people. It is intellectually dishonest to not look at these similarities in human psychology and then treat AI as having to be perfect all the time, never once saying the same thing as someone else. To convey certain information, there are only finite ways to do so within the English language.

ClamDrinker@lemmy.world · 2 months ago

This is an issue for the AI user though. And I do agree that needs to be more conscious in people’s minds. But I think time will change that. Perhaps when the photo camera came out there were some shmucks that took pictures of people’s artworks and claimed it as their own because the novelty of the technology allowed that for a bit, but eventually those people are properly differentiated from people properly using it.

ClamDrinker@lemmy.world · edit-2 2 months ago

Like if I download a textbook to read for a class instead of buying it - I could be proscecuted for stealing

Ehh, no almost certainly not (But it does depend on your local laws). But that honestly just sounds like some corporate boogyman to prevent you from pirating their books. The person hosting the download, if they did not have the rights to publicize it freely, would possibly be prosecuted though.

To illustrate, there’s this story of John Cena who sold a special Ford after signing a contract with Ford to explicitly forbid him from doing that. However, the person who bought the car was never prosecuted or sued, because they received the car from Cena with no strings attached. They couldn’t be held responsible for Cena’s break of contract, but Cena was held personally responsible by Ford.

For physical goods there is ‘theft by proxy’ though (receiving stolen goods that you know are most likely stolen), but that quite certainly doesn’t apply to digital, copyable goods. As to even access any kind of information on the internet, you have to download and thus, copy it.

ClamDrinker@lemmy.world · edit-2 2 months ago

That would be true if they used material that was paywalled. But the vast majority of the training information used is publicly available. There’s plenty of freely available books and information that you only require an internet connection for to access, and learn from.

ClamDrinker@lemmy.world · edit-2 2 months ago

Your first point is misguided and incorrect. If you’ve ever learned something by ‘cramming’, a.k.a. just repeating ingesting material until you remember it completely. You don’t need the book in front of you anymore to write the material down verbatim in a test. You still discarded your training material despite you knowing the exact contents. If this was all the AI could do it would indeed be an infringement machine. But you said it yourself, you need to trick the AI to do this. It’s not made to do this, but certain sentences are indeed almost certain to show up with the right conditioning. Which is indeed something anyone using an AI should be aware of, and avoid that kind of conditioning. (Which in practice often just means, don’t ask the AI to make something infringing)

ClamDrinker@lemmy.world · 2 months ago

This would be a good point, if this is what the explicit purpose of the AI was. Which it isn’t. It can quote certain information verbatim despite not containing that data verbatim, through the process of learning, for the same reason we can.

I can ask you to quote famous lines from books all day as well. That doesn’t mean that you knowing those lines means you infringed on copyright. Now, if you were to put those to paper and sell them, you might get a cease and desist or a lawsuit. Therein lies the difference. Your goal would be explicitly to infringe on the specific expression of those words. Any human that would explicitly try to get an AI to produce infringing material… would be infringing. And unknowing infringement… well there are countless court cases where both sides think they did nothing wrong.

You don’t even need AI for that, if you followed the Infinite Monkey Theorem and just happened to stumble upon a work falling under copyright, you still could not sell it even if it was produced by a purely random process.

Another great example is the Mona Lisa. Most people know what it looks like and if they had sufficient talent could mimic it 1:1. However, there are numerous adaptations of the Mona Lisa that are not infringing (by today’s standards), because they transform the work to the point where it’s no longer the original expression, but a re-expression of the same idea. Anything less than that is pretty much completely safe infringement wise.

You’re right though that OpenAI tries to cover their ass by implementing safeguards. Which is to be expected because it’s a legal argument in court that once they became aware of situations they have to take steps to limit harm. They can indeed not prevent it completely, but it’s the effort that counts. Practically none of that kind of moderation is 100% effective. Otherwise we’d live in a pretty good world.

ClamDrinker@lemmy.world · 2 months ago

Honestly, that’s why open source AI is such a good thing for small creatives. Hate it or love it, anyone wielding AI with the intention to make new expression will be much more safe and efficient to succeed until they can grow big enough to hire a team with specialists. People often look at those at the top but ignore the things that can grow from the bottom and actually create more creative expression.

ClamDrinker@lemmy.world · edit-2 3 months ago

I respectfully disagree. Sure, it didn’t cure the world of ignorant people like we hoped, but they are not the average rational person. It massively increased the awareness of people about international issues like climate change, racism, injustice, and allowed people to forge bonds abroad far more easily. The discourse even among ignorant people is different from 20 years ago. However, the internet that did that might no longer be the same one it is today.

But honestly, “more facts leads to more truth” wasn’t the point of my message. It was “more spread of falsehoods leads to higher standards of evidence to back up the actual truth”, which isn’t quite the same. Before DNA evidence and photographic / video evidence, people sometimes had to rely on testimony. Nowadays if someone tells you a story that screams false you might say “pics or it didn’t happen.”. That’s the kind of progress I’m referring to.

Someone presenting you only a single photo of something damning is the hearsay of yesterday. (And honestly, it’s been that way since Photoshop came out, but AI will push that point even further)

ClamDrinker@lemmy.world · 3 months ago

I have a similar hesitancy, but unfortunately that’s why we can’t even really trust ourselves either. The statistics we can put to paper already paints such a different image of society than the one we experience. So even though it feels like these people are everywhere and such a mindset is growing, there are many signs that this is not the case. But I get it, that at times also feels like puffing some hopium. I’m fortunate to have met enough stubborn people that did end up changing their minds on their own personal irrationality, and as I grew older I caught myself doing the same a couple of times as well. That does give me hope.

And well, if you look at history, the kind of shit people believed. Miasma, bloodletting, superstitious beliefs, to name a few. As time has moved on, the majority of people has grown. Even a century where not a lot changes in that regard (as long as it doesn’t regress) can be a speed bump in the mindset of the future.

ClamDrinker@lemmy.world · edit-2 3 months ago

While I share this sentiment, I think/hope the eventual conclusion will be a better relationship between more people and the truth. Maybe not for everyone, but more people than before. Truth is always more like 99.99% certain than absolute truth, and it’s the collection of evidence that should inform ‘truth’. The closest thing we have to achieving that is the court system (In theory).

You don’t see the electric wiring in your home, yet you ‘know’ flipping the switch will cause electricity to create light. You ‘know’ there is not some other mechanism in your walls that just happens to produce the exact same result. But unless you check, you technically didn’t know for sure. Someone could have swapped it out while you weren’t looking, even if you built it yourself. (And even if you check, your eyes might deceive you).

With Harris’ airport crowd, honestly if you weren’t there, you have to trust second hand accounts. So how do you do that? One video might not say a lot, and honestly if I saw the alleged image in a vacuum I might have been suspicious of AI as well.

But here comes the context. There are many eye witness perspectives where details can be verified and corroborated. The organizer isn’t an habitual liar. It happened at a time that wasn’t impossible (eg. a sort of ‘counter’-alibi). It happened in a place that isn’t improbable (She’s on the campaign trail). If true, it would require a conspiracy level of secrecy to pull of. And I could list so many more things.

Anything that could be disproven with ‘It might have been AI’, probably would have not stuck in court anyways. It’s why you take testimony, because even though that proves nothing on it’s own, if corroborated with other information it can make one situation more or less probable.

ClamDrinker@lemmy.world · 3 months ago

Humans do. Humans guide and use AI towards what they want to make. And AI don’t make for-profit products either, that’s also humans.

ClamDrinker@lemmy.world · edit-2 3 months ago

I don’t know why you’re being downvoted. You’re absolutely correct (at least, in the US). And it seems to be based on pretty solid reasoning, so I could see a lot of other copyright offices following the same idea.

Source: https://www.copyright.gov/ai/ai_policy_guidance.pdf (See header II. The Human Authorship Requirement)

TL;DR

the Office states that “to qualify as a work of ‘authorship’ a work must be created by a human being” and that it “will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author.”

ClamDrinker@lemmy.world · edit-2 3 months ago

That’s sort of currently the law with copyright in the US. You can’t get a copyright on material made completely by an AI. Only if a human interfered can you get a copyright, and most likely only on the parts that the human interfered with.

Source: https://www.copyright.gov/ai/ai_policy_guidance.pdf (See header II. The Human Authorship Requirement)

TL;DR

the Office states that “to qualify as a work of ‘authorship’ a work must be created by a human being” and that it “will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author.”

ClamDrinker@lemmy.world · 3 months ago

LLMs can’t have truly new unique ideas yes, but neither does most of humanity. AIs are very good at mixing information, acting like an unrestricted pattern mixer. It’s missing the artistic intent to drive that somewhere meaningful and enjoyable to humans to be seen as real art, which is what sets humans apart from AI despite. But humans can include artistic intent into the materials the AI uses, like lyrics and style, which in turns does raise it back to the level of real art.

ClamDrinker@lemmy.world · 3 months ago

That’s where open source AI comes in. If we have the same freedoms then all it takes is grassroots efforts to ensure the tools born of humanity’s information remain free to be used by all of humanity. We should also be able to use the same tool without having to pay those companies a dime.

ClamDrinker@lemmy.world · 3 months ago

I totally agree. However, AI assisted works will probably fare much better, depending on how much a human inserted artistic intent into the AI. It’s as you say, people enjoy the steps they have control over, and I think most will use it to replace the steps they are bad at, or which take very long otherwise. Fully AI generated works will probably never be good enough.

ClamDrinker@lemmy.world · edit-2 3 months ago

Is there a place you share them? That sounds hilarious.

ClamDrinker@lemmy.world · 3 months ago

That’s because you’re using AI for the correct thing. As others have pointed out, if AI usage is enforced (like in the article), chances are they’re not using AI correctly. It’s not a miracle cure for everything and should just be used when it’s useful. It’s great for brainstorming. Game development (especially on the indie side of things) really benefit from being able to produce more with less. Or are you using it for DnD?