This is the technology worth trillions of dollars huh

HarkMahlberg@kbin.earth · 7 days ago

This is the technology worth trillions of dollars huh

ilinamorato@lemmy.world · 7 days ago

✅ Colorado

✅ Connedicut

✅ Delaware

❌ District of Columbia (on a technicality)

✅ Florida

But not

❌ I’aho

❌ Iniana

❌ Marylan

❌ Nevaa

❌ North Akota

❌ Rhoe Islan

❌ South Akota

individual@toast.ooo · 6 days ago

Gosh tier comment.

ilinamorato@lemmy.world · 6 days ago

You just described most of my post history.

boonhet@sopuli.xyz · 7 days ago

Everyone knows it’s properly spelled “I, the ho” not Idaho. That’s why it didn’t make the list.

Jankatarch@lemmy.world · 7 days ago

They took money from cancer reaearch programs to fund this.

Burninator05@lemmy.world · 6 days ago

After we pump another hundred trillion dollars and half the electricity generated globally into AI you’re going to feel pretty foolish for this comment.

veni_vedi_veni@lemmy.world · 6 days ago

Just a couple billion more parameters, bro, I swear, it will replace all the workers

CEOs

jumping_redditor@sh.itjust.works · 6 days ago

only cancer patients benefit from cancer research, CEOs benefit from AI

Jankatarch@lemmy.world · 6 days ago

Tbf cancer patients benefit from AI too tho a completely different type that’s not really related to LLM chatbot AI girlfriend technology used in these.

kreskin@lemmy.world · 6 days ago

Well as long as we still have enough money to buy weapons for that one particular filthy genocider country in the middle east, we’re fine.

skisnow@lemmy.ca · 6 days ago

I don’t think this gets nearly enough visibility: https://www.academ-ai.info/

Papers in peer-reviewed journals with (extremely strong) evidence of AI shenanigans.

beveradb@sh.itjust.works · 6 days ago

Thanks for sharing! I clicked on it with cynicism around how easily we could detect AI usage with confidence vs. risking making false allegations, but every single example on their homepage is super clear and I have no doubts - I’m impressed! (and disappointed)

skisnow@lemmy.ca · edit-2 6 days ago

Yup. I had exactly the same trepidation, and then it was all like “As an AI model, I don’t have access to the data you requested, however here are some examples of…”

I have more contempt for the peer reviewers who let those slide into major journals, than for the authors. It’s like the Brown M&M test; if you didn’t spot that blatant howler then no fucking way did you properly check the rest of the paper before waving it through. The biggest scandal in all this isn’t that it happened, it’s that the journals involved seem to be almost never retracting them upon being reported.

dude@lemmings.world · 7 days ago

Well, for anyone who knows a bit about how LLMs work, it’s pretty obvious why LLMs struggle with identifying the letters in the words

BritishJ@lemmy.world · 7 days ago

Well go on…

JustTesting@lemmy.hogru.ch · 7 days ago

They don’t look at it letter by letter but in tokens, which are automatically generated separately based on occurrence. So while ‘z’ could be it’s own token, ‘ne’ or even ‘the’ could be treated as a single token vector. of course, ‘e’ would still be a separate token when it occurs in isolation. You could even have ‘le’ and ‘let’ as separate tokens, afaik. And each token is just a vector of numbers, like 300 or 1000 numbers that represent that token in a vector space. So ‘de’ and ‘e’ could be completely different and dissimilar vectors.

so ‘delaware’ could look to an llm more like de-la-w-are or similar.

of course you could train it to figure out letter counts based on those tokens with a lot of training data, though that could lower performance on other tasks and counting letters just isn’t that important, i guess, compared to other stuff

MangoCats@feddit.it · 7 days ago

Of course, when the question asks “contains the letter _” you might think an intelligent algorithm would get off its tokens and do a little letter by letter analysis. Related: ChatGPT is really bad at chess, but there are plenty of algorithms that are super-human good at it.

MangoCats@feddit.it · 7 days ago

Con-ned-di-cut

BritishJ@lemmy.world · 7 days ago

Good read. Thank you

fading_person@lemmy.zip · 7 days ago

Wouldn’t that only explain errors by omission? If you ask for a letter, let’s say D, it would omit words containing that same letter when in a token in conjunction with more letters, like Da, De, etc, but how would it return something where the letter D isn’t even in the word?

JustTesting@lemmy.hogru.ch · 7 days ago

Well each token has a vector. So ‘co’ might be [0.8,0.3,0.7] just instead of 3 numbers it’s like 100-1000 long. And each token has a different such vector. Initially, those are just randomly generated. But the training algorithm is allowed to slowly modify them during training, pulling them this way and that, whichever way yields better results during training. So while for us, ‘th’ and ‘the’ are obviously related, for a model no such relation is given. It just sees random vectors and the training reorganizes them tho slowly have some structure. So who’s to say if for the model ‘d’, ‘da’ and ‘co’ are in the same general area (similar vectors) whereas ‘de’ could be in the opposite direction. Here’s an example of what this actually looks like. Tokens can be quite long, depending how common they are, here it’s ones related to disease-y terms ending up close together, as similar things tend to cluster at this step. You might have an place where it’s just common town name suffixes clustered close to each other.

and all of this is just what gets input into the llm, essentially a preprocessing step. So imagine someone gave you a picture like the above, but instead of each dot having some label, it just had a unique color. And then they give you lists of different colored dots and ask you what color the next dot should be. You need to figure out the rules yourself, come up with more and more intricate rules that are correct the most. That’s kinda what an LLM does. To it, ‘da’ and ‘de’ could be identical dots in the same location or completely differents

plus of course that’s before the llm not actually knowing what a letter or a word or counting is. But it does know that 5.6.1.5.4.3 is most likely followed by 7.7.2.9.7(simplilied representation), which when translating back, that maps to ‘there are 3 r’s in strawberry’. it’s actually quite amazing that they can get it halfway right given how they work, just based on ‘learning’ how text structure works.

but so in this example, us state-y tokens are probably close together, ‘d’ is somewhere else, the relation between ‘d’ and different state-y tokens is not at all clear, plus other tokens making up the full state names could be who knows where. And tien there’s whatever the model does on top of that with the data.

for a human it’s easy, just split by letters and count. For an llm it’s trying to correlate lots of different and somewhat unrelated things to their ‘d-ness’, so to speak

fading_person@lemmy.zip · 6 days ago

Thank you very much for taking your time to explain this. if you don’t mind, do you recommend some reference for further reading on how llms work internally?

JustTesting@lemmy.hogru.ch · 6 days ago

For the byte pair encoding (how those tokens get created) i think https://bpemb.h-its.org/ does a good job at giving an overview. after that i’d say self attention from 2017 is the seminal work that all of this is based on, and the most crucial to understand. https://jtlicardo.com/blog/self-attention-mechanism does a good job of explaining it. And https://jalammar.github.io/illustrated-transformer/ is probably the best explanation of a transformer architecture (llms) out there. Transformers are made up of a lot of self attention.

it does help if you know how matrix multiplications work, and how the backpropagation algorithm is used to train these things. i don’t know of a good easy explanation off the top of my head but https://xnought.github.io/backprop-explainer/ looks quite good.

and that’s kinda it, you just make the transformers bigger, with more weight, pluck on a lot of engineering around them, like being able to run code and making it run more efficientls, exploit thousands of poor workers to fine tune it better with human feedback, and repeat that every 6-12 month for ever so it can stay up to date.

fading_person@lemmy.zip · 6 days ago

Thank you very much

cyberwolfie@lemmy.ml · 6 days ago

You could look up 3Blue1Brown’s explainers on YouTube, they are pretty good and shows a lot of visual examples. He has a lot of other videos on other areas of math.

fading_person@lemmy.zip · 6 days ago

I’ll check it later, thanks

Gladaed@feddit.org · 6 days ago

Which is State contains 狄? They use a different alphabet, so understanding ours is ridiculous.

Blackmist@feddit.uk · 7 days ago

Just another trillion, bro.

NateNate60@lemmy.world · 7 days ago

Just another 1.21 jigawatts of electricity, bro. If we get this new coal plant up and running, it’ll be enough.

Tryenjer@lemmy.world · 7 days ago

Behold the most expensive money burner!

Curious Canid@lemmy.ca · 4 days ago

This is the perfect time for LLM-based AI. We are already dealing with a significant population that accepts provable lies as facts, doesn’t believe in science. and has no concept of what hypocrisy means. The gross factual errors and invented facts of current AI couldn’t possibly fit in better.

sqgl@sh.itjust.works · 7 days ago

ChatGPT is just as stupid.

SaveTheTuaHawk@lemmy.ca · 7 days ago

it’s actually getting dumber.

panda_abyss@lemmy.ca · edit-2 7 days ago

Yesterday i asked Claude Sonnet what was on my calendar (since they just sent a pop up announcing that feature)

It listed my work meetings on Sunday, so I tried to correct it…

You’re absolutely right - I made an error! September 15th is a Sunday, not a weekend day as I implied. Let me correct that: This Week’s Remaining Schedule: Sunday, September 15

Just today when I asked what’s on my calendar it gave me today and my meetings on the next two thursdays. Not the meetings in between, just thursdays.

Something is off in AI land.

Edit: I asked again: gave me meetings for Thursday’s again. Plus it might think I’m driving in F1

FlashMobOfOne@lemmy.world · 7 days ago

A few weeks ago my Pixel wished me a Happy Birthday when I woke up, and it definitely was not my birthday. Google is definitely letting a shitty LLM write code for it now, but the important thing is they’re bypassing human validation.

Stupid. Just stupid.

python@lemmy.world · 7 days ago

pixel? ~~have you heard ~about grapheneOS tho…~~~

achance4cheese@sh.itjust.works · 7 days ago

Also, Sunday September 15th is a Monday… I’ve seen so many meeting invites with dates and days that don’t match lately…

panda_abyss@lemmy.ca · 7 days ago

Yeah, it said Sunday, I asked if it was sure, then it said I’m right and went back to Sunday.

I assume the training data has the model think it’s a different year or something, but this feature is straight up not working at all for me. I don’t know if they actually tested this at all.

Sonnet seems to have gotten stupider somehow.

Opus isn’t following instructions lately either.

MangoCats@feddit.it · 6 days ago

We’ve used the Google AI speakers in the house for years, they make all kinds of hilarious mistakes. They also are pretty convenient and reliable for setting and executing alarms like “7AM weekdays”, and home automation commands like “all lights off”. But otherwise, it’s hit and miss and very frustrating when they push an update that breaks things that used to work.

IngeniousRocks (They/She) @lemmy.dbzer0.com · 7 days ago

Hey look the markov chain showed its biggest weakness (the markov chain)!

In the training data, it could be assumed by output that Connecticut usually follows Colorado in lists of two or more states containing Colorado. There is no other reason for this to occur as far as I know.

Markov Chain based LLMs (I think thats all of them?) are dice-roll systems constrained to probability maps.

Edit: just to add because I don’t want anyone crawling up my butt about the oversimplification. Yes. I know. That’s not how they work. But when simplified to words so simple a child could understand them, its pretty close.

AlecSadler@lemmy.blahaj.zone · 7 days ago

Oh l I was thinking it’s because people pronounce it Connedicut

IngeniousRocks (They/She) @lemmy.dbzer0.com · 7 days ago

Awe cute!

ramjambamalam@lemmy.ca · 7 days ago

I was wondering if you’d get similar results for states with the letter R, since there’s lots of prior art mentioning these states as either “D” or “R” during elections.

brem@sh.itjust.works · 5 days ago

Lol @ these fucking losers who think AI is the current answer to any problems

SugarCatDestroyer@lemmy.world · 5 days ago

AI will most likely create new problems in the future as it eats up electricity like a world eater, so I fear that soon these non-humans will only turn on electricity for normal people for a few hours a day instead of the whole day to save energy for the AI.

I’m not sure about this of course, but it’s quite possible.

arararagi@ani.social · 5 days ago

Third time’s the charm! They have to keep the grift going after Blockchain and NFT failed with the general public.

HarkMahlberg@kbin.earth · 5 days ago

@arararagi@ani.social Don’t forget Metaverse, they took a fuckin bath on that.

arararagi@ani.social · 5 days ago

Funny thing is, the metaverse as their pictured it failed, but vrchat itself had it’s biggest spike this year.

brem@sh.itjust.works · 5 days ago

As long as there’s something to sell for untalented morons to feel intelligent & talented; they’ll take the bait.

samus12345@sh.itjust.works · 7 days ago

Connedicut.

I wondered if this has been fixed. Not only has it not, the AI has added Nebraska.

MML@sh.itjust.works · 7 days ago

What about Our Kansas? Cause according to Google Arkansas has one o in it. Refreshing the page changes the answer though.

samus12345@sh.itjust.works · 7 days ago

Just checked, it sure does say that! AI spouting nonsense is nothing new, but it’s pretty ironic that a large language model can’t even parse what letters are in a word.

monotremata@lemmy.ca · 5 days ago

It’s because, for the most part, it doesn’t actually have access to the text itself. Before the data gets to the “thinking” part of the network, the words and letters have been stripped out and replaced with vectors. The vectors capture a lot of aspects of the meaning of words, but not much of their actual text structure.

boonhet@sopuli.xyz · 6 days ago

Well I mean it’s a statistics machine with a seed thrown in to get different results on different runs. So really, it models the structure of language, but not the meaning. Kinda useless.

ilinamorato@lemmy.world · 7 days ago

I would assume it uses a different random seed for every query. Probably fixed sometimes, not fixed other times.

sugar_in_your_tea@sh.itjust.works · 6 days ago

You mean Connecdicud.

Aceticon@lemmy.dbzer0.com · 7 days ago

“This is the technology worth trillions of dollars”

You can make anything fly high in the sky with enough helium, just not for long.

(Welcome to the present day Tech Stock Market)

MangoCats@feddit.it · 7 days ago

Bubbles and crashes aren’t a bug in the financial markets, they’re a feature. There are whole legions of investors and analysts who depend on them. Also, they have been a feature of financial markets since anything resembling a financial market was invented.

Yaztromo@lemmy.world · 6 days ago

GitLab Enterprise somewhat recently added support for Amazon Q (based on claude) through an interface they call “GitLab Duo”. I needed to look up something in the GitLab docs, but thought I’d ask Duo/Q instead (the UI has this big button in the top left of every screen to bring up Duo to chat with Q):

(Paraphrasing…)

ME: How do I do X with Amazon Q in GitLab? Q: Open the Amazon Q menu in the GitLab UI and select the appropriate option.

ME: [:looks for the non-existant menu:] ME: Where in the UI do I find this menu?

Q: My last response was incorrect. There is no Amazon Q button in GitLab. In fact, there is no integration between GitLab and Amazon Q at all.

ME: [:facepalm:]

SaveTheTuaHawk@lemmy.ca · 7 days ago

We’re turfing out students by the tens on academic misconduct. They are handing in papers with references that clearly state “generated by Chat GPT”. Lazy idiots.

NateNate60@lemmy.world · 7 days ago

This is why invisible watermarking of AI-generated content is likely to be so effective. Even primitive watermarks like file metadata. It’s not hard for anyone with technical knowledge to remove, but the thing with AI-generated content is that anyone who dishonestly uses it when they are not supposed to is probably also too lazy to go through the motions of removing the watermarking.

SaveTheTuaHawk@lemmy.ca · 7 days ago

if you are going to do all that, just do the research and learn something.

NateNate60@lemmy.world · 7 days ago

Aye that’s exactly the same thing that I said

Dharma Curious@startrek.website · 7 days ago

Couldn’t students just generate a paper with ChatGPT, open two windows wide by side and then type it out in a word document?

SaveTheTuaHawk@lemmy.ca · 7 days ago

but that’s work.

NateNate60@lemmy.world · 7 days ago

Students view doing that as basically the same amount of work as writing the paper yourself

chaospatterns@lemmy.world · 7 days ago

Depends on the watermark method used. Some people talk about watermarking by subtly adjusting the words used. Like if there’s 5 synonyms and you pick the 1st synonym, next word you pick the 3rd synonym. To check the watermark you have to access to the model and probabilities to see if it matches that. The tricky part about this is that the model can change and so can the probabilities and other things I don’t fully understand.

MangoCats@feddit.it · 6 days ago

I think I’d at least use an OCR program to do the bulk of the typing for me…

JustTesting@lemmy.hogru.ch · 7 days ago

Huh that actually does sound like a good use-case of LLMs. Making it easier to weed out cheaters.

resipsaloquitur@lemmy.world · 7 days ago

Listen, we just have to boil the ocean five more times.

Then it will hallucinate slightly less.

Or more. There’s no way to be sure since it’s probabilistic.

MangoCats@feddit.it · 7 days ago

If you want to get irate about energy usage, shut off your HVAC and open the windows.

resipsaloquitur@lemmy.world · 7 days ago

Worthless comment.

elevenbones@sh.itjust.works · 7 days ago

Even more worthless than mine, somehow.

Pup Biru@aussie.zone · 6 days ago

sounds reasonable… i’ll just go tell large parts of australia where it’s a workplace health and safety issue to be out of AC for more than 15min during the day that they should do their bit for climate change and suck it up… only a few people will die

jumping_redditor@sh.itjust.works · 6 days ago

maybe people shouldn’t live there then?

Pup Biru@aussie.zone · 6 days ago

of course you’re right! we should just shut down some of the largest mines in the world

i foresee no consequences from this

(related note: south australia where one of the largest underground mines in the world is, largely gets its power from renewables)

people should probably move from canada and most of the north of the USA too: far too cold up there during winter