• 0 Posts
  • 18 Comments
Joined 1 year ago
Cake day: August 16th, 2023

  • And that’s more or less what I was aiming for, so we’re back at square one. What you wrote is in line with my first comment:

    it is a weak compliment for AI, and more of a criticism of the current web search engines

    The point is that there isn’t something that makes AI inherently superior to ordinary search engines. (Personally I haven’t found AI to be superior at all, but that’s a different topic.) The difference in quality is mainly a consequence of some corporate fuckery to wring out more money from the investors and/or advertisers and/or users at the given moment. AI is good (according to you) just because search engines suck.




  • they’re a great use in surfacing information that is discussed and available, but might be buried with no SEO behind it to surface it

    This is what I’ve seen many people claim. But it is a weak compliment for AI, and more of a criticism of the current web search engines. Why is that information unavailable to search engines, but is available to LLMs? If someone has put in the work to find and feed the quality content to LLMs, why couldn’t that same effort have been invested in Google Search?






  • I don’t get the impression you’ve ever made any substantial contributions to Wikipedia, and so you have misguided ideas about what would actually be helpful to the editors and conducive to producing better articles. Your proposal about translations is especially telling, because machine-assisted translations (i.e. with built-in tools) existed on WP long before the recent explosion of LLMs.

    In short, your proposals either: 1. already exist, 2. would still risk distortion, oversimplification, made-up bullshit and feedback loops, 3. are likely very complex and expensive to build, or 4. are straight up impossible.

    Good WP articles are written by people who have actually read some scholarly articles on the subject, including those that aren’t easily available online (so LLMs are massively stunted by default). Having an LLM re-write a “poorly worded” article would at best be like polishing a turd (poorly worded articles are usually written by people who don’t know much about the subject in the first place, so there’s not much material for the LLM to actually improve), and more likely it would introduce a ton of biases on its own (as well as the usual asinine writing style).

    Thankfully, as far as I’ve seen the WP community is generally skeptical of AI tools, so I don’t expect such nonsense to have much of an influence on the site.





  • So child porn is okay then? You would already have it on your system

    You’d have to look for it, knowing full well that it is illegal to produce in the first place and to distribute to others, access it online, and then deliberately retain it. It’s not really the same as something that’s legal to produce and distribute (it is certainly legal for me to view your site). You wouldn’t “already” have it.

    I doubt you are either.

    Well I’ve read some copyright laws, had to solve some issues regarding usage of copyrighted works, etc. Nothing that makes me an expert, but I’m not talking wholly out of my ass either.

    It does… on paper… A lot. https://time.com/6266147/internet-archive-copyright-infringement-books-lawsuit/ To the point it’s losing lawsuits over exactly that.

    That’s not the Wayback Machine per se, that’s the Internet Archive’s book scanning and “digital lending” system, which was most definitely doing legally questionable (and stupid) things even to an amateur eye. However, the Wayback Machine making read-only copies of websites has so far never been successfully challenged.



  • You don’t have any rights to do anything else with it.

    That’s patently false. At a minimum, I can quote parts of your content, just as you can quote smaller portions of any published text anywhere; you don’t have to ask the publisher or author for permission. It’s also ridiculous and impossible to control: the content is on my private machine already, so how can any law be relevant to, or enforced upon, what I do there? I doubt you’re writing this comment on the basis of your knowledge of copyright law.

    Incorrect. Your browser made it do that. How that data is accessed and displayed is not controlled by me.

    You’re arguing semantics that really don’t make any difference. The display is irrelevant, because the data by itself is stored on my computer before it is displayed. That data is what you’ve put up online to be accessed.

    Owning the CD grants you a license to the content on that CD. That’s about as good as ownership gets there. They own the CD/license. As long as that CD exists/works. You don’t gain that same right by simply visiting a website.

    I fail to see the difference between getting a CD with some data (buying it or being given it for free, e.g. as a gift) and being sent some data online for free. More importantly: says who? Does copyright law say this about websites?

    If an artist makes a painting… and posts a picture of it. They have no rights to the painting anymore? They deserve no ownership/pay for what they’ve done?

    This simply doesn’t follow from what I’ve written. They certainly retain the rights to the painting. Besides, “deserving pay” depends on completely different factors than the ones we’re discussing; usually artists sell the actual object, the painting. A digital reproduction is, as far as most people care (I think), merely an informative reproduction, and not the real thing. Stuff that’s posted online for free is… free. It wasn’t intended to make money directly.

    Your final paragraph is really confusing me. You seem to be saying that the Wayback Machine is also committing theft, which I’m pretty sure is not true (I’ve followed the lawsuits against IA for a while and don’t remember anyone invoking that term). And at this point I don’t know what “theft” is even supposed to mean to you or to anyone else, or what the point of the discussion was anyway. Maybe I should reread the whole discussion carefully all over again, but I’m on my phone and it’s all giving me a headache.


  • , it’s a salty article

    Actually the author himself is somewhat harmed by this situation. I would be salty too. When I write my CV, I can say: my texts have been published at X and Y. That’s especially nice if it’s an important and well-known publication. Now a part of his CV is literally erased, and he can’t access his own texts anymore (not even on the Internet Archive). That’s… utterly ridiculous. It’s common practice to send the author a copy (or multiple copies) of the text he has published; he has every right to own a copy of them. Now the copy that was intended to be available to everyone is not available even to him. Something of the sort has happened to me too, when a website I published an article on underwent a redesign and now the text just isn’t available anymore. Admittedly it’s still on IA, but it’s an awkward situation.



  • You’ve put it out there for free, though, and the data literally ends up on my machine because you made it do that, so what’s the problem with me saving the data on my machine for later, and potentially sharing it elsewhere for free again?

    then publishing it as your own is theft

    1. This scenario (misattribution of content) has nothing to do with the previous discussion. The other commenter is making an analogy to CDs: owning a CD and lending it to others doesn’t mean you’re claiming its content is your own creation.

    2. Theft implies deprivation of ownership. Calling this theft is like calling piracy theft. It may be illegal by this or that metric, but it’s not normal theft.