Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit before joining the Threadiverse as well.

  • 0 Posts
  • 9 Comments
Joined 2 years ago
Cake day: March 3rd, 2024

  • Yeah, at the time Voyager came out I considered it the worst of the Star Trek live action series. It’s since been surpassed many times over for that title, but there are still a lot of episodes that are not very good individually, and the overall premise of the show was wasted.

    That said, there are a few very good episodes, and a couple of the characters were really enjoyable. The Doctor and Seven of Nine became some of my favourite Star Trek characters across the franchise.

    Unfortunately Janeway was an inconsistent psychopath and Chakotay was a block of wood, so the good characters had to struggle against that backdrop.

    It’s been too long for my memory to dredge up a recommended viewing list of the best episodes to focus on, but perhaps you could scrounge one up on the web somewhere. Voyager aired back in the day when series had a lot of episodes, and many of them were relatively stand-alone, so skipping a bunch likely won’t hurt if you pick them well.


  • As much as people on the Fediverse or Reddit or whatever other social media bubble we might be in like to insist “nobody wants this” or that AI is useless, it actually is useful and a lot of people do want it. I’m already starting to see the hard-line AI hate softening; more people are going “well, maybe this application of AI is okay.” This will increase as AI becomes more useful and ubiquitous.

    There are likely a lot of AI companies and products starting up right now that aren’t going to make it. That’s normal with a brand new technology: nobody knows what the “winning” applications are going to be yet, so investors are throwing money at everything to see what sticks. Some stuff will indeed stick; AI isn’t going to go away, much like how the Internet stuck around after the Dot Com bust cleared out the chaff. But I’d be rather careful about what I invest in myself.

    I’m not a fan of big centralized services and subscriptions, which unfortunately is what a lot of the American AI companies are pushing for. But fortunately an unlikely champion of AI freedom has arisen in the form of… China? Of all places. They’ve been putting out a lot of really great open-weight models, focusing hard on getting them to train and run well on more modest hardware, and releasing the research behind it all as well. Partly that’s because they’re a lot more compute-starved than Western companies and have no choice but to do it that way, but partly it’s just to stick their thumb in those companies’ eyes and prevent them from establishing dominance. I know it’s self-interest, of course. Everything is self-interest. But I’ll take it, because it’s good for my interests too.

    As for how far the technology improves? Hard to say. But I’ve been paying attention to the cutting edge models coming out, and general adoption is still way behind what those things are capable of. So even if models abruptly stopped improving tomorrow there’s still years of new developments that’ll roll out just from making full use of what we’ve got now. Interesting times ahead.


  • Are you proposing flooding the Fediverse with fake bot comments in order to prevent the Fediverse from being flooded with fake bot comments? Or are you thinking more along the lines of that guy who keeps using “Þ” in place of “th”? Making the Fediverse too annoying to use for bot and human alike would be a fairly Pyrrhic victory, I would think.


  • A basic Google search for “synthetic data llm training” will give you lots of hits describing how the process goes these days.

    Take this as “defeatist” if you wish; as I said, it doesn’t really matter. In the early days of LLMs, when ChatGPT first came out, the strategy for training these things was to dump as much raw data onto them as possible and hope quantity let the LLM figure something out from it. Since then it’s been learned that quality beats quantity, so training data is far more carefully curated these days. Not because there’s “poison” in it, but simply because curation results in better LLMs. Filtering out poison happens as a side effect.

    It’s like trying to contaminate a city’s water supply by peeing in the river upstream of the water treatment plant drawing from it. The water treatment plant is already dealing with all sorts of contaminants anyway.
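    To make the curation idea concrete, here’s a toy sketch of a quality filter. The scoring heuristic is entirely invented for illustration — real pipelines use classifiers, dedup, and much more — but the shape is the same: score every raw sample, keep only the ones above a bar.

    ```python
    # Toy illustration of quality-over-quantity data curation.
    # The heuristic below (penalize very short or very repetitive
    # samples) is made up for the sketch, not a real production filter.

    def quality_score(text: str) -> float:
        """Crude quality heuristic: ratio of unique words to total words."""
        words = text.split()
        if len(words) < 5:          # too short to be useful training data
            return 0.0
        return len(set(words)) / len(words)

    def curate(samples, threshold=0.5):
        """Keep only samples scoring above the threshold."""
        return [s for s in samples if quality_score(s) > threshold]

    raw = [
        "buy buy buy buy buy buy",   # spammy and repetitive: dropped
        "short",                     # too short: dropped
        "The treatment plant filters many contaminants before output.",
    ]
    print(curate(raw))  # only the third sample survives
    ```

    Deliberately “poisoned” samples tend to fail the same kinds of checks that ordinary junk fails, which is why poisoning doesn’t need a dedicated defense.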


  • I think it’s worthwhile to show people that views outside of their like-minded bubble exist. One of the nice things about the Fediverse over Reddit is that the upvote and downvote tallies are both shown, so we can see that opinions are not a monolith.

    Also, you never engage in Internet debate to convince the person you’re actually talking to. That almost never happens. The point of debate is to present convincing arguments for the less-committed casual readers who are lurking rather than participating directly.


  • Doesn’t work, but if it makes people feel better I suppose they can waste their resources doing this.

    Modern LLMs aren’t trained on just whatever raw data can be scraped off the web any more. They’re trained with synthetic data that’s prepared by other LLMs and carefully crafted and curated. Folks here are still thinking GPT-3 is state of the art.