Will this Yudkowsky tweet on AI babysitters hold up by Feb 2028?

104 traders · Ṁ13k volume · closes 2028 · 49% chance

See tweet:

https://x.com/esyudkowsky/status/1760374390636913045?s=46&t=62uT9IruD1-YP-SHFkVEPg

In 2-4 years, if we're still alive, it'll be interesting to see what happens to kids being de-facto raised by tablets displaying multimodal LLM babysitters.

("No, honey, you can't put Elsa away if you get angry; Elsa has to alert me if she can't see you anymore.")

This resolves based on the spirit of the question. I will not bet in this market to remain objective and I’m happy to clarify questions and refine the resolution criteria. If the resolution is controversial, I’ll confer with Manifold superusers and admins.

This market resolves exactly 4 years after the tweet (Feb 21, 2028).

Gonna write down some cases that could cause this market to resolve YES, since the description is vague. Obviously this does not require a world where EVERY kid has an LLM babysitter to resolve YES:

-There is a large company that sells AI babysitting software, and many parents use it. It’s clearly a real product, not just a niche thing that AI enthusiasts try out on their kids, and it ~works~ at least somewhat like a babysitter (i.e., it is not just the equivalent of a TV show that parents put on to entertain their kids). The kids don't need to be ~exclusively~ raised by the AI, but it should be "de-facto raising" them in some sense of the term, which is a moderately higher bar than just keeping them company.

-A major chatbot (the equivalent of Gemini or ChatGPT in 4 years) has an interface viable for babysitting kids, and many parents use it as such, to the extent that there are Atlantic articles or whatever about parenting with AI, Twitter discourse on the subject, OpenAI/Google advertising these use cases in their marketing, etc.

This question is managed and resolved by Manifold.

#AI

#Technical AI Timelines

#OpenAI

#Children


26 Comments · 95 Holders · 171 Trades

I think babysitters often need to physically interact with kids, so I don't think software alone will be useful enough for this to really happen. I don't think we'll have suitable babysitter robots by then either.

bought Ṁ100 NO at 50%

My thoughts as market creator (remember I cannot bet):

I think these market odds seem fairly reasonable currently. I expect the most likely scenario for this to resolve YES would be:

-OpenAI or a competitor rolls out a more socially oriented model in the next couple of years: something like a therapist/companion bot, which is then advertised and used for tasks involving children. This becomes widely used and reported on for childcare.

Does school count? Some people disparage school as "babysitting", or more positively say that schools are "raising" kids. It's likely that kids in Feb 2028 will spend >1 hour a day working with LLMs.

Does it count as babysitting if the parent is in another room of the same building or house, or does the AI need to babysit them "home alone"?

Also, does it count as babysitting if the children are teenagers, for example?

@MartinRandall

-The tweet's text mentions a situation where the parent is at home, so yes, it counts if the AI is doing tasks commonly associated with babysitting, regardless of whether the parent is there.

-I mean, I don't think most teenagers have babysitters period, so I'm not sure why one would use an AI to fill a role that doesn't currently exist, but, like... maybe?

@benshindel oh, I thought Elise's parent was talking to her through remote speakers. Or before or after. I think having a babysitter while a parent is still in the same building is not a central example of babysitting.

I think "entertaining" and "keeping them company" is the bar for human babysitters, I don't expect them to "raise" children, that's the job of parents and others. A human who is hired to raise a child might be called a nanny.

Given that AI will be cheaper than a babysitter I absolutely would expect a teenager to use such an AI if targeted at their age range.

I have ideas for how to make this market better defined if you're interested?

People sure are betting against my limit order at 50%, which does leave me a bit puzzled; I had thought I was making a relatively obvious sort of prediction here -- that AI models are not far off from being able to watch a child or try to keep them occupied. From there to some parents trying to use them for the sake of gaining a few more minutes to themselves seems like an obvious step. Maybe very wealthy parents or ones with giant poly group houses won't have any use for the tech, but a lot of other harried parents will. I wonder if I'm missing something about what people think the resolution terms are, which would mean people are betting on a different outcome than the one I had in mind.

@EliezerYudkowsky If I can be more clear about the resolution criteria, feel free to let me know! I don’t want the bar to be too low (like, “chatGPT on a tablet keeps kids occupied and talks in the tone/syntax of a babysitter” seems like a >90% likelihood and a boring market) or too high (like, “AI companies have stolen the majority of market share from babysitters within 4 years” seems like a <10% likelihood and also a boring market)

It’s a complex topic, and normally I’d be happy to completely defer to your judgment on the resolution criteria, but also it’s YOUR prediction and you’re trading in the market so I don’t know if that’s a good idea 😆

But I think/hope I understand what you were getting at with the tweet and will resolve in the spirit of your prediction!

@benshindel "There is a large company that sells AI babysitting software, and there are many parents that use it." What does many mean? I bet no because I imagined that meant some significant fraction of all parents - e.g. hundreds of thousands of parents. But Eliezer is saying 'some parents', which tracks the quote more closely. I think the chance of 'some parents' is vastly higher.

@AmitAmin I’m not gonna give an exact number because I want to resolve the question in the spirit of the tweet and not have ppl arguing about a specific number, but I think both of us mean more than, say, a few dozen, but less than (or rather, not requiring more than) 10% of the population

@benshindel the spirit of the tweet is vague so as to claim victory whatever happens. This is standard short-term futurism.

If in 4 years "ChatGPT on a tablet keeps kids occupied and talks in the tone/syntax of a babysitter", perhaps also being able to page a parent who is working upstairs, I expect Yudkowsky to claim this as a successful prediction.

@MartinRandall this market has an inherently subjective resolution as there's really no way to anticipate every possible edge case for a market like this. I will be resolving this market, and I am not betting in this market. I don't think I'm particularly biased w.r.t. this topic (for instance, I hold both large YES and NO positions on various other AI capabilities markets) and as I said, if the resolution is difficult, I will confer with other Manifold users and admins. If you don't think I can resolve this market without bias, I would advise not betting too heavily.

@benshindel

there's really no way to anticipate every possible edge case for a market like this

I think we can read the words of the tweet, determine what objective forecast these words most closely correspond to, and then make a market on whether that forecast will be true.

You wrote:

I’m happy to clarify questions and refine the resolution criteria

I think that refusing to specify a number between 60 parents and 100,000,000 parents is not helpful. This is not some weird edge case that we can't possibly anticipate. It's very common for new technology to be adopted by between 60 and 100,000,000 people.

It's not about bias, it's about not knowing what causes this market to resolve YES or NO.

So I guess try to reason this out.

Multimodal AI can clearly do the task in future versions.
The task goals are
A. ensure the kid is accounted for and not in obvious danger. Multimodal LLM, probably using a cheaper edge network to track the subject, where periodically the big network checks a few frames to make sure the edge network wasn't fooled.
B. Try to keep the kid's attention. Generate videos from scratch or, more likely, select games and videos the kid is predicted to like.
C. Try to keep up with a "curriculum". Some cool ways to do this, like the AI reprogramming a game on the fly to insert a quiz on arithmetic or modify an existing video with educational segments in it. "Help Boba Fett calculate the next hyperspace jump!" Machine is trying to maintain some goal of quiz scores on content by sneaking in educational content.
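Goal A above describes a model cascade: a cheap edge model tracks the child on every frame, and a big multimodal model periodically re-checks a frame to catch cases where the cheap model was fooled. The following is a minimal sketch of only that control flow, assuming a simple frame stream; `Frame`, `monitor`, the two checker callables, and `check_every` are hypothetical stand-ins, not any real product's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Frame:
    index: int
    child_visible: bool  # hypothetical ground truth, used only by the toy checkers below


def monitor(frames: Iterable[Frame],
            cheap_tracker: Callable[[Frame], bool],
            big_model_check: Callable[[Frame], bool],
            check_every: int = 30) -> List[str]:
    """Return alert messages raised while watching a stream of frames."""
    alerts: List[str] = []
    for frame in frames:
        # The cheap edge model runs on every frame.
        if not cheap_tracker(frame):
            alerts.append(f"frame {frame.index}: tracker lost the child")
            continue
        # Every `check_every` frames, the expensive model double-checks the tracker.
        if frame.index % check_every == 0 and not big_model_check(frame):
            alerts.append(f"frame {frame.index}: verifier disagrees with tracker")
    return alerts


if __name__ == "__main__":
    # Toy run: the child leaves view at frame 50, but the tracker is "fooled" and always says visible.
    frames = [Frame(i, child_visible=(i < 50)) for i in range(100)]
    print(monitor(frames, lambda f: True, lambda f: f.child_visible, check_every=30))
    # prints ['frame 60: verifier disagrees with tracker', 'frame 90: verifier disagrees with tracker']
```

The toy run shows the trade-off implied by "periodically": a fooled tracker goes unnoticed until the next verifier check (frame 60 here, although the child left at frame 50), so a smaller `check_every` shortens that gap at the cost of more big-model calls.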

I think it's doable but only a large company can host a model powerful enough to be reliable at this by 2028, and there are copyright issues and liability issues. "Model failed to report a kidnapping or the house was on fire".

Voting NO because of these issues. It's technically very doable; it just isn't happening, because law and liability move many orders of magnitude (OOMs) slower.

Note most tech companies outright refuse to knowingly have users under 13.

If parenting in general is a very rare activity (i.e. not many are having children), but those who do parent primarily rely on AI babysitters, what would this resolve?

@12c498e This would resolve YES

bought Ṁ10 NO

Parenting is an inherently conservative activity. My expectation is that while the technology will be sufficiently advanced, the vast majority of parents will not feel comfortable with it.

@yaakovgrunsfeld Parents gave tablets to toddlers pretty quickly after they arrived. I'm not sure we're all that conservative.

It's already happening lol (not implying anything about scale)

@jacksonpolack oh ya I saw that he got one of those! I hope he posts some good content about how it's going! But I think this doesn't meet the bar of "babysitter" yet (although I don't know too much about this robot). I think it's still just a toy or entertainment for the kid at this point.

Hey, thanks everyone for the comments!

1) If this market becomes popular, I'll be happy to make one resolved the same way for the earlier end of the tweet's range (after 2 years). I named the question "will this hold up" to line up with the moth video market, but yes, this Manifold market would resolve YES even if this happened in 1.5 years (before the earlier end of the range in the tweet).

2) I fully intend to resolve this in the spirit of the question, so like... if there are, for example, like 3-4 sets of outlier AI enthusiast parents that have been using an AI to de-facto raise their kids, this won't resolve YES. But it also won't require, for example, 10% of American kids to be raised by LLMs to resolve YES. Enough that it's obviously "something happening" would be enough.

Other details, like whether the AI babysitter is on a tablet vs. a phone or an Alexa-like device, don't seem particularly relevant to the spirit of the question. But obviously if no tablets are being used and we have androids instead, I guess it wouldn't resolve YES, although, as Eliezer said, it seems extremely unlikely that this would happen without an intermediate phase (and if there are ubiquitous android AI babysitters within 4 years... well, I don't think ppl will care much about this Manifold market's resolution haha).

About the further definition in text: I don't expect Gemini or ChatGPT to allow this use for legal liability reasons.

@EliezerYudkowsky I agree they probably won't explicitly create such a tool due to liability reasons, but I think it's quite plausible they'd have ambiguous marketing materials of like, a kid talking to an AI that is giving it fun activities to do, or a parent happily watching their kid playing with an AI from outside their bedroom.

@benshindel I predict the big legally-vulnerable companies will try to actively shut down that use, the way they've made a halfhearted attempt to shut down asking for medical advice and a way more serious attempt to shut down porn -- they will prudently not want the publicity the first time something Bad happens to a kid being babysat by an AI.

-Is "tablets" a key component of the question? If it were hypothetically a Jetsons-style humanoid robot, would that count?

-Is "LLM" a key component of the question? If there exists an AI babysitter that does not have something reasonably describable as a language model as one of it components, would that count?

-How high is the standard for "de facto raised"? Do any children today meet the requirements of being "de facto raised" by a television or non-AI tablet?

-Would an AI-powered video baby monitor that watches the child and recognizes its behavior and activities and alerts the parent if a problem arises, but does not interact with the child count?

@Nick6d8e

Arguably a full Jetsons robot would be an importantly different scenario. Let's say that counts as my being wrong if there was no tablet stage first, which would quite surprise me.
If the AI babysitter is not communicating with the child, that counts as my being wrong. Obviously I already had in mind a multimodal model, so it does not need to be exactly a conventional modern old-school LLM.
If there's an AI tablet talking to lots of children, for as long as some children are left to watch TV today, and more parents are paying even less attention because of the presumption that the AI will alert them if the child gets into trouble, then yeah that's the scenario I had in mind. If that makes the prediction seem like too much of a slam-dunk, well, it's not like I'm trying to run around making bad predictions! I thought the image one was obvious but a lot of people gave me credit for it, so now I'm trying more predictions that seem that obvious.
Nope, the scenario I have in mind is that the AI is talking to the child and keeping them busier that way.
