A team at Google has proposed using artificial intelligence technology to create a “bird’s eye” view of users’ lives using mobile phone data such as photos and searches.
Named “Project Ellmann” after biographer and literary critic Richard David Ellmann, the idea would be to use Gemini-like LLMs to ingest search results, spot patterns in a user’s photos, create a chatbot and “answer previously impossible questions,” according to a copy of a presentation viewed by CNBC. The presentation says Ellmann aims to be “the storyteller of your life.”
It’s unclear whether the company plans to ship these capabilities within Google Photos or another product. Google Photos has more than 1 billion users and 4 trillion photos and videos, according to a company blog post.
Project Ellmann is one of several ways Google is proposing to build or improve its products with AI technology. On Wednesday, Google launched its latest and “most capable” advanced AI model, Gemini, which in some cases outperformed OpenAI’s GPT-4. The company plans to license Gemini to a wide range of customers through Google Cloud for use in their own applications. One of Gemini’s distinguishing features is that it is multimodal, meaning it can process and understand information beyond text, including images, video and audio.
A product manager for Google Photos presented Project Ellmann alongside Gemini teams at a recent internal summit, according to documents viewed by CNBC. They wrote that the teams spent the past few months determining that large language models are the ideal technology to make this bird’s-eye approach to someone’s life story a reality.
The presentation said Ellmann could pull in context using biographies, previous moments and subsequent photos to describe a user’s photos more deeply than “just pixels with labels and metadata.” It proposes being able to identify a series of moments such as university years, Bay Area years and years as a parent.
“We can’t answer tough questions or tell good stories without a bird’s-eye view of your life,” reads a description accompanying a photo of a little boy playing with a dog in the dirt.
“We study your photos, looking at their tags and locations, to identify a meaningful moment,” the presentation slide reads. “When we step back and consider your life in its entirety, your broader story becomes clear.”
The presentation said that large language models could infer moments such as the birth of a user’s child, using knowledge gained from higher up the tree to predict, for example, that a given photo shows Jack’s birth and that he is James and Gemma’s first and only child.
“One of the reasons an LLM is so powerful for this bird’s-eye view is that it is able to take unstructured context from all the different heights of this tree, and use it to improve how it understands other areas of the tree,” reads a slide illustrated with depictions of a user’s various “moments” and “chapters.”
The presenters gave another example: determining that a user had recently attended a class reunion. “It’s been exactly 10 years since she graduated and it’s full of faces we haven’t seen in 10 years, so it’s probably a reunion,” the team surmised in its presentation.
The team also demonstrated “Ellmann Chat,” with the description: “Imagine opening ChatGPT but it already knows everything about your life. What would you ask it?”
It displayed a sample chat in which a user asks, “Do I have a pet?” The chat replies that yes, the user has a dog that wore a red raincoat, then gives the dog’s name and the names of the two family members it is most often seen with.
In another example from the chat, a user asked when their siblings last visited. Another asked it to list towns similar to where they live because they are thinking of moving. Ellmann offered answers to both.
In other slides, Ellmann also presented a summary of the user’s eating habits: “It looks like you’re enjoying Italian food. There are several photos of pasta dishes, as well as a photo of pizza.” It also said the user seemed to enjoy trying new foods, because one of their photos featured a menu with a dish it didn’t recognize.
The presentation said the technology could also determine which products a user is considering buying, as well as their interests, work and travel plans, based on the user’s screenshots. It also suggested it would be able to identify their favorite websites and apps, citing Google Docs, Reddit and Instagram as examples.
A Google spokesperson told CNBC: “Google Photos has always used AI to help people search their photos and videos, and we’re excited about the potential of LLMs to unlock even more helpful experiences. This was an early internal exploration and, as always, should we decide to roll out new features, we would take the time needed to ensure they were helpful to people, and designed to protect users’ privacy and safety as our top priority.”
Big Tech races to create AI-powered ‘memories’
The proposed Project Ellmann could help Google in the arms race among tech giants to create more personalized life memories.
Google Photos and Apple Photos have served up “Memories” for years and created albums based on photo trends.
In November, Google announced that with the help of AI, Google Photos can now group similar photos together and organize screenshots into easy-to-find albums.
Apple announced in June that its latest software update would give its Photos app the ability to recognize people, dogs and cats in a user’s photos. It already sorts faces and lets users search for them by name.
Apple also announced an upcoming Journal app, which will use on-device AI to create personalized suggestions to inspire users to write passages describing their memories and experiences, drawing on recent photos, locations, music and workouts.
But Apple, Google and other tech giants are still grappling with the complexities of properly displaying and recognizing images.
For example, Apple and Google still avoid labeling gorillas after 2015 reports that Google mislabeled Black people as gorillas. A New York Times investigation this year found that Apple and Google’s Android software, which underpins most of the world’s smartphones, had turned off the ability to visually search for primates for fear of labeling a person as an animal.
Companies including Google, Facebook and Apple have over time added controls to minimize unwanted memories, but users have reported that such memories sometimes still surface and that reducing them requires toggling through multiple settings.