Google I/O 2024 Conference: Nothing Short of Incredible

May 20, 2024 | By Kyle Steinberg

Topics: Artificial Intelligence

Google I/O '24: Unveiling the Power of Gemini

On May 14, 2024, Google hosted its annual I/O developer conference, unveiling a plethora of advancements to Gemini, other AI models, and Project Astra. This post dives into the updates announced at the conference, outlining what Gemini can do and its vast potential to transform many aspects of our lives.

Gemini 1.5 Pro: A Multimodal Powerhouse

Standing at the forefront of Google's AI initiatives is Gemini. Unlike its predecessors, Gemini was built from the ground up as a multimodal model, processing information from text, images, video, documents, and code. This enables far more capable search: users can ask complex questions and receive informative answers that draw on all of these formats at once. Other AI platforms, such as OpenAI's, need to hand off between models, from GPT for text to DALL·E for image generation, and so on. Gemini 1.5 Pro is the next evolution of the Gemini model launched last year.

Gemini 1.5 Pro supports a context window of up to 2 million tokens, more than 10x what the latest GPT model can handle. But what does that mean for everyday users? In a nutshell, you can have incredibly long conversations without the AI "forgetting" what you discussed earlier in the interaction. 2 million tokens is roughly 1.4 million words, enough to upload thousands of pages of PDFs at once for analysis. While that far exceeds what a typical user needs, it does let you keep a conversation going for a very long time. A great example shown during the conference was meal prepping. The demo showed a user putting together a week-long meal plan with 3 meals a day. The plan could be adjusted meal by meal or day by day, and Gemini never lost track of what it had already suggested or the additional info provided along the way. This large context window lets you have real conversations without hitting the cap quickly.
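To make the context-window numbers a bit more concrete, here is a rough Python sketch. It assumes a 128,000-token window for the latest GPT model (GPT-4o at the time of writing) and the common rule of thumb of about 0.75 English words per token; real tokenizers vary by model and language, so treat every number here as an estimate. The visible_history helper is a hypothetical, simplified illustration of the "forgetting" described above: once a conversation outgrows the window, the oldest turns are the ones that fall out.

```python
# Rough context-window arithmetic plus a toy model of "forgetting".
# Assumptions (not taken from any Google or OpenAI API): ~0.75 English words
# per token, a 128,000-token window for comparison, and a crude word-count tokenizer.

GEMINI_15_PRO_WINDOW = 2_000_000  # tokens, per the I/O '24 announcement
GPT_4O_WINDOW = 128_000           # tokens, assumed for comparison
WORDS_PER_TOKEN = 0.75            # rule-of-thumb ratio for English text


def window_ratio(big: int, small: int) -> float:
    """How many times larger one context window is than another."""
    return big / small


def approx_words(tokens: int) -> int:
    """Rough number of English words that fit in a token budget."""
    return int(tokens * WORDS_PER_TOKEN)


def estimate_tokens(text: str) -> int:
    """Crude token estimate: word count divided by the words-per-token ratio."""
    words = len(text.split())
    return max(1, round(words / WORDS_PER_TOKEN))


def visible_history(messages: list[str], budget: int) -> list[str]:
    """Return the most recent messages that fit inside the token budget.

    Anything older than that is outside the window, which is what users
    experience as the model "forgetting" earlier parts of the conversation.
    """
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order


if __name__ == "__main__":
    print(f"{window_ratio(GEMINI_15_PRO_WINDOW, GPT_4O_WINDOW):.1f}x")  # 15.6x
    print(f"~{approx_words(GEMINI_15_PRO_WINDOW):,} words")            # ~1,500,000 words

    meal_chat = [
        "Plan 3 meals a day for a week.",
        "Swap Tuesday dinner for fish.",
        "Make Friday vegetarian.",
    ]
    # With a tiny budget, only the latest request still fits in the window.
    print(visible_history(meal_chat, budget=10))
    # With a 2-million-token budget, the whole conversation fits.
    print(visible_history(meal_chat, budget=GEMINI_15_PRO_WINDOW))
```

Keeping the newest messages and dropping the oldest is just one simple policy chosen for illustration; the point is that a bigger budget pushes the "forgetting" point much further back in the conversation.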

Revolutionizing Search with AI

One of the most impactful applications of Gemini is its integration into Google Search. Gone are the days of rudimentary keyword searches. With Gemini, users can leverage natural language queries, including those incorporating visuals, to unearth a wealth of information. Imagine searching for a specific recipe and being presented with step-by-step instructions, how-to videos, and a grocery list tailored to your preferences.

Consider the searches themselves. Planning a trip with a full itinerary would normally take 100+ queries to sort out flights, hotels, restaurants, activities, and so on. Now it can all be done with a single search. Instead of searching for something like "flights to Boston", your searches can look more like this: "I'm planning a 4-day trip to Boston with my family of 4. We want to stay in a hotel by the water or close to the things we want to do. Some of the things we want to do are catch a Red Sox game, walk the Freedom Trail, and visit MIT. Also, recommend the best restaurants around these locations. My wife likes seafood, and my kids like pizza."

Gemini can handle this entire query in one fell swoop: finding flight times and prices, hotel locations and prices, an itinerary that fits in everything you want to see, and restaurant options, all at once. And if you change something, say, swapping the Sox game for a Bruins game, the itinerary shifts around the new activity, replanning your trip and restaurant picks to fit that game.

Unlocking Creativity through Generative AI

While Gemini is multimodal, Google also announced more specialized AIs for creative content generation. The first one we'll talk about is Imagen 3, Google's flagship image-generation model. Imagen 2 was great, but like every other AI image generator, it was missing a couple of key things. The first was realism in fine details: fur or hair often came out splotchy, looking less like hair and more like a watercolor effect. With Imagen 3, you can see hair down to individual strands. The second thing that plagued image generators was text on the image. More often than not you would get misspellings, or malformed letters that merged with their neighbors, creating strange lettering and spacing. With Imagen 3, that issue is largely a thing of the past.

The second big AI upgrade is called "Veo". Veo is a video model that generates video from text, images, or existing videos. It wraps the different video tools Google had into a single model: Veo can create, enhance, and even extend videos. Say you generated a 10-second video but wanted 20 or 30 seconds' worth. With the extend feature, you can take that baseline 10-second clip and keep extending it; Google says Veo can produce videos that run beyond a minute. Veo was showcased with filmmaker Donald Glover, who is currently creating a short film with it.

AI Assistants for a Streamlined Future

AI assistants are poised to become an integral part of our daily lives, and Google envisions Gemini playing a central role in their development. Imagine a virtual assistant that can not only understand your requests but also anticipate your needs and complete tasks on your behalf. From scheduling appointments to processing returns, Gemini-powered assistants promise to streamline our lives and free up valuable time.

This AI assistant does more than summarize an email thread. It can read attached docs and understand the context of a thread to craft responses and take actions like booking a meeting, returning items, creating spreadsheets, and more, all with a prompt or a click of a button. Gemini aims to simplify our day-to-day interactions and compress processes that would take minutes or hours into seconds. Interactions across Gmail, Google Sheets, Google Drive, and more happen seamlessly, without having to open one app, create a file in another, and then link them together. Say your receipts get emailed to you. You can now have Gemini collect receipts as they arrive and automatically add them to a Sheet without lifting a finger.

Empowering Educators and Learners

The educational landscape is also set to be transformed by Gemini. With Gemini 1.5 Pro coming to Google Classroom and NotebookLM, teachers can craft lesson plans, and students can learn the material in the way that suits them best. Previously, if you asked Gemini a math question, it would simply give you the answer. With these updates, it instead teaches a student how to think about, approach, and solve the problem.

NotebookLM allows teachers and students to put textbook chapters, worksheets, and more into a single place, and it generates study guides, FAQs, and even practice quizzes for students to use before the real thing. Audio Overviews is the standout feature of NotebookLM. Because Gemini is multimodal, it can read and see all of these materials and turn them into a spoken conversation (a lecture) that students can listen to and interact with, just like in a classroom. Even better, you can steer the conversation. In the demo during the conference, the AI was teaching about Newton's three laws of motion, and the user asked how the three laws apply to their kid's favorite sport, basketball. The AI then explained the laws using basketball as the example. This radically changes how kids can learn concepts, in ways they identify with most. I wish I'd had this when I was in school.

A Responsible Approach to AI

While the potential of AI is undeniable, Google emphasizes its commitment to responsible development. The company highlighted its efforts to ensure that AI remains helpful, accessible, and unbiased, benefiting everyone.

The biggest issue we've seen is people's images, voices, and writing being misused and twisted for hurtful or destructive purposes. With SynthID, now expanded to text and video, content generated by Google's AI tools is digitally watermarked regardless of format, as a way to push for responsible AI use. Being able to identify deepfakes and prevent them from being used inappropriately is paramount to protecting people and to preventing things like disinformation in the future.

Another way Google is protecting users is by not using your data to train its models. This is especially important for businesses trying to improve systems and processes that require private or sensitive data to give the AI context. Protecting this kind of information, which not every provider does, is vital for companies and individual users alike, and it paves the way for mass adoption, which matters because AI in our daily lives is unavoidable.

A New Era of AI-powered Possibilities

Google I/O '24 painted a vivid picture of a future powered by AI, with Gemini acting as a cornerstone. From revolutionizing search to fostering creativity and streamlining tasks, the potential applications of this groundbreaking technology seem limitless. As Google continues to refine Gemini and integrate it into its products, we can expect to see even more transformative advancements in the years to come.

I highly encourage all of you to watch the full Google I/O 2024 keynote to see what's coming and start brainstorming use cases for yourself and whatever profession you are in.

A closing note: we should embrace AI. There is no stopping the AI train, and the more we resist as individuals, the more we fall behind. Staying up to date with AI, knowing how it works and how to use it, keeps you at the forefront of this emerging tech. As much as we may dislike the idea of AI replacing jobs, it will happen. Those who implement, use, or build anything with AI are far more valuable and far less likely to be replaced by it. Focus less on the negatives and more on the positives it can bring to our lives.

