Ghazi Khan
I am an open source developer and I love building simple solutions for complex technical problems.

Stable Diffusion 3, OpenAI Sora, and More Exciting Developments!

Remember that time your grandma swore her smartphone was secretly listening to her conversations and selling them to the marketing companies? Typical sweet grandma. Well, this week in AI, the advancements might not be quite as nefarious, but they’re certainly keeping things interesting. Buckle up, because we’re about to dive into the whirlwind of the past seven days in the ever-evolving world of artificial intelligence.

Stable Diffusion 3 Arrives: Stability AI has announced Stable Diffusion 3 in early preview, solidifying its position at the forefront of text-to-image generation with even more impressive capabilities. It uses a new architecture called a “diffusion transformer,” similar to the one used in OpenAI’s Sora, but with modifications for image generation tasks. It is also trained with a technique called “flow matching,” which is intended to make the model more efficient and computationally cheaper to run.
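For the curious, here is a toy sketch of what a flow matching training objective can look like. It is not Stability AI's implementation; it assumes the simplest variant, a straight-line path from random noise to a data sample, and every name in it (`flow_matching_example`, `flow_matching_loss`, `predict_velocity`) is made up for illustration. The idea: pick a random point on the line between noise and data, and train the model to predict the direction that leads from the noise to the data.

```python
import random


def flow_matching_example(data_point, rng):
    """Build one training example for a straight-line flow matching path.

    Assumption: the path is x_t = (1 - t) * noise + t * data, so the
    velocity along the path is the constant (data - noise).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in data_point]  # sample from N(0, 1)
    t = rng.random()                                    # time in [0, 1)
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data_point)]
    target_velocity = [d - n for n, d in zip(noise, data_point)]
    return x_t, t, target_velocity


def flow_matching_loss(predict_velocity, data, rng):
    """Mean squared error between predicted and target velocities.

    `predict_velocity(x_t, t)` stands in for the neural network
    (a hypothetical callable returning a list of floats).
    """
    total, count = 0.0, 0
    for point in data:
        x_t, t, target = flow_matching_example(point, rng)
        pred = predict_velocity(x_t, t)
        total += sum((p - v) ** 2 for p, v in zip(pred, target))
        count += len(target)
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    toy_data = [[1.0, -2.0], [0.0, 3.0]]
    # A "model" that always predicts zero velocity, just to show the call.
    print(flow_matching_loss(lambda x, t: [0.0] * len(x), toy_data, rng))
```

In a real system the lambda would be a large neural network and the loss would be minimised with gradient descent. The appeal of the straight-line path is that the training target is simple to compute for every sample.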

OpenAI Unveils Sora: Sora is OpenAI’s first video generation tool, marking a significant step forward in the creation of realistic AI-generated videos. It lets users control the content and style of the generated video through text prompts while maintaining temporal coherence throughout the sequence! Sounds like a dream come true for aspiring filmmakers, or a potential nightmare for copyright lawyers. OpenAI shared dozens of example videos on Twitter and Instagram.

GPT-4 hangover, anyone? The week kicked off with Microsoft and Nuance launching Dragon Ambient eXperience (DAX), a workflow-integrated application that uses the much-talked-about GPT-4 for automated clinical documentation. Sounds fancy, right? But here’s the thing: while AI streamlining healthcare workflows sounds like a dream come true, it also raises concerns about potential biases and the human touch being sidelined in such a critical field. So, the question remains: are we trading accuracy for efficiency, or can we have both?

Microsoft’s on a roll! They also made GPT-4 available in preview through their Azure OpenAI service, and guess what? Bing is getting a makeover too, with AI-powered image creation and updated knowledge cards. It seems the search engine giant is determined to keep up with the ever-changing AI landscape. But hey, at least you won’t get lost in a labyrinth of search results anymore, right? Maybe.

Speaking of getting lost, did you hear about the new LLM-based browsers being actively explored by various companies? The last we heard of something like that was from Opera, which boasted AI-powered features like prompt suggestions and access to popular language models like ChatGPT and ChatSonic. Now, while this might sound like a productivity booster for the multi-tasking extraordinaire, it looks more like an extra chat box bolted onto a window.

This might change in the future, though, opening up intriguing possibilities for enhanced user experiences and personalized search results, and potentially raising ethical questions about information access and control.

But wait, there’s more! Researchers at NVIDIA unveiled a slew of cloud-based AI tools, including NeMo, Picasso, and BioNeMo. These bad boys are designed to make AI development more accessible and efficient.

Figure 1: NVIDIA AI platform stack layers

Now, don’t get me wrong, this is a significant step forward in democratising AI, but it also raises concerns about the potential misuse of these powerful tools by individuals with less-than-noble intentions. Remember the saying from last week, “with great power comes great responsibility”? Yeah, that applies here too.

Phew, that’s a lot to unpack in just one week, right? It seems the world of AI is moving at breakneck speed, and it’s our job to stay informed, engaged, and, yes, even a little bit skeptical. After all, as Albert Einstein famously said,

“The important thing is not to stop questioning. Curiosity has its own reason for existing.”

So, what can you do? Stay curious, folks! Read articles like this one, delve deeper into the research, and don’t be afraid to ask questions. The future of AI is being shaped right now, and it’s up to us to ensure it’s a future that benefits all of humanity, not just the tech giants and the smartphone overlords (hopefully, they’re still just keeping us connected).

Exciting New Developments This Week

  • Qualcomm unveils AI models optimised for smartphones and laptops: Bringing AI to the edge, Qualcomm announced over 75 AI models designed specifically for efficient operation on smartphones and laptops.
  • Samsung expands Galaxy AI features to older devices: Democratising AI access, Samsung announced the expansion of AI features from their flagship Galaxy S24 series to older models, including the Galaxy S23 series.
  • Gemini’s image generation controversy: Facing ethical scrutiny, Google’s recently launched large language model, Gemini, sparked controversy over how its image generation handles race and ethnicity. Users reported historically inaccurate depictions of people in certain image prompts, and Google paused Gemini’s generation of images of people while it worked on a fix. Seems even AI struggles with diversity quotas.

Research Worth Exploring

Metaculus - A Platform for Predicting the Future of AI: This platform allows users to make and compare forecasts on various aspects of AI development, fostering informed discussions and collective intelligence.

The Alignment Forum - Exploring the Risks and Benefits of Advanced AI: This forum brings together researchers, policymakers, and the public to discuss the potential risks and benefits of advanced AI, promoting responsible development and governance.

Large Language Models and the Future of Work: This research by Bender et al. (2023) explores the potential impact of large language models on various aspects of the workforce, highlighting both opportunities and challenges.

Subscribe to my weekly newsletter on LinkedIn :)

Remember, this is just a glimpse into the ever-evolving world of AI. Stay curious, stay engaged, and keep the conversation going!
