Predicting the Future is hard. Create it Instead!
Get a better perspective: zoom in and out
In this edition:
History: Microsoft Tay (was this an AI mistake?)
Coding, Tools, and Experiments
Large Language Model Application Architecture
Over the last several years, I have been narrowly focused on understanding, using, and applying the latest tech to enable business. Whether it’s managing people, technology, or practices, they all play a pivotal role in creating and sustaining value.
New tech is exciting! Imagine the incredible possibilities that can be created, and the world is your oyster! Getting into this technology now means that just a very small percentage of the world even knows it exists! Aside from the direct potential for building a business, the wonderful thing about Generative AI is its potential for unlocking creativity. My personal experience with generative AI began when Stability AI released Stable Diffusion. Since then, the tech has evolved rapidly in both written and visual generation. It is through experiments with these new AI systems that I have rediscovered my own creativity. What I am saying is this: if you think you’re not creative, go play. If you are creative, you already know what to do, and the potential is incredible!
I still use Stable Diffusion for some specific use cases today, but I also have a subscription to Midjourney that I use most often. Recently, Midjourney released version 5.2, and in this release, they have added a fascinating new “Zoom out” feature. I used the new feature for the image header today and found it to be a nice addition. Earlier this year, they shared their intention to get to version 6 and to release an API version. I fully expect that they will continue to create value with this tool, and it has opened up the possibility for people like me to create images in a way that was not possible before. And this is just still images! I’m already seeing new solutions that enable people to create content for film and TV, animation, immersive 3D experiences, and more.
CREATE. There’s that word again! It’s difficult to predict the future, but our minds are built to make predictions, and so here is mine: we are headed for the next generation of the modern-day Renaissance. New tools that enable creation arrive every day. New tools that make technically complex solutions available to non-technically trained persons are being built daily, not to mention tools like Adobe Photoshop that can extend new AI capabilities into existing tools. When you combine rapid communication with the ability of individuals to create with tools that would require hundreds of people with global scale collaboration and consumption, creation IS the future, and it’s ours for the taking.
The cost of OpenAI’s ada-002 embedding model has dropped 75% from the previous price! You can get started with the API calls easily today, and you don’t need to use the best models to see great results.
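To show how simple getting started is, here is a minimal sketch of requesting ada-002 embeddings and comparing them, assuming the 2023-era `openai` Python package and an `OPENAI_API_KEY` environment variable; the example phrases are just placeholders.

```python
# Sketch: embed two phrases with text-embedding-ada-002 and compare them.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
import math
import os

def cosine_similarity(a, b):
    """Compare two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embed(texts):
    """Request embeddings for a list of strings from the OpenAI API."""
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Embedding.create(
        model="text-embedding-ada-002", input=texts
    )
    return [item["embedding"] for item in response["data"]]

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    vec_a, vec_b = embed(["predicting the future", "creating the future"])
    print(f"similarity: {cosine_similarity(vec_a, vec_b):.3f}")
```

Once you have the vectors, similarity search over your own content is just a cosine comparison away.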
AI is continuing its integration march, this time for 3D creators. This tech is pivotal to creating enough content for immersive platforms.
Developers can now describe functions to GPT-4 and have the model intelligently choose to output a JSON object containing the arguments to call those functions. This is a new way to more reliably connect GPT’s capabilities with external tools and APIs, and it extends what generative AI can do: it lets people build technically complex systems without needing deep technical knowledge.
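Here is a minimal sketch of the pattern as of the June 2023 API (`gpt-4-0613` with the `functions` parameter). The weather function, its schema, and the simulated reply are hypothetical examples of mine, not part of OpenAI’s API; the actual network call is shown commented out.

```python
# Sketch of GPT-4 function calling: describe a function to the model,
# then parse the JSON arguments it chooses to emit.
import json

def get_current_weather(location, unit="celsius"):
    """Hypothetical local function the model can ask us to call."""
    return {"location": location, "temperature": 22, "unit": unit}

FUNCTIONS = [{
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]

def dispatch(message):
    """If the model chose a function, parse its JSON arguments and call it."""
    call = message.get("function_call")
    if not call:
        return None
    args = json.loads(call["arguments"])
    return {"get_current_weather": get_current_weather}[call["name"]](**args)

# A real request would look like this (requires `openai` and an API key):
# response = openai.ChatCompletion.create(
#     model="gpt-4-0613",
#     messages=[{"role": "user", "content": "Weather in Paris?"}],
#     functions=FUNCTIONS,
#     function_call="auto",
# )
# result = dispatch(response["choices"][0]["message"])

# Simulated model reply, for illustration:
reply = {"function_call": {"name": "get_current_weather",
                           "arguments": '{"location": "Paris"}'}}
print(dispatch(reply))
```

The key idea is that the model returns structured JSON rather than prose, so your code can route it to real tools deterministically.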
The name I’ve taken up for this enterprise is AI Mistakes. I really do want to stay positive and help people learn, but inevitably people will take shortcuts and expect these new tools to do all the work for them. When I happen upon these stories, I like to put them in the “AI Mistakes from Around the Web” section on the ai-mistakes.com website.
These AI tools are great, but they are no substitute for vetted research. Don’t make the mistake these guys did in court last week.
History: Microsoft Tay
In March 2016, Microsoft released a new experiment. Tay was a chatbot designed to engage people via Twitter. The account took on the persona of a teenage girl and interacted with anyone who engaged the Twitter handle in tweets or via direct message.
The great thing about this project was the combination of social media, natural language processing, machine learning, and some playful banter provided by professional comedians. The interface was handled by the social media app, so the project could focus on the input/output text. It was intended to be fun and playful, with users experiencing entertainment and the team behind it getting some excellent data from their experiment.
In terms of how the headlines described it… it didn’t go well.
A few hours into the experiment, which started on March 23, 2016, Tay’s tweets contained some disturbing and inflammatory statements. The offensive content led the team to suspend the Twitter account before it had even completed a full day of interactions. Tay had posted more than 95,000 tweets, well more than a human could manage in a day, so in terms of scale, it was a great success at getting attention! But the sheer volume of negativity, much of it driven by malintent, was too much for the experiment to continue.
Firstly, I’d like to applaud Microsoft for building something like this in such a public way. It provided ample learning opportunities, and they didn’t stop at running the experiment. In the post-deployment investigation, Microsoft shared much of what they discovered. While there were definitely some trolls playing the role of negative provocateurs, we learned some vital lessons that everyone was able to benefit from.
The first lesson: System outputs need guardrails.
We can take from this that there need to be guardrails and limitations that act as a circuit breaker or safeguard. AI systems, while enabling some incredible things, aren’t the same as the computing systems we’ve built in the past. AI applications are no longer straightforward calculators; they are much more, and they should include considerations that go beyond providing an answer.
The second lesson: System inputs need scrutiny.
Next, not only should we put careful boundaries in place for what the system puts out, but we should also understand what goes into the system. The old “garbage in, garbage out” still applies. One critical element in the construction of Tay was the design decision to incorporate conversation inputs into the model for later use. The trolls discovered ways to manipulate the system to amplify negative ideas, then set it loose on the internet, encouraging others to couple negative inputs with a “repeat after me” function. There was no system in place to prevent these negative statements from being incorporated into subsequent responses. An analysis of the input sentiment could have dramatically reduced the severity of the outputs.
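Even a crude input gate would have helped. Here is a minimal sketch of scoring incoming text before letting it influence the model; the blocklist, term list, and threshold are illustrative placeholders of mine (a real system would use a trained toxicity classifier).

```python
# Sketch of input scrutiny: score a user message before folding it back
# into the system. Lists and threshold are illustrative placeholders.
BLOCKED_PHRASES = {"repeat after me"}        # known manipulation pattern
TOXIC_TERMS = {"hate", "destroy", "attack"}  # stand-in for a classifier

def input_risk(text):
    """Return a crude 0..1 risk score for a user message."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return 1.0
    hits = sum(term in lowered for term in TOXIC_TERMS)
    return min(1.0, hits / 3)

def accept_for_learning(text, threshold=0.34):
    """Only fold low-risk inputs back into the system."""
    return input_risk(text) < threshold

print(accept_for_learning("What a lovely day!"))    # True
print(accept_for_learning("repeat after me: ..."))  # False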
The third lesson: Monitoring is more than “is it running?”
Perhaps most importantly, when systems are intended to be highly interactive with people, additional effort must be taken to monitor the conversation. In the past, knowing that a service was running was enough; we built systems that could check for a simple “systems nominal.” With these new interactive AI systems, we need additional inspection: “systems nominal and not generating destructive answers.”
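That extra inspection can be sketched as a health check that samples recent outputs alongside the classic liveness probe. The flag list below is an illustrative placeholder of mine for a real moderation model.

```python
# Sketch: extend "is it running?" with "is it saying destructive things?"
# The flag list stands in for a real moderation/toxicity model.
FLAGGED = {"offensive", "inflammatory"}

def service_is_up():
    return True  # stand-in for the classic liveness check

def output_is_safe(text):
    return not any(word in text.lower() for word in FLAGGED)

def health_report(recent_outputs):
    """Combine liveness with a content check over sampled outputs."""
    unsafe = [o for o in recent_outputs if not output_is_safe(o)]
    return {
        "systems_nominal": service_is_up(),
        "content_nominal": not unsafe,
        "flagged_samples": unsafe,
    }

report = health_report(["Hello there!", "an inflammatory remark"])
print(report["systems_nominal"], report["content_nominal"])  # True False
```

A report like this can page a human or trip a circuit breaker even while the service itself is perfectly healthy.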
Microsoft didn’t stop learning with Tay. A bit later in 2016, they tried again, this time with a new version called “Zo.” This new bot included guardrails and exit functions that would end the conversation when a sensitive topic or provocative interaction occurred. It didn’t take long to improve the system, but the lesson was hard to learn in public. Now we are able to take these lessons and apply them as ever more interactive systems are built. Some builders will need to make these mistakes on their own, but with a bit of proper attention and study, we can learn from the mistakes of others and skip learning this lesson through personal experience.
Tay taught us many important lessons on the road to building systems with greater utility. Others have already taken these learnings on board, among them OpenAI. Many people have poked fun at ChatGPT’s abundant use of “as an AI assistant…,” but it’s undeniable that they built their ChatGPT product with these lessons in mind, and now we can too!
Autonomous Agents and Assistants
Are you fascinated by how AI can do things for you and multiply your effort? Many of the people working on the use of Generative AI in applications are quite excited by the development of agents. I’ve had a chance to work with them a bit myself. In one of my earlier YouTube videos, I did a quick (but disappointing) demo of AgentGPT.
Even with the experiments not having an inspiring result, this is an exciting space, and having worked on automation projects throughout my career, I know automation can prove to be a significant force multiplier. I expect research and writing (both code and prose) are the biggest value use cases for these assistants right now, because they help us overcome major frictions in the creation process. Many new AI projects already take advantage of automation to test the generated output and confirm its accuracy. Good prompt-engineering practice can be summed up by simply asking, “Are you sure this is the right answer?” So clearly, there is utility in automated follow-up queries.
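The “are you sure?” pattern can be sketched in a few lines: ask a model for an answer, ask it to verify its own answer, and retry on a negative check. `ask_model` below is a hypothetical deterministic stand-in of mine for a real LLM call.

```python
# Sketch of an automated self-check loop: generate, verify, retry.
# `ask_model` is a toy stand-in for a real LLM API call.
def ask_model(prompt):
    # Deterministic toy model, for illustration only.
    if "are you sure" in prompt.lower():
        return "yes" if "4" in prompt else "no"
    return "2 + 2 = 4"

def answer_with_self_check(question, max_attempts=3):
    """Return an answer only once the model affirms its own check."""
    for _ in range(max_attempts):
        answer = ask_model(question)
        verdict = ask_model(
            f"Are you sure this is the right answer? {answer}"
        )
        if verdict == "yes":
            return answer
    return None  # give up after repeated failed self-checks

print(answer_with_self_check("What is 2 + 2?"))  # 2 + 2 = 4
```

With a real model behind `ask_model`, the verification call catches a surprising share of first-draft errors, which is exactly the friction-reducer the agent projects are chasing.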
The three major agent (assistants) based projects I am familiar with and following are AutoGPT, GPT-Engineer, and BabyAGI. In my research this week, I came across this wonderful Overview. It covers some of the major POCs and projects that have come about and also provides a breakdown of the stages that comprise an agent, as well as some case studies and a bit of concept code used for testing them out. I’m excited about GPT-Engineer as a coder and I expect we will see both good and bad outcomes from non-expert coders unleashing their creativity on the internet. Still, if the generation of code can be automated, DevSecOps and SRE practices can be as well.
These are still early days for this kind of LLM application, but that won’t stop enterprising coders from experimenting, so it’s definitely a space worth watching.
Large Language Model Application Architecture
Well-known Silicon Valley venture capital firm a16z, founded by Ben Horowitz and Marc Andreessen, has backed some of the most cutting-edge startups in the world. This month they released an excellent reference architecture for the emerging LLM application stack. For someone like me who loves getting their hands dirty understanding the trade-offs involved in building and scaling high-performance applications, it’s a great read, and I recommend it to those of you responsible for the more technical aspects of adding LLM capabilities to your existing offerings and services.
[Header image: a crystal ball dominating the frame, showing a painter using a paintbrush to create a pixelated video game scene featuring a familiar pop culture icon like Mario or Zelda.]