Amazon introduced Amazon Nova on Tuesday, a new generation of foundation models that is expected to lower the cost and improve the speed of generative AI tasks.
Amazon claimed that the new Amazon Nova model will enable users to “analyze complex documents and videos, understand charts and diagrams, generate engaging video content, and build sophisticated AI agents.”
According to the e-commerce giant, users will be able to input text, images, or video and receive textual output, meaning the new technology can analyze more than words.
The three understanding models
The technology is based on three understanding models, with a fourth expected soon, Amazon said.
The first, Amazon Nova Micro, is a text-only model that Amazon said delivers low-cost, low-latency output.
The second, Amazon Nova Lite, is a multimodal model that processes inputs up to 300,000 tokens in length and can analyze multiple images or up to 30 minutes of video in a single request, Amazon said, noting that it also supports techniques like model distillation.
The third, Amazon Nova Pro, also processes up to 300,000 input tokens. Amazon said it is suited to multimodal tasks and to agentic workflows that require calling APIs and tools. Its capabilities include visual question answering and video understanding.
Amazon Nova Premier, the fourth model, is described by Amazon as the “most capable for complex reasoning tasks” and a “teacher for distilling custom models.” Little is known about the model, but the company said it expects to release it in early 2025.
All three models were said to have advanced skills in Retrieval-Augmented Generation (RAG), function calling, and agentic applications.
Amazon stressed the technology’s customization capabilities as a key selling point, noting that a user could “start with a high-quality foundation and adjust it to fit your exact needs. You can fine-tune the models with text, image, and video to understand your industry’s terminology, align with your brand voice, and optimize for your specific use cases.”
Adding to Nova's creative capabilities, the company explained that with Amazon Nova Canvas, users could generate “studio-quality” images with precision control over style and content. With Amazon Nova Reel, a second model, users would be able to generate short videos through text prompts and images.
The technology supports 200 languages, including Hebrew, which Amazon said would spare users from “worrying about language barriers or maintaining separate models for different regions.”
While generative AI has been a source of ethical concern, Amazon reported that its new technologies were being released with built-in safety controls and that all creative content generation models would include watermarking.