GitHub - zckly/ai-engineer-roadmap: The most comprehensive free guide for becoming an AI Engineer in 2024

4 min read Original article ↗
ai engineer roadmap

The fastest, most comprehensive way to become an AI Engineer in 2024

Welcome to the AI Engineer Roadmap! This guide offers a project-based approach to mastering AI engineering, whether you're a beginner or looking to expand your skills. Each section includes practical projects to apply your knowledge, build real-world AI applications, and develop crucial problem-solving skills ᕙ( •̀ ᗜ •́ )ᕗ

Table of Contents

  1. Web/App Development
  2. Beginner Text Generation
  3. Advanced Text Generation
  4. Image Generation
  5. Speech
  6. Computer Vision

Web/App Development

application development

It helps to have the ability to code your own interfaces, but it's also 100% possible to build AI products without knowing how to program. It's up to you if you wanna go down the coding (full-stack) route or no-code (Webflow, Zapier, etc) route.

Full-stack Route (recommended)

  • Front-end: Learn React for building interactive user interfaces
  • Back-end: Master NodeJS/NextJS for server-side development
  • Database: Understand and implement Postgres for data storage

There are tons of roadmaps out there for learning web development. One of my favorites is Scrimba. I also have a bootcamp on Youtube that covers full-stack web dev + building AI apps

No-code Route

  • Website Builder: Explore Webflow for creating professional websites without coding
  • Workflow Builder: Use Zapier to automate processes and integrate applications
  • Database: Leverage Firebase or Airtable for easy-to-use, scalable data storage solutions

Beginner Text Generation

beginner text generation
  1. Understanding Large Language Models (LLMs)

  2. Proprietary LLMs

    • OpenAI's GPT models
    • Anthropic's Claude 3 family
    • Google's Gemini
  3. Open-source LLMs

    • Meta's LLaMA 3
    • Cohere's Command-R
  4. Prompt Engineering

  5. Basic Chatbots

  6. Handling Structured Output

    • Learn techniques for generating and parsing structured data from LLMs
    • Check out Instructor or use string parsing

Advanced Text Generation

advanced text generation
  1. Function Calling and Tool Usage

    • Implement LLM-powered tools and integrate external functions
    • Project: Build a personal assistant that can interact with your calendar, email, and task list
  2. Web-browsing Capabilities

    • Learn about techniques for scraping and summarizing web content
    • Project: Build an open-source version of Perplexity (like morph.so)
  3. Fine-tuning LLMs

    • Techniques for adapting pre-trained models to specific tasks
    • Project: Fine-tune a model on a specific domain (e.g., medical terminology, legal jargon)
  4. Embeddings and Vector Databases

    • Understand and implement vector representations of text
    • Explore vector database solutions for efficient similarity search (e.g. Chroma, Supabase, Weaviate)
    • Project: Build a semantic search engine for a large corpus of documents
  5. Retrieval Augmented Generation (RAG)

    • Learn about different RAG architectures and when to use them
    • Project: Develop a "Chat with PDF" application
  6. AI Agents

    • Study projects like OpenDevin to understand autonomous AI systems
    • Project: Autonomous research agent

Speech

speech
  1. Text-to-Speech (TTS)

    • Implement TTS using services like ElevenLabs and OpenAI
    • Project: Create an audiobook generator from text input
  2. Speech-to-Text (STT)

    • Utilize models like OpenAI's Whisper for transcription
    • Project: Create a job interview coach application
  3. Speech Analysis

    • Explore emotion and intent analysis using tools like Hume AI or Google Gemini 1.5 Pro
    • Project: Create an AI Therapist with emotion detection
    • Learn about prosody analysis and its applications in understanding speaker intent

Image Generation

CleanShot July 2

image generation
  1. Prompt Engineering for Image Generation

    • Read up on art history and photography terminology to craft effective prompts
    • Join the Midjourney Discord to study how experts prompt image models
    • Project: Create a series of images that tell a story, using consistent style and characters
  2. Proprietary Image Generation Models

    • Explore capabilities of models like GPT-4o, Claude, and Gemini
    • Project: Children's coloring/story book generator
    • Learn about image-to-image transformations (style transfer, inpainting, outpainting)
  3. Open-source Image Generation Models

    • Experiment with Stable Diffusion and other accessible models
    • Project: Build a custom image generation UI with fine-grained controls

Computer Vision

computer vision
  1. Image Analysis

    • Leverage models like Claude or GPT-4o for comprehensive image understanding
    • Project: Develop an app that can analyze and describe the contents of photos
    • Learn about object detection, segmentation, and classification techniques
  2. Video Analysis

    • Explore advanced capabilities with models like Google Gemini 1.5 Pro
    • Project: Video narration
    • Study techniques for tracking objects and analyzing motion in videos
    • Project: Create a sports analysis tool that can break down player movements and tactics

Happy learning and building!

  • Zack

my twitter