{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"align-center\">\n",
    "<a href=\"https://oumi.ai/\"><img src=\"https://oumi.ai/docs/en/latest/_static/logo/header_logo.png\" height=\"200\"></a>\n",
    "\n",
    "[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)\n",
    "[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)\n",
    "[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)\n",
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/oumi-ai/oumi/blob/main/notebooks/Oumi - Finetuning Tutorial.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "</div>\n",
    "\n",
    "👋 Welcome to Open Universal Machine Intelligence (Oumi)!\n",
    "\n",
    "🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.\n",
    "\n",
    "🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.\n",
    "\n",
    "⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Finetuning Overview\n",
    "\n",
    "In this tutorial, we'll LoRA tune a large language model to produce \"thoughts\" before producing its output.\n",
    "\n",
    "We'll use the Oumi framework to streamline the process and achieve high-quality results.\n",
    "\n",
    "We'll cover the following topics:\n",
    "1. Prerequisites\n",
    "2. Data Preparation & Sanity Checks\n",
    "3. Training Config Preparation\n",
    "4. Launching Training\n",
    "5. Monitoring Progress\n",
    "6. Evaluation\n",
    "7. Analyzing Results\n",
    "8. Inference\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "❗**NOTICE:** We recommend running this notebook on a GPU. If running on Google Colab, you can use the free T4 GPU runtime (Colab Menu: `Runtime` -> `Change runtime type`). On Colab, we recommend replacing `HuggingFaceTB/SmolLM2-1.7B-Instruct` with a smaller model like `HuggingFaceTB/SmolLM2-135M-Instruct`, since the T4 only has 16GB VRAM; you can use `Edit -> Find and replace` in the menu bar to do so.\n",
    "\n",
    "First, let's install Oumi. You can find more detailed instructions [here](https://oumi.ai/docs/en/latest/get_started/installation.html). Here, we include Oumi's GPU dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install oumi[gpu]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating our working directory\n",
    "For our experiments, we'll use the following folder to save the model, training artifacts, and our working configs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "tutorial_dir = \"finetuning_tutorial\"\n",
    "\n",
    "Path(tutorial_dir).mkdir(parents=True, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup the environment\n",
    "\n",
    "You may need to set the following environment variables:\n",
    "- [Optional] HF_TOKEN: Your [HuggingFace](https://huggingface.co/docs/hub/en/security-tokens) token, in case you want to access a private model like Llama.\n",
    "- [Optional] WANDB_API_KEY: Your [wandb](https://wandb.ai) token, in case you want to log your experiments to wandb."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting Started\n",
    "\n",
    "\n",
    "## Data Preparation\n",
    "Let's start by checking out our datasets, and seeing what the data looks like. The OpenO1-SFT dataset includes a variety of tasks, including code generation and explanation, with most examples having a \"thought\" produced prior to the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from oumi.builders import build_tokenizer\n",
    "from oumi.core.configs import ModelParams\n",
    "from oumi.datasets import PromptResponseDataset\n",
    "\n",
    "# Initialize the dataset\n",
    "tokenizer = build_tokenizer(\n",
    "    ModelParams(model_name=\"HuggingFaceTB/SmolLM2-1.7B-Instruct\")\n",
    ")\n",
    "dataset = PromptResponseDataset(\n",
    "    tokenizer=tokenizer,\n",
    "    hf_dataset_path=\"O1-OPEN/OpenO1-SFT\",\n",
    "    prompt_column=\"instruction\",\n",
    "    response_column=\"output\",\n",
    ")\n",
    "\n",
    "# Print a few examples\n",
    "for i in range(3):\n",
    "    conversation = dataset.conversation(i)\n",
    "    print(f\"Example {i + 1}:\")\n",
    "    for message in conversation.messages:\n",
    "        print(f\"{message.role}: {message.content[:100]}...\")  # Truncate for brevity\n",
    "    print(\"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Preparation\n",
    "\n",
    "For code generation, we want a model with strong general language understanding and coding capabilities. \n",
    "\n",
    "We also want a model that is small enough to train and run on a single GPU.\n",
    "\n",
    "Some good options include:\n",
    "- [\"microsoft/Phi-3-mini-128k-instruct\"](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)\n",
    "- [\"google/gemma-2b\"](https://huggingface.co/google/gemma-2b)\n",
    "- [\"Qwen/Qwen2-1.5B-Instruct\"](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct)\n",
    "- [\"meta-llama/Llama-3.2-3B-Instruct\"](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)\n",
    "- [\"HuggingFaceTB/SmolLM2-1.7B-Instruct\"](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)\n",
    "\n",
    "\n",
    "For this tutorial, we'll use \"HuggingFaceTB/SmolLM2-1.7B-Instruct\" as our base model.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initial Model Responses\n",
    "\n",
    "Let's see how our model performs on an example prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile $tutorial_dir/infer.yaml\n",
    "\n",
    "model:\n",
    "  model_name: \"HuggingFaceTB/SmolLM2-1.7B-Instruct\"\n",
    "  trust_remote_code: true\n",
    "  torch_dtype_str: \"bfloat16\"\n",
    "\n",
    "generation:\n",
    "  max_new_tokens: 128\n",
    "  batch_size: 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from oumi.core.configs import InferenceConfig\n",
    "from oumi.infer import infer\n",
    "\n",
    "config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / \"infer.yaml\"))\n",
    "\n",
    "input_text = (\n",
    "    \"Write a Python function to implement the quicksort algorithm. \"\n",
    "    \"Please include comments explaining each step.\"\n",
    ")\n",
    "\n",
    "results = infer(config=config, inputs=[input_text])\n",
    "\n",
    "print(results[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing our training experiment\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a YAML file for our training config:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile $tutorial_dir/train.yaml\n",
    "\n",
    "model:\n",
    "  model_name: \"HuggingFaceTB/SmolLM2-1.7B-Instruct\"\n",
    "  trust_remote_code: true\n",
    "  torch_dtype_str: \"bfloat16\"\n",
    "  tokenizer_pad_token: \"<|endoftext|>\"\n",
    "  device_map: \"auto\"\n",
    "\n",
    "data:\n",
    "  train:\n",
    "    datasets:\n",
    "      - dataset_name: \"PromptResponseDataset\"\n",
    "        split: \"train\"\n",
    "        sample_count: 8000\n",
    "        dataset_kwargs: {\n",
    "          \"hf_dataset_path\": \"O1-OPEN/OpenO1-SFT\",\n",
    "          \"prompt_column\": \"instruction\",\n",
    "          \"response_column\": \"output\",\n",
    "          \"assistant_only\": true,\n",
    "          \"instruction_template\": \"<|im_start|>user\\n\",\n",
    "          \"response_template\": \"<|im_start|>assistant\\n\",\n",
    "        }\n",
    "        shuffle: True\n",
    "        seed: 42\n",
    "    seed: 42\n",
    "\n",
    "training:\n",
    "  output_dir: \"finetuning_tutorial/output\"\n",
    "\n",
    "  # For a single GPU, the following gives us a batch size of 16\n",
    "  # If training with multiple GPUs, feel free to reduce gradient_accumulation_steps\n",
    "  per_device_train_batch_size: 2\n",
    "  gradient_accumulation_steps: 8\n",
    "  \n",
    "  # ***NOTE***\n",
    "  # We set it to 10 steps to first verify that it works\n",
    "  # Swap to 1500 steps to get more meaningful results.\n",
    "  # Note: 1500 steps will take 2-3 hours on a single A100-40GB GPU.\n",
    "  max_steps: 10\n",
    "  # max_steps: 1500\n",
    "\n",
    "  learning_rate: 1e-3\n",
    "  warmup_ratio: 0.1\n",
    "  logging_steps: 10\n",
    "  save_steps: 0\n",
    "  max_grad_norm: 1\n",
    "  weight_decay: 0.01\n",
    "\n",
    "  \n",
    "  trainer_type: \"TRL_SFT\"\n",
    "  optimizer: \"adamw_torch_fused\"\n",
    "  enable_gradient_checkpointing: True\n",
    "  gradient_checkpointing_kwargs:\n",
    "    use_reentrant: False\n",
    "  ddp_find_unused_parameters: False\n",
    "  dataloader_num_workers: \"auto\"\n",
    "  dataloader_prefetch_factor: 32\n",
    "  empty_device_cache_steps: 1\n",
    "  use_peft: true\n",
    "\n",
    "peft:\n",
    "  lora_r: 16\n",
    "  lora_alpha: 32\n",
    "  lora_target_modules:\n",
    "    - \"q_proj\"\n",
    "    - \"k_proj\"\n",
    "    - \"v_proj\"\n",
    "    - \"o_proj\"\n",
    "    - \"gate_proj\"\n",
    "    - \"up_proj\"\n",
    "    - \"down_proj\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fine-tuning the model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "source": [
    "This will start the fine-tuning process using the Oumi framework. Because we set `max_steps: 5`, this should be very quick. The full fine-tuning process may take a few hours, depending on your GPU."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SINGLE GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!oumi train -c \"$tutorial_dir/train.yaml\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MULTI-GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!oumi distributed torchrun -m oumi train -c \"$tutorial_dir/train.yaml\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "\n",
    "As an example, let's create an evaluation configuration file!\n",
    "\n",
    "**Note:** Since we've finetuned our model to produce thoughts before answering, it's very likely to do worse on most evals out-of-the-box.\n",
    "\n",
    "Many evals do not allow models to decode and thus don't take advantage of things like inference-time reasoning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile $tutorial_dir/eval.yaml\n",
    "\n",
    "model:\n",
    "  model_name: \"finetuning_tutorial/output\"\n",
    "  torch_dtype_str: \"bfloat16\"\n",
    "\n",
    "tasks:\n",
    "  - evaluation_backend: lm_harness\n",
    "    task_name: mmlu_college_computer_science\n",
    "\n",
    "output_dir: \"finetuning_tutorial/output/evaluation\"\n",
    "generation:\n",
    "  batch_size: null # This will let LM HARNESS find the maximum possible batch size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "!oumi evaluate -c \"$tutorial_dir/eval.yaml\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "## Use the Fine-tuned Model\n",
    "\n",
    "Once we're happy with the results, we can serve the fine-tuned model for interactive inference:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile $tutorial_dir/trained_infer.yaml\n",
    "\n",
    "model:\n",
    "  model_name: \"HuggingFaceTB/SmolLM2-1.7B-Instruct\"\n",
    "  adapter_model: \"finetuning_tutorial/output\"\n",
    "  trust_remote_code: true\n",
    "  torch_dtype_str: \"bfloat16\"\n",
    "\n",
    "generation:\n",
    "  max_new_tokens: 2048\n",
    "  batch_size: 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from oumi.core.configs import InferenceConfig\n",
    "from oumi.infer import infer\n",
    "\n",
    "config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / \"trained_infer.yaml\"))\n",
    "\n",
    "input_text = (\n",
    "    \"Write a Python function to implement the quicksort algorithm. \"\n",
    "    \"Please include comments explaining each step.\"\n",
    ")\n",
    "\n",
    "results = infer(config=config, inputs=[input_text])\n",
    "\n",
    "print(results[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🧭 What's Next?\n",
    "\n",
    "Congrats on finishing this notebook! Feel free to check out our other [notebooks](https://github.com/oumi-ai/oumi/tree/main/notebooks) in the [Oumi GitHub](https://github.com/oumi-ai/oumi), and give us a star! You can also join the Oumi community over on [Discord](https://discord.gg/oumi).\n",
    "\n",
    "📰 Want to keep up with news from Oumi? Subscribe to our [Substack](https://blog.oumi.ai/) and [Youtube](https://www.youtube.com/@Oumi_AI)!\n",
    "\n",
    "⚡ Interested in building custom AI in hours, not months? Apply to get [early access](https://oumi.ai/contact?utm_source=oumi_oss_tutorial_finetuning) to the Oumi Platform, or [chat with us](https://oumi.ai/book?utm_source=oumi_oss_tutorial_finetuning) to learn more!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "oumi",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}