Large Language Models, Lecture Notes for Artificial Intelligence Techniques

Document written by me in English on Large Language Models. Easy, well-explained tutorials for getting to know LLMs, OpenAI, LLaMa, Claude. Calling models from code, building RAG pipelines, and training LLMs. The document is suitable for any university that holds exams on LLMs or NLP.

Type: Lecture notes

2024/2025

On sale since 28/02/2025

giulio_russo
🦙 Large 💬 Language 📊 Models

Index

  • Large Language Models
    • Ollama
    • OpenAI
    • Claude
    • Gemini
    • User Interface
  • AI Agents
    • Tools
    • Multi-modality
    • HuggingFace
    • Tokenizers
    • Quantization
    • Comparison
  • Retrieval Augmented Generation
    • Vector embedding
    • LangChain
    • Create embeddings with Chroma
    • Build a RAG pipeline
  • Training
    • Fine-tuning an OpenAI LLM
    • Fine-tuning a HuggingFace LLM
    • LoRA
    • QLoRA
  • Agentic AI
  • Thanks

Panorama of LLM models

LLMs are advanced AI systems trained on massive datasets to understand and generate human-like text. They excel in tasks such as:

  • Synthesizing Information: LLMs can combine insights from multiple sources to provide concise summaries, explanations, or comparisons. Example: Summarizing complex topics or extracting key points from large documents.
  • Fleshing Out Skeletons: They help expand outlines or ideas into detailed content. Example: Developing blog posts, reports, or structured essays from an initial framework.
  • Coding: LLMs assist in writing and debugging code, generating boilerplate, or even explaining algorithms. Example: Writing Python scripts, SQL queries, or REST API integration snippets.

Limitations:

  • Specialized Domains: LLMs may struggle with niche topics where training data is sparse or highly technical (e.g., quantum physics, niche medical diagnoses).
  • Recent Events: Models are trained on a static dataset with a cutoff date, so they may lack knowledge of very recent developments unless explicitly updated.
  • Mistakes: LLMs can confidently provide incorrect or misleading information (e.g. fabricated facts or errors in reasoning). Always verify outputs.

Main Models on the Market:

  1. GPT (OpenAI): Renowned for its advanced capabilities in natural language understanding and generation (e.g. GPT-4, GPT-3.5).
  2. Claude (Anthropic): Focused on safety and alignment, designed for human-centric applications.
  3. Gemini (Google DeepMind): Combines text and multimodal capabilities with Google's vast infrastructure and search knowledge.
  4. Llama (Meta): Open-access LLMs tailored for researchers and developers (e.g., Llama 2).
  5. Perplexity: Specialized in search and conversational tasks, offering succinct and fact-based outputs.

LLMs under the hood

LLMs are powerful tools built on advancements in natural language processing (NLP) and deep learning, particularly the Transformer architecture introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017).

The Evolution of LLMs: The Transformer revolutionized NLP by replacing recurrent neural networks (RNNs) and long short-term memory (LSTM) models with a structure based on self-attention mechanisms.

  • Self-Attention: Allows the model to weigh every part of the input sequence against every other part, so it can focus on the relevant tokens regardless of how far apart they are.
  • Parallelization: All tokens of a sequence are processed at once, which enables much faster training than sequential RNNs.
  • Foundation: The Transformer became the basis for subsequent LLMs.

LLMs workflow:

  • Tokenization: Text is converted into smaller units called tokens (e.g. words, subwords, or characters). Example: Input: "The quick brown fox" Tokens: ["The", "quick", "brown", "fox"]. OpenAI provides a good tool to see how tokenization is done: https://platform.openai.com/tokenizer
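
The tokenization step can be illustrated with a toy example. This is only a sketch: it splits on whitespace and numbers the tokens in order of first appearance, whereas real LLM tokenizers use subword schemes with a fixed, pre-built vocabulary. The function names below are hypothetical and not part of any library.

```python
# Toy word-level tokenizer. Real LLMs use subword tokenizers, so the IDs
# produced here are illustrative only.

def tokenize(text):
    """Split text into word tokens on whitespace."""
    return text.split()

def build_vocab(tokens):
    """Give each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Map tokens to their integer IDs."""
    return [vocab[token] for token in tokens]

def decode(ids, vocab):
    """Map integer IDs back to text."""
    id_to_token = {i: token for token, i in vocab.items()}
    return " ".join(id_to_token[i] for i in ids)

text = "The quick brown fox"
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)

print(tokens)              # ['The', 'quick', 'brown', 'fox']
print(ids)                 # [0, 1, 2, 3]
print(decode(ids, vocab))  # The quick brown fox
```

The model itself only ever sees the integer IDs; the mapping back to text happens after generation.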

LLM evolution

  • Prompt Engineering: Guide LLM behavior via input instructions to obtain effective task-specific outputs.
  • Custom GPTs: Adapt GPTs for domain-specific applications, producing models tailored to fields like law or medicine.
  • Copilots: Assist users in their workflows with intelligent tools, providing workflow-specific assistance, e.g. for coding or writing.
  • Agentization: Create autonomous systems for complex tasks, i.e. agents capable of planning, reasoning, and executing actions.

GPT Models: OpenAI built upon the Transformer to create the Generative Pre-trained Transformer (GPT) series:

  1. GPT-1 (2018): Introduced pretraining on large text corpora followed by fine-tuning for specific tasks. Showcased the power of unsupervised pretraining for transfer learning.
  2. GPT-2 (2019): Scaled up the model size (1.5 billion parameters). Demonstrated impressive zero-shot learning abilities: generating coherent text from a prompt without task-specific fine-tuning. Initially not released fully due to concerns about misuse.
  3. GPT-3 (2020): Drastically increased parameters (175 billion). Exhibited few-shot and zero-shot learning capabilities, enabling task performance with minimal examples. Highlighted the potential for prompt engineering to guide the model's output without retraining.
  4. RLHF in ChatGPT (2022): Reinforcement Learning from Human Feedback (RLHF) was used to refine GPT-3.5 and, later, GPT-4 so that model behavior aligns with user expectations. Human evaluators rated model outputs, and these ratings were used to fine-tune the model toward safer, more aligned, and more user-friendly responses.
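
To make the difference between zero-shot and few-shot prompting concrete, here is a minimal sketch that builds both kinds of prompt as plain strings. The helper build_prompt and the Input/Output layout are assumptions chosen for illustration; no particular model or API requires this exact format.

```python
def build_prompt(task, examples, query):
    """Build a prompt from a task description, optional examples, and a query.

    With an empty list of examples the prompt is zero-shot; with a few
    (input, output) pairs it becomes few-shot.
    """
    blocks = [task]
    for text, label in examples:
        blocks.append(f"Input: {text}\nOutput: {label}")
    # The final block leaves the output empty for the model to complete.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

task = "Classify the sentiment of each input as positive or negative."
examples = [
    ("I loved this film.", "positive"),
    ("The food was awful.", "negative"),
]

zero_shot = build_prompt(task, [], "The service was great.")
few_shot = build_prompt(task, examples, "The service was great.")

print(few_shot)
```

Neither prompt changes the model's weights: the examples only steer the model at inference time.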

Before going on

When working with Data Science models, you could be carrying out two very different activities:

  1. Training: you provide a model with data so that it adapts and gets better at a task in the future. It does this by updating its internal settings: the parameters, or weights, of the model. If you are training a model that has already had some training, the activity is called "fine-tuning".
  2. Inference: you work with a model that has already been trained, using it to produce new outputs on new inputs and taking advantage of everything it learned during training. Inference is also sometimes referred to as "Execution" or "Running a model".

Let's see some models and how to call them in inference!
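
Before moving to real LLMs, the training/inference distinction can be sketched on a deliberately tiny model with a single weight. This is an illustration under simplifying assumptions (a one-parameter linear model, plain gradient descent, made-up data), not how LLMs are actually trained, but the roles of the two phases are the same.

```python
def train(data, epochs=200, learning_rate=0.01):
    """Training: adjust the weight w so that w * x approximates y."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y
            # Gradient of the squared error (w*x - y)^2 with respect to w.
            w -= learning_rate * 2 * error * x
    return w

def infer(w, x):
    """Inference: use the trained weight on a new input; w is not changed."""
    return w * x

# Made-up data following y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = train(data)
print(round(w, 3))               # close to 2.0
print(round(infer(w, 10.0), 3))  # close to 20.0
```

Calling train again on new data starting from the learned w, instead of from 0.0, would be the analogue of fine-tuning.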

Ollama

Ollama is a framework that makes it easy to download and run many open-source Large Language Models locally. Download Ollama from https://ollama.com/download and unzip the downloaded file. Then install the Ollama command line tools as suggested and run an LLM with:

    ollama run <MODEL_NAME>

Note that the first time a new model is run, it takes time to download all the necessary parameters. For example, to test the Mistral model for the first time:

    ollama run mistral

  1. Generate Your API Input: Ollama expects structured messages as a list of role/content entries:

    messages = [
        {"role": "system", "content": "system message goes here"},
        {"role": "user", "content": "user message goes here"}
    ]

  2. Authenticate and Call the API: Use the Python ollama library to call the local API. Authentication isn’t necessary since it runs locally. For example:

    import ollama

    response = ollama.chat(
        model="llama3.2",
        messages=messages
    )

     The response content can be seen:

    print(response["message"]["content"])
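
The same call can also be made without the ollama package, by posting to the local HTTP endpoint directly. The sketch below assumes Ollama is listening on its default address (http://localhost:11434) and that the llama3.2 model has already been pulled; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: default address of a locally running Ollama server.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

def build_payload(messages, model="llama3.2"):
    """Build the JSON body for a non-streaming chat request."""
    return {"model": model, "messages": messages, "stream": False}

def chat_with_ollama(messages, model="llama3.2", url=OLLAMA_CHAT_URL):
    """Send the messages to the local server and return the reply text."""
    data = json.dumps(build_payload(messages, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]

messages = [
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"},
]
payload = build_payload(messages)

# Requires a running Ollama server, so it is left commented out here:
# print(chat_with_ollama(messages))
```

Setting "stream" to False asks the server for a single JSON object instead of a sequence of partial chunks.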

OpenAI

The OpenAI models are easily accessible via the official site: https://chatgpt.com

This guide outlines the core steps required to call OpenAI's API to build intelligent conversational systems or other generative AI applications.

  1. Generate an API key: Create an OpenAI account, if you don't have one, by visiting https://platform.openai.com/ and follow the instructions to create an API key. Once the key is shown, save it immediately because you will not be able to see it again. No one except you should use this API key. OpenAI asks for a minimum credit of $5 to use the API, and API calls are charged against this credit. You can add credit to your OpenAI balance at Settings > Billing: https://platform.openai.com/settings/organization/billing/overview Note: disable the automatic recharge!
  2. Authentication: The API key authenticates your application with OpenAI's servers. Without it, you cannot access the service. Store the key generated in the previous step securely inside a .env file.
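
A minimal sketch of the authentication step and a first call follows. It assumes the .env file contains a line of the form OPENAI_API_KEY=your-api-key-here, that the python-dotenv and openai packages are installed, and that the gpt-4o-mini model is available to the account. The helpers get_api_key and ask_gpt are hypothetical names used only for this example.

```python
import os

def get_api_key(env=None):
    """Return the OpenAI key from the environment, or fail with a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")
    return key

def ask_gpt(prompt, system_message="You are a helpful assistant.",
            model="gpt-4o-mini"):
    """Send one system message and one user message, return the reply text."""
    # Imported here so the rest of the sketch runs without these packages.
    from dotenv import load_dotenv
    from openai import OpenAI

    load_dotenv()  # copies the variables defined in .env into os.environ
    client = OpenAI(api_key=get_api_key())
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

# Requires a valid key and credit, so it is left commented out here:
# print(ask_gpt("Explain tokenization in one sentence."))
```

Keeping the key in .env, and keeping .env out of version control, avoids hard-coding the secret in the source code.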

Claude

The Anthropic Claude models are easily accessible via their official site: https://claude.ai This guide outlines the core steps required to call Anthropic's API to build intelligent conversational systems or other generative AI applications.

  1. Generate an API Key: To begin, create an account on the Anthropic platform if you don’t already have one: https://console.anthropic.com/ Once registered, follow the instructions to create an API key. Save the key immediately after it's displayed, as you won't be able to view it again. Keep it private and secure: only you should use this API key. Anthropic requires an active billing account to use the API. You can manage your billing information under Account Settings > Billing. Be aware of any usage limits or costs associated with the API.
  2. Authentication: Your API key serves as a credential to authenticate requests to Anthropic's servers. Without it, you cannot access the service. Store your API key securely in a .env file for easy and safe access in your application:

    ANTHROPIC_API_KEY=your-api-key-here

     Use the dotenv library in Python to load the key programmatically:

    from dotenv import load_dotenv
    import os

    load_dotenv()
    api_key = os.getenv("ANTHROPIC_API_KEY")

  3. Build the API Input: Anthropic’s API expects input messages to be structured in a conversational format, similar to other APIs, except that the system message is passed separately from the list of messages. For example:

    system_message = "This is a system-level instruction."
    user_prompt = [
        {"role": "user", "content": "user message goes here."}
    ]

  4. Call the API: To interact with Claude, use the anthropic library (or HTTP requests if no SDK is available). Below is an example of querying the model:

    import anthropic

    claude = anthropic.Anthropic(api_key=api_key)
    result = claude.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,  # required parameter; 1024 is an arbitrary example value
        system=system_message,
        messages=user_prompt
    )

     The response content can be seen:

    print(result.content[0].text)

     Alternatively, if we want to reproduce the typewriter animation of the response generation in our code, we can call:

    result = claude.messages.stream( ... )

     and see the response as:

    with result as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)

Gemini

  1. Build the API Input: Gemini’s API takes the system instruction separately and expects each message as a role together with a list of parts. For example:

    system_message = "This is a system-level instruction."
    user_prompt = [
        {"role": "user", "parts": ["user message goes here."]}
    ]

  2. Call the API: To interact with Gemini, use the google.generativeai library. Below is an example of querying the model:

    import google.generativeai as genai

    gemini = genai.GenerativeModel(
        model_name='gemini-1.5-flash',
        system_instruction=system_message
    )
    response = gemini.generate_content(user_prompt)

     The response can be seen:

    print(response.text)

     The typewriter effect can be obtained as:

    response = gemini.generate_content(user_prompt, stream=True)
    for chunk in response:
        print(chunk.text, end="", flush=True)
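
Since each provider wants the conversation in a slightly different shape, it can help to keep one canonical message list and convert it on demand. The sketch below assumes the OpenAI-style list as the canonical form; the helpers to_anthropic and to_gemini are hypothetical, and the target shapes (separate system text for both, "parts" and the "model" role for Gemini) reflect my understanding of those SDKs rather than anything stated in the original notes.

```python
def split_system(messages):
    """Separate system text from the user/assistant turns."""
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    turns = [m for m in messages if m["role"] != "system"]
    return system, turns

def to_anthropic(messages):
    """Return (system, messages): Anthropic takes the system text separately."""
    return split_system(messages)

def to_gemini(messages):
    """Return (system_instruction, contents) in Gemini's role/parts shape."""
    system, turns = split_system(messages)
    contents = [
        {
            # Gemini names the assistant role "model".
            "role": "model" if m["role"] == "assistant" else "user",
            "parts": [m["content"]],
        }
        for m in turns
    ]
    return system, contents

conversation = [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "How are you?"},
]

anthropic_system, anthropic_messages = to_anthropic(conversation)
gemini_system, gemini_contents = to_gemini(conversation)
print(gemini_contents)
```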

User Interface

User interfaces (UIs) are essential for making AI models accessible to users, and frameworks like Gradio simplify the process of building interactive applications. Gradio is a Python library that allows developers to quickly create and deploy web-based UIs.

    import gradio as gr

  • Basic Chat Interface: Let's consider a function that handles the call to an LLM, for example GPT:

    def stream_gpt(prompt):
        messages = [
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt}
        ]
        # openai is the openai module (version 1 or later) or an OpenAI() client
        stream = openai.chat.completions.create(
            model='gpt-4o-mini',
            messages=messages,
            stream=True
        )
        result = ""
        for chunk in stream:
            result += chunk.choices[0].delta.content or ""
            yield result

    This function builds the messages in the OpenAI API format. The user prompt is passed as the function argument, while the system prompt is defined elsewhere. The GPT model is called in streaming mode and the partial result is yielded after every chunk, which produces the typewriter effect. We can build an interface as follows:

    view = gr.Interface(
        fn=stream_gpt,
        inputs=[gr.Textbox(label="Your message:")],
        outputs=[gr.Markdown(label="Response:")],
        flagging_mode="never"
    )
    view.launch()

Components:

  • Function (fn): Specifies the backend function (stream_gpt) that processes user input and generates a response.
  • Inputs: A Textbox is used to accept the user's message. The label "Your message:" describes the purpose of the input field. The content of the text box is passed as the argument of the function (in this case, the user prompt).
  • Outputs: A Markdown box displays the model’s response. Markdown allows formatted text, such as bold, italics, and links.
  • flagging_mode: Setting it to "never" disables the built-in Gradio flagging, keeping the UI clean. Older Gradio versions call this parameter allow_flagging.

Launch:

  • view.launch() starts a local server where the interface can be accessed through a web browser.
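
The typewriter effect relies on the function being a generator that yields the whole text accumulated so far, not just the newest fragment, because Gradio replaces the output with each yielded value. The sketch below shows that pattern in isolation; fake_stream is a hypothetical stand-in for the chunks returned by a streaming LLM call.

```python
def fake_stream():
    """Stand-in for an LLM stream: yields text fragments, sometimes None."""
    for fragment in ["Hel", "lo", ", ", "world", None, "!"]:
        yield fragment

def accumulate(stream):
    """Yield the full text received so far after every fragment."""
    result = ""
    for fragment in stream:
        # Some chunks carry no text; treat them as empty strings.
        result += fragment or ""
        yield result

snapshots = list(accumulate(fake_stream()))
for snapshot in snapshots:
    print(snapshot)
```

Each printed line is what the interface would display at that moment, ending with the complete response.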

Let's build an interface with a drop-down menu as additional input:

    view = gr.Interface(
        fn=stream_model,
        inputs=[
            gr.Textbox(label="Your message:"),
            gr.Dropdown(["GPT", "Claude"], label="Select model", value="GPT")
        ],
        outputs=[gr.Markdown(label="Response:")],
        flagging_mode="never"
    )
    view.launch()

A Dropdown component is added, allowing the user to select between different models (e.g., GPT or Claude). The default value is set to "GPT". Since the interface now has two inputs, the function stream_model must accept two arguments: the prompt and the selected model name.
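
The notes do not show stream_model itself, so here is one possible sketch of it: a dispatcher that forwards the prompt to the generator for the selected model. The two streaming functions are replaced by placeholders so the example runs on its own; in a real application they would be the actual streaming calls to GPT and Claude.

```python
def stream_gpt(prompt):
    """Placeholder: the real version would stream from the OpenAI API."""
    yield f"[GPT] {prompt}"

def stream_claude(prompt):
    """Placeholder: the real version would stream from the Anthropic API."""
    yield f"[Claude] {prompt}"

# Maps the drop-down labels to the functions that handle them.
STREAMERS = {"GPT": stream_gpt, "Claude": stream_claude}

def stream_model(prompt, model):
    """Forward the prompt to the streamer selected in the drop-down."""
    if model not in STREAMERS:
        raise ValueError(f"Unknown model: {model}")
    # Re-yield every partial result so the caller sees the same stream.
    yield from STREAMERS[model](prompt)

print(list(stream_model("Hello", "Claude")))
```

Using a dictionary keeps the drop-down choices and the dispatch logic in one place, so adding a model means adding a single entry.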

AI Agents

LLM chatbots are remarkably effective at conversation, which makes them well suited to building an AI assistant. A good assistant should:

  • Be friendly
  • Maintain context during the conversation
  • Have subject expertise

It's fundamental to define well:

  • The system prompt
  • The context inside our user prompt
  • Multi-shot prompting (if the preceding tokens come from a conversation related to the topic, it is more probable that the generated tokens will answer our question)

Keep context

One of the key features of an AI assistant is keeping the context of the chat. The LLM has to know which questions and answers have been exchanged during the chat in order to "remember" what has happened. With Gradio's ChatInterface, this process becomes streamlined, especially since recent updates allow Gradio to pass the conversation history directly in the OpenAI format, eliminating additional processing.

In conversational AI, context is stored as a series of messages that represent the interaction between the user and the assistant. OpenAI expects this context in the following format:

    [
        {"role": "system", "content": "system message here"},
        {"role": "user", "content": "first user prompt here"},
        {"role": "assistant", "content": "the assistant's response"},
        {"role": "user", "content": "the new user prompt"},
    ]

The roles define the participant in the conversation:

  • System: Provides instructions or sets the model’s behavior.
  • User: Represents the user’s input.
  • Assistant: Contains the AI’s responses.

To ensure that responses are context-aware, we:

  1. Combine the system message with the conversation history.
  2. Add the latest user message before sending the request to OpenAI.

Handling the history in Gradio is straightforward:

    def chat(message, history):
        # Combine system message, history, and latest user message
        messages = [
            {"role": "system", "content": system_message}
        ] + history + [
            {"role": "user", "content": message}
        ]
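
The chat function above stops once the message list is built. One way it might be completed is sketched below, with the model call injected as a parameter so the history handling can be tried without any API key. The names make_chat and echo_llm are hypothetical, and echo_llm merely stands in for a real LLM call.

```python
def make_chat(llm, system_message):
    """Return a chat(message, history) function bound to the given LLM."""
    def chat(message, history):
        # Combine system message, history, and latest user message.
        messages = (
            [{"role": "system", "content": system_message}]
            + history
            + [{"role": "user", "content": message}]
        )
        return llm(messages)
    return chat

def echo_llm(messages):
    """Stand-in for a model: reports the last prompt and the turn count."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"You said: {messages[-1]['content']} (turn {user_turns})"

chat = make_chat(echo_llm, "You are a helpful assistant.")

history = []
first = chat("Hi", history)
# The caller, or Gradio, appends each exchange to the history.
history += [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": first},
]
second = chat("What did I just say?", history)
print(first)
print(second)
```

With a recent Gradio release, a function with this signature could be passed as gr.ChatInterface(fn=chat, type="messages"), where Gradio supplies the history in this same role/content format on every turn.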