Prepara i tuoi esami
Ottieni punti
Guide e consigli
Vendi su Docsity
Docsity AI

Prepara i tuoi esami

Studia grazie alle numerose risorse presenti su Docsity

Ottieni i punti per scaricare

Guadagna punti aiutando altri studenti oppure acquistali con un piano Premium

Guide e consigli

Vendi su Docsity

Docsity AI

Accedi Registrati

Prepara i tuoi esami

Studia grazie alle numerose risorse presenti su Docsity

Cerca documenti

Prepara i tuoi esami con i documenti condivisi da studenti come te su Docsity

Cerca la tua università

Trova i documenti specifici per gli esami della tua università

Video Corsi

Preparati con lezioni e prove svolte basate sui programmi universitari!

Quiz

Rispondi a reali domande d’esame e scopri la tua preparazione

Docsity AINEW

Riassumi i tuoi documenti, fagli domande, convertili in quiz e mappe concettuali

Maturità 2026

Studia con prove svolte, tesine e consigli utili

Esplora domande

Togliti ogni dubbio leggendo le risposte alle domande fatte da altri studenti come te

Argomenti di studio

Esplora i documenti più scaricati per gli argomenti di studio più popolari

Ottieni i punti per scaricare

Guadagna punti aiutando altri studenti oppure acquistali con un piano Premium

Condividi documenti

20 Punti

Per ogni documento caricato

Rispondi alle domande

5 Punti

per ogni risposta data (max 1 al giorno)

Tutti i modi per ottenere punti gratis

Ottieni punti subito

Scegli un piano Premium con tutti i punti di cui hai bisogno

Opportunità di studio

Scegli il tuo prossimo programma di studio

Entra in contatto con le migliori università del mondo e scegli il tuo percorso di studi

Classifica delle migliori università

Scopri le migliori università italiane secondo gli studenti

Community

Chiedi alla community

Chiedi aiuto alla community e sciogli i tuoi dubbi legati allo studio

Guide Gratuite

I nostri eBook salva studente

Scarica gratuitamente le nostre guide sulle tecniche di studio, metodi per gestire l'ansia, dritte per la tesi realizzati da tutor Docsity

Large Language Models, Dispense di Tecniche Di Intelligenza Artificiale

Università degli Studi di Cassino e del Lazio Meridionale (UNICAS)Tecniche Di Intelligenza Artificiale

Documento scritto da me in inglese sui Large Language Models. Tutorial facili e spiegati per conoscere gli LLM, OpenAI, LLaMa, Claude. Richiamare da codice i modelli, costruire RAG e addestrare LLM. Il documento è adatto per ogni università che svolga esami su LLM o NLP.

Tipologia: Dispense

2024/2025

In vendita dal 28/02/2025

giulio_russo 🇮🇹

4.8

(42)

111 documenti

1 / 59

Questa pagina non è visibile nell’anteprima

Non perderti parti importanti!

🦙

Large

💬

Language

📊

Models

Scopri Dispense di Tecniche Di Intelligenza Artificiale Università degli Studi di Cassino e del Lazio Meridionale (UNICAS)

Documenti correlati

SSH and Python connection

NIfTI Guide (MRI e CTA)

Stanford Natural Language Inference (SNLI) Corpus: A Large Annotated Dataset

Installazione e configurazione di Python: guida completa per principianti

Guida a un Progetto di Machine Learning: Preprocessing, Modellazione e Valutazione

Guida alla struttura e documentazione di progetti Python

Machine and Deep Learning

Lezione n 5 tecniche DELL'ARGOMENTAZIONE

Lezione n 4 tecniche DELL'ARGOMENTAZIONE

Lezione n 3 tecniche dell'argomentazione

Lezione 2 tecniche DELL'ARGOMENTAZIONE

Lezione n 6 tecniche DELL'ARGOMENTAZIONE

Anteprima parziale del testo

Scarica Large Language Models e più Dispense in PDF di Tecniche Di Intelligenza Artificiale solo su Docsity!

Large

Language

Models

Index

Large Language Models
- Ollama
- OpenAI
- Claude
- Gemini
- User Interface
AI Agents
- Tools
- Multi-modality
- HuggingFace
- Tokenizers
- Quantization
- Comparison
Retrival Augmented Generation
- Vector embedding
- LangChain
- Create embeddings with Chroma
- Build a RAG pipeline
Training
- Fine-tuning an OpenAI LLM
- Fine-tuning an HuggingFace LLM
- LoRA
- QLoRA
Agentic AI
Thanks

Panorama of LLM models

LLMs are advanced AI systems trained on massive datasets to understand and generate human-like text. They excel in tasks such as:

Synthesizing Information: LLMs can combine insights from multiple sources to provide concise summaries, explanations, or comparisons. Example: Summarizing complex topics or extracting key points from large documents.
Fleshing Out Skeletons: They help expand outlines or ideas into detailed content. Example: Developing blog posts, reports, or structured essays from an initial framework.
Coding: LLMs assist in writing and debugging code, generating boilerplates, or even explaining algorithms. Example: Writing Python scripts, SQL queries, or REST API integration snippets. Limitations:
Specialized Domains: LLMs may struggle with niche topics where training data is sparse or highly technical (e.g., quantum physics, niche medical diagnoses).
Recent Events: Models trained on static datasets may lack knowledge of very recent developments unless explicitly updated.
Mistakes: LLMs can confidently provide incorrect or misleading information (e.g. fabricating facts or errors in reasoning). Always verify outputs. Main Models on the Market:

GPT (OpenAI): Renowned for its advanced capabilities in natural language understanding and generation (e.g. GPT-4, GPT-3.5).
Claude (Anthropic): Focused on safety and alignment, designed for human-centric applications.
Gemini (Google DeepMind): Combines text and multimodal capabilities with Google's vast infrastructure and search knowledge.
Llama (Meta): Open-access LLMs tailored for researchers and developers (e.g., Llama 2).
Perplexity : Specialized in search and conversational tasks, offering succinct and fact-based outputs.

LLMs under the hood

LLMs are powerful tools built on advancements in natural language processing (NLP) and deep learning, particularly the Transformer architecture introduced in the groundbreaking paper "Attention Is All You Need" (Vaswani et al. 2017). The Evolution of LLMs: LLMs have evolved significantly since the introduction of the Transformer architecture. Attention Is All You Need (2017) introduced the Transformer architecture, which revolutionized NLP by replacing recurrent neural networks (RNNs) and long short-term memory (LSTM) models with a structure based on self-attention mechanisms.

Self-Attention: Allows the model to focus on relevant parts of an input sequence regardless of its length.
Parallelization: Enabled faster training compared to sequential RNNs.
Became the foundation for subsequent LLMs. LLMs workflow:
Tokenization: Text is converted into smaller units called tokens (e.g. words, subwords, or characters). Example: Input: "The quick brown fox" Tokens: ["The", "quick", "brown", "fox"] OpenAI provide a good tool to see how tokenization is done: https://platform.openai.com/tokenizer

LLM evolution

Prompt Engineering: Guide LLM behavior via input instructions. Effective task- specific LLM outputs. Custom GPTs: Adapt GPTs for domain-specific applications. Models tailored to fields like law, medicine. Copilots Assist: users in workflows with intelligent tools. Workflow-specific assistance, e.g. coding or writing. Agentization: Create autonomous systems for complex tasks. Agents capable of planning, reasoning, and executing actions. GPT Models: OpenAI built upon the Transformer to create the Generative Pre- trained Transformer (GPT) series:

GPT-1 (2018): Introduced pretraining on large text corpora followed by fine- tuning for specific tasks. Showcased the power of unsupervised pretraining for transfer learning.
GPT-2 (2019): Scaled up the model size (1.5 billion parameters). Demonstrated impressive zero-shot learning abilities: generating coherent text from a prompt without task-specific fine-tuning. Initially not released fully due to concerns about misuse.
GPT-3 (2020): Drastically increased parameters (175 billion). Exhibited few- shot and zero-shot learning capabilities, enabling task performance with minimal examples. Highlighted the potential for prompt engineering to guide the model's output without retraining.
RLHF in ChatGPT (2022): Reinforcement Learning with Human Feedback (RLHF): Refined GPT-3.5 and GPT-4 to align model behavior with user expectations. Used human evaluators to fine-tune the model for generating safer, more aligned, and user-friendly responses.

Before going on

When working with Data Science models, you could be carrying out 2 very different activities:

Training: when you provide a model with data for it to adapt to get better at a task in the future. It does this by updating its internal settings - the parameters or weights of the model. If you're Training a model that's already had some training, the activity is called "fine-tuning".
Inference: when you are working with a model that has already been trained. You are using that model to produce new outputs on new inputs, taking advantage of everything it learned while it was being trained. Inference is also sometimes referred to as "Execution" or "Running a model". Let's see some models and how to call them in inference!

Ollama

Ollama is a framework that offers a lot of open source Large Language Models in an easy way. Download the model from: https://ollama.com/download and unzip the extracted file. Then, install the Ollama command line tools as suggested and run the LLM with: ollama run <MODEL_NAME> For example: Note that the first time a new model is runned, it takes time to download all the necessary parameters: For example, here I test the Mistral model for the first time:

Generate Your API Input: Ollama expects structured messages in a JSON format: messages = [ {"role": "system", "content": "system message goes here"}, {"role": "user", "content": "user message goes here"} ]
Authenticate and Call the API: Use the Python ollama library to call the local API. Authentication isn’t necessary since it runs locally. Here’s how to structure the call. For example: import ollama response = ollama.create( model="llama3.2", messages=messages ) The response content can be seen: print(response["response"])

OpenAI

The OpenAI models are easily accessible via the official site: https://chatgpt.com This guide outlines the core steps required to call OpenAI's API to build intelligent conversational systems or other generative AI applications.

Generate an API key: Create an OpenAI account if you don't have one by visiting: https://platform.openai.com/ and follow the instructions to create an API key. Once its showed, save it immediately because you will no longer see it. No one, except you have to use this API key. OpenAI asks for a minimum credit to use the API. The API calls will spend against this $5. You can add your credit balance to OpenAI at Settings > Billing: https://platform.openai.com/settings/organization/billing/overview Note: disable the automatic recharge!
Authentication: The API key authenticates your application with OpenAI's servers. Without it, you cannot access the service. Generate an API key from the OpenAI dashboard and store the key securely inside an .env file:

Claude

The Anthropic Claude models are easily accessible via their official site: https://claude.ai This guide outlines the core steps required to call Anthropic's API to build intelligent conversational systems or other generative AI applications.

Generate an API Key: To begin, create an account on the Anthropic platform if you don’t already have one: https://console.anthropic.com/ Once registered, follow the instructions to create an API key. Save the key immediately after it's displayed, as you won't be able to view it again. Keep it private and secure—only you should use this API key. Anthropic requires an active billing account to use the API. You can manage your billing information under Account Settings > Billing. Be aware of any usage limits or costs associated with the API.

Authentication: Your API key serves as a credential to authenticate requests to Anthropic's servers. Without it, you cannot access the service. Store your API key securely in an .env file for easy and safe access in your application: ANTHROPIC_API_KEY=your-api-key-here Use the dotenv library in Python to load the key programmatically: from dotenv import load_dotenv import os load_dotenv() api_key = os.getenv("ANTHROPIC_API_KEY")
Build the API Input: Anthropic’s API expects input messages to be structured in a conversational format, similar to other APIs. For example: system_message = "This is a system-level instruction."}, user_prompt = [ {"role": "user", "content": "user message goes here."} ]
Call the API: To interact with Claude, use the anthropic library (or HTTP requests if no SDK is available). Below is an example of querying the model: import anthropic client = anthropic.Client(api_key) result = claude.messages.create( model="claude-3-5-sonnet-20240620", system=system_message, messages=user_prompt ) The response content can be seen: print(message.content[0].text) Alternatively if we want to reproduce the typewriter animation of the response generation in our code, we can call: result = claude.messages.stream( ... ) and see the response as: with result as stream: for text in stream.text_stream: print(text, end="", flush=True)

Build the API Input: Gemini’s API expects input messages to be structured in a conversational format. For example: system_message = "This is a system-level instruction." user_prompt = [ {"role": "user", "content": "user message goes here."} ]
Call the API: To interact with Gemini, use the google.generativeai library. Below is an example of querying the model: import google.generativeai as genai gemini = genai.GenerativeModel( model_name='gemini-1.5-flash', system_instruction=system_message ) response = gemini.generate_content(user_prompt) The response can be seen: print(response.text) The typewriter effect can be obtained as: response = gemini.generate_stream(user_prompt) for chunk in response.text_stream: print(chunk, end="", flush=True)

User Interface

User interfaces (UIs) are essential for making AI models accessible to users, and frameworks like Gradio simplify the process of building interactive applications. Gradio is a Python library that allows developers to quickly create and deploy web- based UIs. import gradio as gr

Basic Chat Interface Let's consider a function that handle the call to an LLM, for example to GPT: def stream_gpt(prompt): messages = [ {"role": "system", "content": system_message}, {"role": "user", "content": prompt} ] stream = openai.ChatCompletion.create( model='gpt-4o-mini', messages=messages, stream=True ) result = "" for chunk in stream: result += chunk.choices[0].delta.content or "" yield result This function build the message in the OpenAI API format. The user prompt is passed as function argument, while a system prompt is given. The GPT model is called and the result is given with a typewriter effect. We can build an interface as follow: view = gr.Interface( fn=stream_gpt, inputs=[gr.Textbox(label="Your message:")], outputs=[gr.Markdown(label="Response:")], allow_flagging="never" ) view.launch() Components:
Function (fn): Specifies the backend function (stream_gpt) that processes user input and generates a response.
Inputs: A Textbox is used to accept the user's message. The label "Your message:" describes the purpose of the input field. The content of the text box will be the input of the function component (in this case the user prompt).
Outputs: A Markdown box displays the model’s response. Markdown allows formatted text, such as bold, italics, and links.
allow_flagging: This disables the built-in Gradio flagging keeping the UI clean. Launch:
view.launch() starts a local server where the interface can be accessed through a web browser.

Let's build an interface with a drop-down menu as additional input: view = gr.Interface( fn=stream_model, inputs=[ gr.Textbox(label="Your message:"), gr.Dropdown(["GPT", "Claude"], label="Select model", value="GPT") ], outputs=[gr.Markdown(label="Response:")], flagging_mode="never" ) view.launch() A Dropdown component is added, allowing the user to select between different models (e.g., GPT or Claude). The default value is set to "GPT."

AI Agents

LLM chatbots are remarkably efficient in conversations. For this reason they are perfectly suitable to build an AI assistant. They have to be:

Friendly
Mantain context during the conversation
Subject expertise It's fundamental to well define:
System prompt
Context inside our user prompt
Multi-shot prompting (if the past tokens are obtained from a conversation related to the topic, it's more probable that the future tokens will get the answer to our question)

Keep contest

One of the key feature to build an AI Assistant is to keep the context of the chat. The LLM has to know which exchange of questions and answers has been done during the chat in order to "remember" what has happened. With Gradio's ChatInterface, this process becomes streamlined, especially since recent updates allow Gradio to pass the conversation history in the OpenAI format directly, eliminating additional processing. In conversational AI, context is stored as a series of messages that represent the interaction between the user and the assistant. OpenAI expects this context in the following format: [ {"role": "system", "content": "system message here"}, {"role": "user", "content": "first user prompt here"}, {"role": "assistant", "content": "the assistant's response"}, {"role": "user", "content": "the new user prompt"}, ] The roles define the participant in the conversation:

System: Provides instructions or sets the model’s behavior.
User: Represents the user’s input.
Assistant: Contains the AI’s responses. To ensure that responses are context-aware, we:

Combine the system message with the conversation history.
Add the latest user message before sending the request to OpenAI. The history process in Gradio is taken very easily: def chat(message, history): # Combine system message, history, and latest user message messages = [ {"role": "system", "content": system_message} ] + history + [ {"role": "user", "content": message} ]