Introduction

Learning goals

  • develop your understanding of:
    • what LLMs are,
    • different types of “wild caught data” tasks that LLMs can help with,
    • how to use and check LLMs for specific data preparation tasks,
    • how to interact with LLMs in R using {ellmer}

Lecture outline

  • About Me!
  • About LLMs
  • Using LLMs for “Wild Caught Data” tasks
  • Using LLMs in R with {ellmer}

About Me!

Who am I?

  • 💱 Previously:
    • Economics at the University of Melbourne
    • Catching wild data for empirical researchers:
      • Wikipedia entries, archival magazines, trade databases, satellite images, online retail prices…

What do I work on?

  • 📊 Research Interests
    • 🌰 Designing tools and workflows for wild caught data!
    • 🤖 Leveraging LLMs and genAI for data wrangling and cleaning
      • 🖇️ using LLMs to correct manual data entry errors – Current MBAT research internship!

About Large Language Models

Generative AI and LLMs

Generative AI refers to:

  • computer algorithms and systems
  • that can generate content such as text, images and sound
  • based on patterns learnt from existing data

What are Large Language Models?

LLMs are…

  • code writers?
  • encyclopedias?
  • assignment help?
  • translators?

We often understand tools by what they can do for us, not how they work.

Who makes LLMs?

LLM providers develop and offer access to large language models and systems.

Today, we will see demos of Anthropic and OpenAI models.

Why are there so many different models?

LLM providers offer paid and free access to multiple models.

Different models are designed to be good at different things:

  • chain-of-thought reasoning vs. instruction following
  • multimodal support: images, audio, video AND text
  • multilingual processing: translation, content generation
  • specific domains: medicine, finance, legal

Model differentiation

Learn more about picking the right tool:

How can we interact with LLMs?