Processing a PDF file with Claude and extracting key information

This Job calls the Claude API, uses a Claude model to retrieve and process an existing PDF about mobile statistics, and prints the key information extracted from the file to the console.

Before you begin

This scenario uses Claude model capabilities. For more information, read the corresponding Claude documentation.

Linking the components

Procedure

  1. Drag and drop the following components from the Palette: tClaudeAIClient and tLogRow.
  2. Connect the components using a Row > FLOW connection.
    Overview of the Job in the Studio.

Configuring the components

Procedure

  1. Double-click the tClaudeAIClient component to display its Component view.
  2. In the Basic settings view, enter your Claude API key and select the desired Claude model.
    In this scenario, the claude-3-7-sonnet-20250219 model is used. For more information on the Claude models that support PDF files, read the corresponding Claude documentation.
    You can leave the Prompt field empty, because the prompt is provided as JSON in the Advanced settings view.
    Basic settings view of the tClaudeAIClient configuration.
  3. In the Advanced settings view, enter the API version of your choice, and configure the JSON request that will reference your PDF and provide the prompt.
    To do so, select Use raw JSON, and enter the following JSON in the Request body field:
    {
      "model": "{.config.model}",
      "max_tokens": 1024,
      "messages": [{
        "role": "user",
        "content": [
          {
            "type": "document",
            "source": {
              "type": "url",
              "url": "<url_to_pdf>"
            }
          },
          {
            "type": "text",
            "text": "What are the key findings in this document?"
          }
        ]
      }]
    }

    Replace <url_to_pdf> with the actual URL address where the PDF file is stored.

    Advanced settings view of the tClaudeAIClient configuration.
    In this scenario, the PDF file is a report on mobile statistics that covers users, devices, operating systems, and mobile habits.
    Introductory text of the PDF file to be retrieved and summarized.
  4. Double-click the tLogRow component to display its Component view.
    Click Sync columns to retrieve the schema structure from the previous component if needed.
    In the Mode area, select Basic, and click Print content with log4j to display the Job result in the console.
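
For readers who want to check the request outside the Studio, the JSON entered in the Request body field can be sketched as a direct call to the Claude Messages API. This is a minimal sketch and not a description of how tClaudeAIClient works internally. The helper names `build_request_body` and `build_request`, the example PDF URL, the `YOUR_API_KEY` placeholder, and the `2023-06-01` API version value are assumptions made for illustration; use the version you entered in the Advanced settings view.

```python
import json
import urllib.request

# Public endpoint of the Claude Messages API.
API_URL = "https://api.anthropic.com/v1/messages"


def build_request_body(model, pdf_url, prompt, max_tokens=1024):
    """Return the same structure as the Request body field, as a dict."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                # The document block points the model at the PDF by URL.
                {"type": "document", "source": {"type": "url", "url": pdf_url}},
                # The text block carries the prompt.
                {"type": "text", "text": prompt},
            ],
        }],
    }


def build_request(api_key, body, api_version="2023-06-01"):
    """Prepare, but do not send, the HTTP POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": api_version,  # assumed version value
            "content-type": "application/json",
        },
        method="POST",
    )


body = build_request_body(
    model="claude-3-7-sonnet-20250219",
    pdf_url="https://example.com/report.pdf",  # hypothetical placeholder URL
    prompt="What are the key findings in this document?",
)
request = build_request("YOUR_API_KEY", body)  # placeholder key
```

Nothing is sent until the prepared request is passed to `urllib.request.urlopen(request)` with a valid API key, so the body can be inspected first with `json.dumps(body, indent=2)`.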

Executing the Job

Procedure

  1. Press Ctrl + S to save your Job.
  2. Press F6 to execute it.

Results

The Run console displays the result of the JSON request, which fetches the PDF data and extracts its key information.
Execution console showing the key information extracted from the PDF provided
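
If the output is post-processed outside the Studio, the answer text can be pulled out of the response as sketched below. This assumes the raw Claude Messages API response shape, in which `content` is a list of blocks and text blocks carry a `text` field; whether tLogRow receives the full JSON or only the text is not stated here. The `extract_text` helper and the sample response are hypothetical and are not actual output of this Job.

```python
def extract_text(response):
    """Join the text of every 'text' content block in a Messages API response."""
    return "\n".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Illustrative response only; the wording is made up for this sketch.
sample_response = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Finding A."},
        {"type": "text", "text": "Finding B."},
    ],
    "stop_reason": "end_turn",
}

print(extract_text(sample_response))
```

Filtering on the block type keeps the helper from failing if the response contains blocks that have no `text` field.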
