Making An AI-Powered Business Card Reader

Writing a business card reader can be tedious. I first tried using Tesseract, an open-source Optical Character Recognition (OCR) engine. But OCR is only one piece of the puzzle: you still need to segment the images, run OCR, and then identify text patterns for names, titles, phone numbers, emails, etc. before you can extract them. That's quite a lot of work if you have a big stack of business cards to go through.
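
To show what that pattern-matching step involves, here is a minimal sketch of regex-based extraction from raw OCR text. The extractPatterns function and its regular expressions are illustrative assumptions. They only handle emails, phone-like strings, and "www." websites. Names, titles, and company names have no reliable pattern, and that gap is exactly what makes this approach tedious.

```javascript
// Hypothetical helper: pull the "pattern-friendly" fields out of raw OCR text.
// These regexes are rough approximations, not complete email/phone/URL grammars.
function extractPatterns(ocrText) {
    const emails = ocrText.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) || [];
    // Digit runs with common separators (space, parentheses, dot, hyphen), 7+ digits total
    const phones = (ocrText.match(/\+?\d[\d ().-]{5,}\d/g) || [])
        .filter((candidate) => candidate.replace(/\D/g, "").length >= 7);
    const websites = ocrText.match(/\b(?:https?:\/\/)?www\.[\w-]+(?:\.[\w-]+)+/gi) || [];
    return { emails, phones, websites };
}

const sample = "Jane Doe\nSales Manager\nAcme Corp\njane.doe@acme.com\n+1 (555) 123-4567\nwww.acme.com";
console.log(extractPatterns(sample));
```

Even with these patterns in place, you would still have to decide which line is the person's name and which is the company, and handle every card layout that breaks your assumptions.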

But thankfully, technology, and AI in particular, has come a long way, and we can use widely available AI services such as OpenAI's API to make the job easier… dramatically easier. In this post, we'll go through the whole process from start to finish.

Part 1. Setting Up Your OpenAI Account

The first thing you need to do, if you haven’t already, is to set up your OpenAI account.

Step 1. Sign up with OpenAI https://platform.openai.com/signup

Step 2. Set up payment

The OpenAI API is not free: to use it, you need credits. OpenAI no longer hands out free trial credits as it used to, so you need to set up a payment method to buy them. Go to Settings->Billing->Payment methods->Add payment method and add your credit card.

Step 3. Buy credits

From Billing, choose Overview->Add to credit balance.

Don’t worry: $10 can go a long way in helping you learn the OpenAI API.

Part 2: Creating Your API key

You will need an API key for your code to access the API. You can create one for each project, or use a single key for all your projects.

Step 1. Create your API key

Go to Dashboard->API keys->Create new secret key

Step 2. Save your key

Put it somewhere safe and accessible, or add the following line to your ~/.bashrc or ~/.zshrc file:

export OPENAI_API_KEY="your_api_key_here"

Step 3. Export the key

If you saved it in ~/.bashrc:

$ source ~/.bashrc

or if you saved it in ~/.zshrc:

% source ~/.zshrc

or just run the export command from your shell prompt:

$ export OPENAI_API_KEY="your_api_key_here"
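
Before running anything against the API, it can help to confirm that the key is actually visible to Node.js in your current shell. The sketch below uses a hypothetical requireApiKey helper (not part of the OpenAI SDK) that reads process.env and fails early with a readable message. It deliberately never prints the key itself.

```javascript
// Hypothetical sanity check: fail fast if OPENAI_API_KEY is missing or blank.
// The env parameter defaults to process.env so the check is easy to reuse.
function requireApiKey(env = process.env) {
    const key = env.OPENAI_API_KEY;
    if (!key || key.trim() === "") {
        throw new Error("OPENAI_API_KEY is not set. Export it in your shell first.");
    }
    return key;
}

try {
    requireApiKey();
    console.log("OPENAI_API_KEY is set.");
} catch (err) {
    console.error(err.message);
}
```

Save it as, say, checkKey.js and run node checkKey.js from the same shell where you exported the key.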

Part 3: The Data

They often say GIGO: garbage in, garbage out. So for best results, the data you feed your AI needs to be good. I took my stack of business cards and scanned them one by one using Microsoft Lens. You can, of course, use your favorite scanning app; what matters is good lighting and a steady hand. Put all the files in one folder, e.g. images.

Part 4: The Code

Now for the fun part. The idea is pretty simple:

1. Get the folder containing the business card image files.
2. Go through each file in the folder
    2.1. Check if the file is an image and, if so, send it to ChatGPT with the prompt "identify the contact's name, title, email address, mobile/phone number, company name, industry, and website and output in a comma-delimited string". You can tweak the prompt to get the results you want.
    2.2. Append the result from ChatGPT to a CSV file named contacts.csv
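
One caveat with asking for a comma-delimited string: a field that itself contains a comma, such as a company name like "Acme, Inc.", shifts every column after it. A possible alternative, sketched below, is to prompt the model for a JSON object with fixed keys and build each CSV row yourself, quoting fields the way RFC 4180 describes (wrap the value in double quotes and double any embedded quotes). The FIELDS list, csvEscape, and toCsvRow are hypothetical, and the parsing assumes the model replies with a bare JSON object; the Chat Completions JSON mode (response_format: { type: "json_object" }) can help with that.

```javascript
// Hypothetical column order for contacts.csv; adjust it to match your prompt.
const FIELDS = ["name", "title", "email", "phone", "company", "industry", "website"];

// Quote a value per RFC 4180 when it contains a comma, a quote, or a line break.
function csvEscape(value) {
    const text = value == null ? "" : String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Turn the model's reply (assumed to be a bare JSON object) into one CSV line.
// Missing keys become empty cells so every row keeps the same number of columns.
function toCsvRow(modelReply) {
    const contact = JSON.parse(modelReply);
    return FIELDS.map((field) => csvEscape(contact[field])).join(",");
}

const reply = '{"name":"Jane Doe","title":"CTO","company":"Acme, Inc.","email":"jane@acme.com"}';
console.log(toCsvRow(reply));
```

With this approach, a comma inside a company name stays inside its own quoted cell instead of creating an extra column.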

For simplicity, I’ll assume you already have Node.js and npm set up. If not, you can find installation instructions on the Node.js website; npm is included with Node.js.

Step 1. Create a new directory

$ mkdir business-card-scanner
$ cd business-card-scanner

Step 2. Initialize a new node.js project

$ npm init -y

Step 3. Install required packages

$ npm install openai

The fs and path modules are built into Node.js, so there's no need to install them from npm.

Step 4. Write the code. I called mine processCards.js. You can also get it from the GitHub repo.

// Import required packages
const OpenAI = require("openai");
const fs = require("fs").promises;
const path = require("path");

// Instantiate OpenAI with API secret key 
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
});

// Check if file is an image
function isImageFile(filePath) {
    const ext = path.extname(filePath).toLowerCase();
    return ['.jpg', '.jpeg', '.png', '.gif'].includes(ext); // Adjust extensions as needed
}

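// Send the image to ChatGPT (gpt-4o-mini via the OpenAI API) and return its reply, or null on failure
async function processImageFile(filePath) {
    try {
        // Read the image and encode it as a base64 string
        const imageBuffer = await fs.readFile(filePath);
        const base64Image = imageBuffer.toString("base64");

        // Use the MIME type that matches the file extension (PNG and GIF files are not JPEG)
        const ext = path.extname(filePath).toLowerCase();
        const mimeTypes = { ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png", ".gif": "image/gif" };
        const mimeType = mimeTypes[ext] || "image/jpeg";

        // Send the prompt together with the base64-encoded image as a data URL
        const response = await openai.chat.completions.create({
            model: "gpt-4o-mini",
            messages: [
                {
                    role: "user",
                    content: [
                        {
                            type: "text",
                            text: "identify the contact's name, title, email address, phone number, company name, industry, and website and output in a comma-delimited string",
                        },
                        {
                            type: "image_url",
                            image_url: {
                                url: `data:${mimeType};base64,${base64Image}`,
                            },
                        },
                    ],
                },
            ],
        });

        // The model's reply text (declared locally rather than as an implicit global)
        const content = response.choices[0].message.content;
        console.log(content);

        return content;
    } catch (error) {
        console.error("Error processing image:", error);
        return null; // Signal failure so the caller can skip this file
    }
}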

// Append the result from ChatGPT to a CSV file
async function appendToFile(filePath, content) {
    try {
        await fs.appendFile(filePath, content, 'utf8');
        console.log('Content appended successfully!');
    } catch (err) {
        console.error('Error appending content:', err);
    }
}

// Go through each file in the folder
async function processFolder(folderPath) {
    try {
        const files = await fs.readdir(folderPath);

        for (const file of files) {
            const filePath = path.join(folderPath, file);

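            if (isImageFile(filePath)) {
                const extractedData = await processImageFile(filePath);
                // Only append when the model returned text; failed files would otherwise add "undefined" rows
                if (extractedData) {
                    await appendToFile('contacts.csv', extractedData.trim() + '\n');
                    console.log('Processed:', filePath);
                }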
            } else {
                console.warn('Skipping non-image file:', filePath);
            }
        }
    } catch (error) {
        console.error('Error processing folder:', error);
    }
}

// Get the folder containing the business card image files.
const folderPath = process.argv[2];

if (!folderPath) {
    console.error("Please provide a folder path as an argument.");
    process.exit(1);
}

processFolder(folderPath);
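
The script above appends data rows only, so contacts.csv has no column names when you open it in a spreadsheet. If you want a header row, one option is a small helper that creates the file with a header the first time and leaves it untouched on later runs. The ensureCsvHeader name and the column labels below are assumptions; the labels follow the order of fields in the prompt, but the model is not guaranteed to return them in that order, so check a few rows.

```javascript
const fs = require("fs").promises;

// Hypothetical helper: write a header row only if the CSV file does not exist yet.
// The "wx" flag makes writeFile fail with EEXIST instead of overwriting the file.
async function ensureCsvHeader(csvPath, columns) {
    try {
        await fs.writeFile(csvPath, columns.join(",") + "\n", { encoding: "utf8", flag: "wx" });
        return true; // New file created with the header
    } catch (err) {
        if (err.code === "EEXIST") return false; // File already exists; leave it alone
        throw err;
    }
}

// Usage (inside an async function, before processing the folder):
// await ensureCsvHeader("contacts.csv",
//     ["Name", "Title", "Email", "Phone", "Company", "Industry", "Website"]);
```

Using the "wx" flag keeps the check and the write in a single call, so there is no separate "does the file exist?" step that could get out of sync.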

Step 5. Run the code

Run the code from your shell prompt:

$ node processCards images

If everything works, you should end up with a contacts.csv file containing one line of extracted details per business card.
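
The model may not always reply with exactly one bare line: it might wrap the answer in a Markdown code fence or spread it over several lines, which breaks the one-row-per-card layout of contacts.csv. A small cleanup step before appending can guard against that. The cleanModelOutput helper below is a hypothetical sketch: it drops fence and blank lines and joins whatever remains with spaces.

```javascript
const FENCE = "`".repeat(3); // Markdown code fence marker

// Hypothetical cleanup step: remove code-fence lines and blank lines from the
// model's reply, then join the remaining lines into a single CSV line.
function cleanModelOutput(reply) {
    if (typeof reply !== "string") return "";
    return reply
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line !== "" && !line.startsWith(FENCE))
        .join(" ");
}

const raw = FENCE + "csv\nJane Doe, CTO, Acme, jane@acme.com\n" + FENCE;
console.log(cleanModelOutput(raw));
```

In processFolder, you would pass each reply through it before writing, e.g. appendToFile('contacts.csv', cleanModelOutput(extractedData) + '\n').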

This is just a simple use of the OpenAI API. There are certainly more complex tasks you can do with the API, and with AI in general, and I hope this post helps you get started exploring them.

Thank you for reading!

Jay Lagare
