FlashLearn makes single-step text classification straightforward using the ClassificationSkill. In this example, we demonstrate how to classify IMDB movie reviews into “positive” and “negative” sentiments.
Below is a step-by-step code example:
import json
import os
from openai import OpenAI
from flashlearn.skills.classification import ClassificationSkill
from flashlearn.utils import imdb_reviews_50k
def main():
    # Step 1: Set up your provider
    # Uncomment and set your API key if needed:
    # os.environ["OPENAI_API_KEY"] = 'YOUR API KEY'
    # Example using the OpenAI client (you could also configure DeepSeek or another provider)
    client = OpenAI()
    # Step 2: Load sample data (a list of dictionaries with "review" and "sentiment" keys)
    reviews = imdb_reviews_50k(sample=100)
    # Step 3: Initialize the ClassificationSkill for sentiment analysis.
    # Here, we specify the model, client, allowed categories, and the system prompt.
    skill = ClassificationSkill(
        model_name="gpt-4o-mini",  # or use your preferred model
        client=client,
        categories=["positive", "negative"],
        max_categories=1,
        system_prompt="We want to classify short movie reviews by sentiment."
    )
    # Step 4: Prepare classification tasks:
    # Remove the original sentiment key, retaining only the review text.
    removed_sentiment = [{'review': x['review']} for x in reviews]
    tasks = skill.create_tasks(removed_sentiment)
    # Step 5: (Optional) Save tasks to a JSONL file for offline processing or auditing.
    with open('tasks.jsonl', 'w') as jsonl_file:
        for entry in tasks:
            jsonl_file.write(json.dumps(entry) + '\n')
    # Step 6: Execute classification tasks in real time.
    # FlashLearn processes these tasks concurrently.
    results = skill.run_tasks_in_parallel(tasks)
    # Step 7: Map results back to the input data and check accuracy.
    correct = 0
    for i, review in enumerate(reviews):
        reviews[i]['category'] = results[str(i)]['categories']
        if reviews[i]['sentiment'] == results[str(i)]['categories']:
            correct += 1
    print(f'Accuracy: {round(100 * correct / len(reviews), 2)}%')
    # Step 8: Save the final results to a JSONL file.
    with open('results.jsonl', 'w') as jsonl_file:
        for entry in reviews:
            jsonl_file.write(json.dumps(entry) + '\n')
    # Step 9: Save the skill configuration for future use.
    skill.save("BinaryClassificationSkill.json")
if __name__ == "__main__":
    main()
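Because each line of results.jsonl is an independent JSON object, the output can be inspected without rerunning the pipeline. Here is a quick sanity check using only the standard library (the 'category' key is the one written in Step 7 above):

import json
from collections import Counter

# Read the labeled reviews back, one JSON object per line.
with open('results.jsonl') as f:
    labeled = [json.loads(line) for line in f]

# Tally the predicted categories assigned in Step 7.
print(Counter(str(row['category']) for row in labeled))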
Summary #
- Set up the Provider: Configure your API credentials and select an LLM client (e.g., OpenAI).
- Load Data: Retrieve sample IMDB reviews, ensuring each record contains the keys “review” and “sentiment.”
- Initialize the ClassificationSkill: Define your classification settings, such as the model to use, the allowed sentiment categories, and the system prompt.
- Prepare Tasks: Strip out the ground-truth sentiment and convert the data into tasks using skill.create_tasks().
- Execute and Evaluate: Process the tasks concurrently with skill.run_tasks_in_parallel(tasks), then compare the predicted sentiment against the actual labels.
- Save Results and Configuration: Output the result data to a JSONL file and persist your skill configuration (a reload sketch follows below).
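The saved BinaryClassificationSkill.json can be reloaded in a later session instead of redefining the skill by hand. A minimal sketch, assuming the saved definition can be restored via GeneralSkill.load_skill as in FlashLearn's general-skill examples; check your installed version for the exact import path and signature:

import json
from flashlearn.skills import GeneralSkill  # import path assumed; verify against your FlashLearn version

# Load the definition saved in Step 9.
with open("BinaryClassificationSkill.json") as f:
    definition = json.load(f)

# Assumption: load_skill rebuilds a skill from a saved definition; depending on the
# version, you may also need to pass a client or model here.
skill = GeneralSkill.load_skill(definition)
tasks = skill.create_tasks([{"review": "A touching, beautifully acted film."}])
print(skill.run_tasks_in_parallel(tasks))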