FlashLearn puts structured, consistent JSON output at the center of every workflow. In the example below, we classify IMDB movie reviews as “positive” or “negative.” At every step—from data preparation to task orchestration, result storage, and subsequent chaining—the process adheres strictly to JSON. This approach minimizes ambiguity and simplifies debugging and integration.
Below is the complete code walkthrough:
import json
import os
from flashlearn.utils import imdb_reviews_50k
from flashlearn.skills import GeneralSkill
from flashlearn.skills.toolkit import ClassifyReviewSentiment
def main():
    # Step 1: Load or generate your data (100 sample IMDB reviews)
    os.environ["OPENAI_API_KEY"] = "API-KEY"
    data = imdb_reviews_50k(sample=100)

    # Step 2: Load the JSON definition of a skill (in dict format)
    # The input is a dictionary; if loading from a file, parse the JSON yourself
    skill = GeneralSkill.load_skill(ClassifyReviewSentiment)

    # (Optional) Step 3: Save the skill definition as JSON for later loading
    # skill.save("BinaryClassificationSkill.json")

    # Step 4: Convert data rows into JSON-based tasks
    tasks = skill.create_tasks(data)

    # Step 5: Save tasks to a JSONL file for offline processing or replay
    with open('tasks.jsonl', 'w') as jsonl_file:
        for entry in tasks:
            jsonl_file.write(json.dumps(entry) + '\n')

    # Step 6: Run tasks (in parallel by default)
    results = skill.run_tasks_in_parallel(tasks)

    # Step 7: Map results back to inputs and store them in another JSONL file
    # Every output is strict JSON, keyed by task ID ("0", "1", etc.)
    with open('sentiment_results.jsonl', 'w') as f:
        for task_id, output in results.items():
            input_json = data[int(task_id)]
            input_json['result'] = output
            f.write(json.dumps(input_json) + '\n')

    # Step 8: Inspect or chain the JSON results as needed
    print("Sample result:", results.get("0"))

if __name__ == "__main__":
    main()
Walkthrough Explanation
- Data Loading: The process starts by loading 100 sample IMDB reviews. These reviews arrive as a list of dictionaries, giving every downstream step a predictable JSON structure.
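The “list of dictionaries” shape can be illustrated with a tiny stand-in. Note that the field names below are assumptions for illustration only; the actual keys returned by imdb_reviews_50k may differ.

```python
import json

# Hypothetical shape of loaded review rows (keys are illustrative, not
# necessarily the real imdb_reviews_50k schema).
data = [
    {"review": "A gripping story with superb acting.", "sentiment": "positive"},
    {"review": "Two hours I will never get back.", "sentiment": "negative"},
]

# Every row serializes cleanly to JSON, which is what keeps the later
# task-creation and storage steps predictable.
serialized = json.dumps(data[0])
```
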
- Skill Loading: We load a pre-defined JSON skill with GeneralSkill.load_skill(ClassifyReviewSentiment). The skill contains both the system prompt and the JSON-based function definitions, ensuring that the transformation follows strict schema rules.
- JSON Task Creation: Each data item is converted into a task with skill.create_tasks(data). A task is a JSON object that defines exactly how the API call should be made, eliminating any ambiguity about the structure and expected output.
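To make the row-to-task idea concrete, here is a minimal sketch of what such a conversion could look like. This is a hypothetical stand-in, not FlashLearn's actual create_tasks implementation, and the field names (custom_id, input) are assumptions.

```python
import json

def create_tasks_sketch(rows):
    """Hypothetical sketch: wrap each input row in a JSON task keyed by a
    string task ID, mirroring the "0", "1", ... IDs seen in the results."""
    return [{"custom_id": str(i), "input": row} for i, row in enumerate(rows)]

tasks = create_tasks_sketch([{"review": "Loved it."}, {"review": "Hated it."}])

# Each task is plain JSON, so it can be serialized or audited directly.
first_task_json = json.dumps(tasks[0])
```
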
- Saving Tasks as JSONL: The tasks are then stored in a JSON Lines (JSONL) file. Each line is one task in JSON format, enabling easy review, auditing, and re-processing at a later time.
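Because each line is a standalone JSON object, replaying saved tasks is a simple round-trip with the standard library. A minimal sketch, assuming the same one-object-per-line format used in Step 5:

```python
import json

tasks = [{"custom_id": "0", "input": {"review": "Great film."}}]

# Write one JSON object per line (the JSONL format used in Step 5).
with open("tasks.jsonl", "w") as f:
    for entry in tasks:
        f.write(json.dumps(entry) + "\n")

# Read the file back later for audit or replay.
with open("tasks.jsonl") as f:
    replayed = [json.loads(line) for line in f]
```
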
- Parallel Processing: The tasks are executed in parallel with skill.run_tasks_in_parallel(tasks). Parallel execution is possible because every task and response follows the same standard JSON contract.
- Result Storage: After execution, the results (which are equally JSON-structured) are merged with the corresponding input rows and saved to another JSONL file, completing the flow from raw input to structured output.
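The merge in Step 7 works because results are keyed by string task IDs ("0", "1", ...), so int(task_id) indexes straight back into the input list. A self-contained sketch with dummy data (the result payload shape is an assumption about the skill's output):

```python
# Dummy inputs and results standing in for `data` and `results` in Step 7.
data = [{"review": "Great film."}, {"review": "Dull and slow."}]
results = {"0": {"sentiment": "positive"}, "1": {"sentiment": "negative"}}

merged = []
for task_id, output in results.items():
    row = dict(data[int(task_id)])  # copy so the original inputs stay clean
    row["result"] = output
    merged.append(row)
```
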
- Chaining and Inspection: Finally, a sample result is printed for immediate inspection. Because every step is JSON-based, outputs can be mapped back into your pipeline, chained into additional processing steps, or stored for future analysis.
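As one example of chaining, a downstream step could read the merged JSONL and tally sentiments. The lines below stand in for the contents of sentiment_results.jsonl, and the "result" payload shape is an assumption about the skill's output:

```python
import json
from collections import Counter

# Stand-in for lines read from sentiment_results.jsonl.
lines = [
    '{"review": "Great film.", "result": {"sentiment": "positive"}}',
    '{"review": "Dull.", "result": {"sentiment": "negative"}}',
    '{"review": "Superb.", "result": {"sentiment": "positive"}}',
]

# Because every line is strict JSON, aggregation is a one-liner.
counts = Counter(json.loads(line)["result"]["sentiment"] for line in lines)
```
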
By enforcing JSON at every stage, FlashLearn guarantees that the entire workflow is transparent, auditable, and easily integrable with other systems. This “All JSON, ALL the Time” approach simplifies maintenance and scales effortlessly with your data processing needs.