PGDF

OSIRIS Legal-Fiscal AI Workflows

AI delivery for PGDF legal-fiscal operations, spanning production APIs, supervised and semi-supervised models, active learning, and early LLM exploration for document-heavy institutional workflows.

Data Scientist · May 2023 - May 2024

Stack

  • Python
  • SQL
  • FastAPI
  • Pytest
  • scikit-learn
  • XGBoost
  • LightGBM
  • DVC
  • MLflow
  • spaCy
  • Hugging Face Transformers
  • LangChain
  • Pandas
  • NumPy
  • Docker

Primary impact

Brought governed ML workflows and production APIs into legal-fiscal operations, while designing active-learning paths for longer-term model adaptation.

Outcomes

  • Production APIs connected model outputs to PGDF internal systems
  • Active-learning loop designed to reduce model drift over time
  • LLM exploration opened paths for future fiscal-text workflows

Context

OSIRIS was a research and development initiative supporting PGDF on legal-fiscal execution workflows. The goal was to automate internal steps, improve efficiency, and explore where machine learning and LLMs could reduce repetitive work in document-heavy institutional processes.

My role

  • Worked as Data Scientist on the initiative and translated business requirements into technical scope.
  • Developed RESTful APIs in Python and FastAPI to connect model outputs to PGDF systems.
  • Built and evaluated supervised, unsupervised, and semi-supervised models for fiscal classification and process optimization.
  • Designed active-learning and experimentation workflows for longer-term model adaptation.

Problem

Legal-fiscal operations combine messy text, changing procedures, and institutional systems that cannot tolerate brittle automation. The team needed machine learning that could improve internal flow without creating a hard-to-maintain research island.

That required practical model delivery, not just experimentation: reproducibility, data versioning, integration, and a plan for model behavior as the domain evolved.
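A reproducible delivery setup of this kind is typically expressed as a versioned pipeline. The sketch below is a hypothetical `dvc.yaml` fragment (stage names, scripts, and paths are illustrative, not from the project) showing how preprocessing and training stages can be pinned to their inputs and outputs:

```yaml
stages:
  preprocess:
    cmd: python src/preprocess.py data/raw data/processed
    deps:
      - src/preprocess.py
      - data/raw
    outs:
      - data/processed
  train:
    cmd: python src/train.py data/processed models/classifier.pkl
    deps:
      - src/train.py
      - data/processed
    outs:
      - models/classifier.pkl
    metrics:
      - metrics.json:
          cache: false
```

With stages declared this way, any model artifact can be traced back to the exact data and code that produced it, which is what keeps the work from becoming a research island.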

Architecture

The OSIRIS workflow was built around:

  • preprocessing and feature-engineering pipelines for legal-fiscal data
  • supervised, unsupervised, and semi-supervised model experiments
  • REST APIs for production integration
  • dataset and experiment versioning through DVC and MLflow
  • active-learning loops to keep the system current
  • exploratory LLM workflows for fiscal-text interpretation
  • continuous improvements to data pipelines and training frameworks

The system was designed to support both present delivery needs and future model evolution.
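The active-learning loop above can be sketched as entropy-based uncertainty sampling: score unlabeled documents by predictive entropy and route only the most ambiguous ones for manual labeling. This is a minimal stdlib-only sketch with placeholder probabilities; the batch size and document IDs are illustrative.

```python
import math

def entropy(probs):
    """Predictive entropy of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, batch_size=2):
    """Pick the most uncertain documents for manual review.

    predictions: list of (doc_id, class_probabilities) pairs.
    Returns the doc_ids with the highest predictive entropy.
    """
    ranked = sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:batch_size]]

preds = [
    ("doc-1", [0.98, 0.01, 0.01]),  # confident -> skip
    ("doc-2", [0.40, 0.35, 0.25]),  # ambiguous -> label
    ("doc-3", [0.55, 0.30, 0.15]),  # somewhat ambiguous -> label
    ("doc-4", [0.90, 0.05, 0.05]),  # confident -> skip
]
queue = select_for_labeling(preds)  # -> ["doc-2", "doc-3"]
```

Routing only high-entropy documents to reviewers is what keeps the relabeling burden low while the model tracks a changing legal-fiscal domain.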

Challenges

  • Legal-fiscal text changes over time, so static models decay quickly.
  • Production adoption depends on integration quality as much as model quality.
  • LLM exploration in institutional environments needs a clear boundary between useful experimentation and premature rollout.
  • Internal legal-fiscal workflows need automation that remains explainable and maintainable over time.

Solution

I treated the project as a workflow problem first. The solution combined governed ML delivery, API integration, and active-learning design so the models could improve without becoming operationally fragile.

In parallel, I evaluated how LLMs could support fiscal-text interpretation while keeping the work anchored in real deployment constraints. That created a better base for future expansion without overselling early experiments.

Impact

  • Deployed model-backed APIs into PGDF internal systems.
  • Designed an active-learning loop for continuous improvement with lower manual relabeling burden.
  • Opened practical LLM paths for legal-fiscal document analysis while keeping delivery grounded in operational reality.

Related writing

Writing that emerged from the same work.

Technical writing

LLM Evaluation in Production Starts With Explicit Failure Modes

Jul 2, 2025

Evaluation is most useful when it reflects the failures a system can actually produce in production: missing context, wrong retrieval, incorrect tool use, unstable outputs, and unhelpful responses.

  • LLM
  • Evaluation
  • Production AI
  • Quality

Technical writing

Scaling ML Pipelines Means Reducing Hidden Manual Work

May 19, 2025

ML pipelines usually fail to scale because they depend on undocumented manual steps around data preparation, retraining, packaging, and release coordination.

  • MLOps
  • Airflow
  • MLflow
  • CI/CD

Next step

Need the broader background behind this work?

The about page connects these case studies to the rest of my delivery history across courts, agencies, and AI platform work.