Context
OSIRIS was a research and development initiative supporting PGDF on legal-fiscal execution workflows. The goal was to automate internal steps, improve efficiency, and explore where machine learning and LLMs could reduce repetitive work in document-heavy institutional processes.
My role
- Worked as Data Scientist on the initiative and translated business requirements into technical scope.
- Developed RESTful APIs in Python with FastAPI to connect model outputs to PGDF systems.
- Built and evaluated supervised, unsupervised, and semi-supervised models for fiscal classification and process optimization.
- Designed active-learning and experimentation workflows for longer-term model adaptation.
Problem
Legal-fiscal operations combine messy text, changing procedures, and institutional systems that cannot tolerate brittle automation. The team needed machine learning that could improve internal flow without creating a hard-to-maintain research island.
That required practical model delivery, not just experimentation: reproducibility, data versioning, integration, and a plan for model behavior as the domain evolved.
Architecture
The OSIRIS workflow was built around:
- preprocessing and feature-engineering pipelines for legal-fiscal data
- supervised, unsupervised, and semi-supervised model experiments
- REST APIs for production integration
- dataset and experiment versioning through DVC and MLflow
- active-learning loops to keep the system current
- exploratory LLM workflows for fiscal-text interpretation
- continuous improvements to data pipelines and training frameworks
The system was designed to support both current delivery needs and future model evolution.
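The versioned-pipeline idea can be sketched as a DVC stage file. The stage names, script names, and paths below are assumptions for illustration, not the real repository layout; the point is that preprocessing, training, and metrics are declared as reproducible, dependency-tracked stages.

```yaml
# Illustrative dvc.yaml; stage names and paths are assumed, not the real repo.
stages:
  preprocess:
    cmd: python preprocess.py data/raw data/processed
    deps:
      - preprocess.py
      - data/raw
    outs:
      - data/processed
  train:
    cmd: python train.py data/processed models/classifier.pkl
    deps:
      - train.py
      - data/processed
    outs:
      - models/classifier.pkl
    metrics:
      - metrics.json:
          cache: false
```

With stages declared this way, `dvc repro` re-runs only what changed, and experiment metrics can be logged alongside via MLflow.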
Challenges
- Fiscal legal text changes over time, which makes static models decay quickly.
- Production adoption depends on integration quality as much as model quality.
- LLM exploration in institutional environments needs a clear boundary between useful experimentation and premature rollout.
- Internal legal-fiscal workflows need automation that remains explainable and maintainable over time.
Solution
I treated the project as a workflow problem first. The solution combined governed ML delivery, API integration, and active-learning design so the models could improve without becoming operationally fragile.
In parallel, I evaluated how LLMs could support fiscal-text interpretation while keeping the work anchored in real deployment constraints. That created a better base for future expansion without overselling early experiments.
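The active-learning design can be sketched as a margin-based uncertainty-sampling step: documents where the model's top two class probabilities are closest are routed to human reviewers first. The function name, the margin criterion, and the toy probabilities are illustrative assumptions, not the production selection policy.

```python
# Sketch of an uncertainty-sampling step for an active-learning loop.
import numpy as np

def select_for_labeling(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` unlabeled documents the model is least sure about.

    probs: (n_docs, n_classes) predicted class probabilities.
    Returns indices ordered from most to least uncertain (smallest margin first).
    """
    sorted_p = np.sort(probs, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]  # gap between top-2 classes
    return np.argsort(margin)[:budget]

# Example: three documents, two classes.
probs = np.array([[0.95, 0.05],   # confident
                  [0.55, 0.45],   # uncertain -> sent to reviewers first
                  [0.80, 0.20]])
queue = select_for_labeling(probs, budget=1)  # → array([1])
```

Labeling only the low-margin queue each cycle is what keeps the relabeling burden bounded as the fiscal domain drifts.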
Impact
- Deployed model-backed APIs into PGDF internal systems.
- Designed an active-learning loop for continuous improvement while reducing the manual relabeling burden.
- Opened practical LLM paths for legal-fiscal document analysis while keeping delivery grounded in operational reality.