Context
PEDRO, the Plataforma de Extração e Descoberta de Precedentes dos Tribunais, was a national-scale AI initiative developed with CNJ and PNUD to systematize and expose qualified precedents from STJ and STF.
The work required more than NLP experimentation. It depended on turning legal qualification rules, domain ambiguity, and institutional integration needs into a usable technical system with governance and legal credibility.
My role
- Worked as a Data Scientist on the initiative while providing technical leadership to a multidisciplinary team.
- Gathered requirements with legal and business stakeholders to define precedent qualification rules and platform goals.
- Built RESTful APIs in Python and FastAPI to expose AI functionality and integrate with CNJ systems.
- Designed unsupervised learning workflows for clustering and pattern discovery in judicial decisions.
Problem
Precedent discovery is difficult because determining legal relevance is not a simple keyword-matching task. The team needed a way to group decisions, identify semantic patterns, and expose that intelligence through institutional systems used by CNJ.
The problem was also organizational. Legal experts needed defensible outputs aligned with jurisprudential rules, while engineering needed a workflow that could be tracked, versioned, and improved over time.
Architecture
The platform combined several layers:
- corpus ingestion and preprocessing for judicial decisions
- topic-modeling and semantic-similarity pipelines for precedent grouping
- experiment tracking and data versioning for reproducibility
- FastAPI services for institutional integration
- shared review loops between legal stakeholders and the technical team
- exploratory analysis and corpus validation to keep the legal dataset usable
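To make the semantic-similarity layer concrete, here is a small sketch that groups texts by TF-IDF cosine similarity using only the standard library. It is a simplified stand-in, not the pipeline used in the project: the whitespace tokenizer, the greedy seed-based grouping, and the `threshold` value are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF term."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(docs, threshold=0.3):
    """Greedy grouping: a document joins the first group whose seed
    (first member) is at least `threshold` similar, else starts a new group."""
    vectors = tfidf_vectors(docs)
    groups = []
    for index, vector in enumerate(vectors):
        for group in groups:
            if cosine(vector, vectors[group[0]]) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


example_docs = [
    "tax exemption on imported goods",
    "exemption from tax on imported goods",
    "custody of minor children after divorce",
    "divorce and custody of minor children",
]
print(group_by_similarity(example_docs))  # [[0, 1], [2, 3]]
```

Real judicial decisions would need proper Portuguese tokenization and a more robust grouping method, but the shape is the same: vectorize, compare, group, then hand the groups to legal reviewers.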
The key design decision was to keep experimentation governed. That made it easier to move from exploratory NLP work toward outputs that could support real institutional use instead of becoming a research artifact detached from CNJ operations.
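One way to picture governed experimentation is a run record that ties parameters and metrics to an exact version of the corpus. The sketch below is an assumption about what such a record could contain, written with the standard library only; the field names and the JSON Lines log are hypothetical and do not describe the tracking tooling the project used.

```python
import hashlib
import json
from datetime import datetime, timezone


def corpus_fingerprint(documents):
    """Hash the corpus content so a run can be tied to an exact data version."""
    digest = hashlib.sha256()
    for doc in documents:
        digest.update(doc.encode("utf-8"))
        digest.update(b"\x00")  # separator, so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


def build_run_record(documents, params, metrics):
    """Assemble one experiment record (field names are illustrative)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "corpus_sha256": corpus_fingerprint(documents),
        "n_documents": len(documents),
        "params": params,
        "metrics": metrics,
    }


def append_run(path, record):
    """Append the record as one JSON line so the history stays reviewable."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

With a record like this, two runs that report different groupings can be compared on whether the data, the parameters, or both changed.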
Challenges
- Legal categories are subtle and often depend on domain interpretation, not simple labeling.
- Discovery systems need to surface useful patterns without overstating confidence.
- Cross-functional delivery is only credible when legal and technical teams stay aligned on what the outputs mean.
- National institutional projects need cleaner governance than a typical R&D prototype.
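The point about not overstating confidence can be sketched as a rule that only suggests a group when the best match is both strong and clearly ahead of the runner-up, and otherwise routes the decision to human review. The function name, the thresholds, and the output fields below are hypothetical, chosen only to illustrate the idea.

```python
def suggest_group(similarities, min_score=0.5, min_margin=0.1):
    """Turn similarity scores into a hedged suggestion.

    `similarities` maps a group label to a similarity score in [0, 1].
    The result never asserts a final label: it reports a status plus the
    top candidates so a legal reviewer can see the alternatives.
    """
    if not similarities:
        return {"status": "needs_review", "candidates": []}

    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    best_score = ranked[0][1]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0

    # Confident only if the best match is strong AND clearly separated.
    confident = (
        best_score >= min_score
        and (best_score - runner_up_score) >= min_margin
    )
    return {
        "status": "suggested" if confident else "needs_review",
        "candidates": ranked[:3],
    }
```

Returning ranked candidates instead of a single answer keeps the final qualification with the legal experts, which is where the challenges above say it belongs.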
Solution
I built the system around a combination of unsupervised NLP, semantic grouping, and integration-ready APIs. That made the work useful at two levels: analysts could explore precedent groupings, and the technical team could keep experiments reproducible, inspectable, and governable.
A second important decision was to work closely with legal and business stakeholders during definition and iteration. That prevented the project from drifting into technically interesting but operationally irrelevant outputs.
Impact
- Identified more than 30 precedent categories through semantic discovery workflows.
- Expanded CNJ’s analytical capabilities by integrating AI outputs with judicial data systems.
- Helped bridge legal and technical teams so the platform aligned with jurisprudential rules instead of remaining an isolated research prototype.
Useful links