CNJ / PNUD

PEDRO Precedent Discovery Platform

National-scale precedent discovery initiative for CNJ and PNUD, combining FastAPI services, unsupervised NLP, semantic grouping, and governed experimentation to systematize qualified precedents from Brazil's highest courts.

Data Scientist · Jul 2022 - May 2023

Stack

  • Python
  • SQL
  • FastAPI
  • scikit-learn
  • Gensim
  • NLTK
  • spaCy
  • MLflow
  • DVC
  • SQLAlchemy
  • Pandas
  • Matplotlib

Primary impact

Enabled discovery of more than 30 precedent categories across long-form judicial decisions.

Outcomes

  • More than 30 precedent categories identified through semantic workflows
  • AI services integrated with CNJ systems through REST APIs
  • Stronger alignment between legal experts and technical delivery

Context

PEDRO, the Plataforma de Extração e Descoberta de Precedentes dos Tribunais (Court Precedent Extraction and Discovery Platform), was a national-scale AI initiative developed with CNJ and PNUD to systematize and expose qualified precedents from Brazil's Superior Court of Justice (STJ) and Supreme Federal Court (STF).

The work required more than NLP experimentation. It depended on turning legal qualification rules into workable system logic, managing domain ambiguity, and meeting institutional integration needs, all while preserving governance and legal credibility.

My role

  • Worked as Data Scientist on the initiative while providing technical leadership to a multidisciplinary team.
  • Gathered requirements with legal and business stakeholders to define precedent qualification rules and platform goals.
  • Built RESTful APIs in Python and FastAPI to expose AI functionality and integrate with CNJ systems.
  • Designed unsupervised learning workflows for clustering and pattern discovery in judicial decisions.

Problem

Precedent discovery is difficult because legal relevance cannot be captured by simple keyword matching. The team needed a way to group decisions, identify semantic patterns, and expose that intelligence through the institutional systems used by CNJ.

The problem was also organizational. Legal experts needed defensible outputs aligned with jurisprudential rules, while engineering needed a workflow that could be tracked, versioned, and improved over time.

Architecture

The platform combined several layers:

  • corpus ingestion and preprocessing for judicial decisions
  • topic-modeling and semantic-similarity pipelines for precedent grouping
  • experiment tracking and data versioning for reproducibility
  • FastAPI services for institutional integration
  • shared review loops between legal stakeholders and the technical team
  • exploratory analysis and corpus validation to keep the legal dataset usable

The key design decision was to keep experimentation governed. That made it easier to move from exploratory NLP work toward outputs that could support real institutional use instead of becoming a research artifact detached from CNJ operations.
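In the project this governance ran on MLflow for experiment tracking and DVC for data versioning. Tool-agnostically, its shape can be sketched as a run ledger that ties parameters and metrics to a content hash of the input corpus; every name and value below is illustrative:

```python
import hashlib
import json
import time

def corpus_fingerprint(docs: list[str]) -> str:
    """Content hash standing in for a DVC-style data version."""
    h = hashlib.sha256()
    for doc in docs:
        h.update(doc.encode("utf-8"))
    return h.hexdigest()[:12]

def log_run(params: dict, metrics: dict, docs: list[str]) -> dict:
    """Append-only run record, analogous to an MLflow run."""
    record = {
        "timestamp": time.time(),
        "data_version": corpus_fingerprint(docs),
        "params": params,
        "metrics": metrics,
    }
    # In practice this would go to a tracking server, not stdout.
    print(json.dumps(record, sort_keys=True))
    return record

run = log_run(
    params={"model": "nmf", "n_components": 30},
    metrics={"coherence": 0.42},
    docs=["decisao 1", "decisao 2"],
)
```

Because each record carries a data fingerprint, any reported grouping can be traced back to the exact corpus snapshot that produced it, which is what makes outputs defensible to legal reviewers.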

Challenges

  • Legal categories are subtle and often depend on domain interpretation, not simple labeling.
  • Discovery systems need to surface useful patterns without overstating confidence.
  • Cross-functional delivery is only credible when legal and technical teams stay aligned on what the outputs mean.
  • National institutional projects need cleaner governance than a typical R&D prototype.

Solution

I built the system around a combination of unsupervised NLP, semantic grouping, and integration-ready APIs. That made the work useful at two levels: analysts could explore precedent groupings, and the technical team could keep experiments reproducible, inspectable, and governable.

A second important decision was to work closely with legal and business stakeholders during definition and iteration. That prevented the project from drifting into technically interesting but operationally irrelevant outputs.

Impact

  • Identified more than 30 precedent categories through semantic discovery workflows.
  • Expanded CNJ’s analytical capabilities by integrating AI outputs with judicial data systems.
  • Helped bridge legal and technical teams so the platform aligned with jurisprudential rules instead of staying as an isolated research prototype.

Related writing

Writing that emerged from the same work.

Technical writing

Production RAG Systems Need More Than Retrieval Demos

Oct 13, 2025

A production RAG system should be treated as a retrieval and evaluation pipeline with explicit failure modes, not as a prompt wrapper around a vector store.

  • RAG
  • Evaluation
  • Vector Search
  • Production AI

Technical writing

LLM Evaluation in Production Starts With Explicit Failure Modes

Jul 2, 2025

Evaluation is most useful when it reflects the failures a system can actually produce in production: missing context, wrong retrieval, incorrect tool use, unstable outputs, and unhelpful responses.

  • LLM
  • Evaluation
  • Production AI
  • Quality

Next step

Need the broader background behind this work?

The about page connects these case studies to the rest of my delivery history across courts, agencies, and AI platform work.