Enterprise · 3 months · 2023

Ground Truth NLP Extraction

Impact: Clean data pipeline

Data Quality: Cleaned

Languages: Multi

Modules: Reusable

Overview

Developed an NLP pipeline to extract ground truth values from messy technician work records. The system cleans, translates, and processes text data to identify problem statements, work performed, and parts used.

The Challenge

Raw technician data was inconsistent, multilingual, and unstructured, making it impossible to use for training ML models or generating insights.

The Solution

Built an NLP pipeline that uses the Google Translate API for language normalization and TextBlob and NLTK for text processing, and created reusable OOP-based modules for similar use cases.
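As a rough illustration of what a reusable, OOP-based pipeline of this kind could look like, here is a minimal sketch. It is not the project's actual code: the class names (`Step`, `CleanStep`, `TranslateStep`, `Pipeline`) are hypothetical, and translation is passed in as a plain callable so that a real Google Translate client (or any other translator) could be plugged in. The stand-in translator below is a toy lookup used only to keep the example self-contained.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


class Step:
    """Base class for one reusable pipeline stage (hypothetical interface)."""

    def apply(self, text: str) -> str:
        raise NotImplementedError


class CleanStep(Step):
    """Replace stray symbols with spaces, then collapse whitespace."""

    def apply(self, text: str) -> str:
        text = re.sub(r"[^\w\s.,:/-]", " ", text)
        return re.sub(r"\s+", " ", text).strip()


@dataclass
class TranslateStep(Step):
    """Delegate translation to an injected callable (assumption: str -> str)."""

    translate: Callable[[str], str]

    def apply(self, text: str) -> str:
        return self.translate(text)


@dataclass
class Pipeline:
    """Run each step in order, feeding the output of one into the next."""

    steps: List[Step] = field(default_factory=list)

    def run(self, text: str) -> str:
        for step in self.steps:
            text = step.apply(text)
        return text


# Toy stand-in for a real translation service, for illustration only.
def fake_translate(text: str) -> str:
    return text.replace("Pumpe defekt", "pump defective")


pipeline = Pipeline([CleanStep(), TranslateStep(fake_translate)])
result = pipeline.run("  Pumpe   defekt!!  ")
```

Keeping each stage behind the same `apply` interface is one way to make the modules reusable: a new dataset can reuse `Pipeline` and swap in different steps without touching the others.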

Key Results

Extracted usable rows from raw unstructured data
Identified part combinations and work sequences
Created reusable NLP modules for similar scenarios
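To make the "part combinations" result concrete, the sketch below counts how often pairs of parts appear together in the same work record. It assumes the parts have already been extracted into one list per record; the function name `count_part_pairs` and the sample records are hypothetical, and the original project may well have done this with Pandas instead.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def count_part_pairs(records: Iterable[List[str]]) -> Counter:
    """Count co-occurring part pairs across records.

    Each record is a list of part names. Duplicates within a record are
    ignored, and pairs are stored in sorted order so (a, b) == (b, a).
    """
    counts: Counter = Counter()
    for parts in records:
        unique = sorted(set(parts))
        counts.update(combinations(unique, 2))
    return counts


# Illustrative records only, not data from the project.
records = [
    ["filter", "pump"],
    ["pump", "seal", "filter"],
    ["seal"],
]
pair_counts = count_part_pairs(records)
top_pair: Tuple[Tuple[str, str], int] = pair_counts.most_common(1)[0]
```

Here `top_pair` holds the most frequent combination and its count; records with a single part contribute no pairs.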

Tech Stack

Google Translate API, TextBlob, NLTK, Python, Pandas, OOP Design Patterns
