Enterprise · 3 months · 2023

Ground Truth NLP Extraction

Impact: Clean data pipeline
Data Quality: Cleaned
Languages: Multi
Modules: Reusable

Overview

Developed an NLP pipeline to extract ground truth values from messy technician work records. The system cleans, translates, and processes text data to identify problem statements, work performed, and parts used.

The Challenge

Raw technician data was inconsistent, multilingual, and unstructured, making it impossible to use for training ML models or generating insights.

The Solution

Built an NLP pipeline that uses the Google Translate API for language normalization and TextBlob and NLTK for text processing, packaged as reusable object-oriented modules that can be applied to similar use cases.
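As a rough illustration of the modular design described above, the sketch below chains small step classes into a pipeline. It is not the project's actual code: the class names, the `problem:` / `work:` / `parts:` record labels, and the `stub_translator` glossary are all hypothetical, and it uses only the Python standard library in place of the Google Translate API, TextBlob, and NLTK so that it runs without credentials or downloads.

```python
import re
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Step(ABC):
    """One reusable stage of the pipeline (hypothetical interface)."""

    @abstractmethod
    def run(self, text: str) -> str:
        ...


class Clean(Step):
    """Lowercase, drop stray symbols, and collapse whitespace."""

    def run(self, text: str) -> str:
        text = re.sub(r"[^\w\s.,:;/-]", " ", text.lower())
        return re.sub(r"\s+", " ", text).strip()


class Translate(Step):
    """Normalize language through an injected translator callable.

    Assumption: in a real pipeline the callable would wrap a translation
    service; here it is injected so the step stays testable offline.
    """

    def __init__(self, translator: Callable[[str], str]):
        self.translator = translator

    def run(self, text: str) -> str:
        return self.translator(text)


class Pipeline:
    """Run a list of steps in order."""

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def run(self, text: str) -> str:
        for step in self.steps:
            text = step.run(text)
        return text


# Hypothetical record layout: "problem: ...; work: ...; parts: ..."
FIELD_PATTERN = re.compile(
    r"(problem|work|parts):\s*(.*?)(?=\s*(?:problem|work|parts):|$)"
)


def extract_fields(text: str) -> Dict[str, str]:
    """Split a cleaned record into its labeled fields."""
    return {key: value.strip(" ;,.") for key, value in FIELD_PATTERN.findall(text)}


def stub_translator(text: str) -> str:
    """Stand-in for a translation call, using a tiny made-up glossary."""
    glossary = {"problema": "problem", "bomba con fuga": "pump leaking"}
    for source, target in glossary.items():
        text = text.replace(source, target)
    return text


pipeline = Pipeline([Clean(), Translate(stub_translator)])
raw = "PROBLEMA: bomba con fuga!!  work: replaced seal; parts: seal kit, gasket"
fields = extract_fields(pipeline.run(raw))
print(fields)
```

Because each step only depends on the shared `run` method, a tokenizer, spell-corrector, or different translator could be swapped in without touching the rest of the pipeline.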

Key Results

  • Extracted usable rows from raw unstructured data

  • Identified part combinations and work sequences

  • Created reusable NLP modules for similar scenarios
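The source does not say how part combinations and work sequences were identified. One simple way to approach it, sketched here with hypothetical function names and made-up sample data, is to count parts that co-occur in the same record and consecutive steps within each work sequence.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def count_part_pairs(records: Iterable[List[str]]) -> Counter:
    """Count how often each unordered pair of parts appears in one record."""
    pairs: Counter = Counter()
    for parts in records:
        unique = sorted(set(parts))  # sort so (a, b) and (b, a) count together
        pairs.update(combinations(unique, 2))
    return pairs


def count_step_transitions(sequences: Iterable[List[str]]) -> Counter:
    """Count consecutive (step, next_step) pairs across work sequences."""
    transitions: Counter = Counter()
    for steps in sequences:
        transitions.update(zip(steps, steps[1:]))
    return transitions


# Illustrative data only, not taken from the project.
part_records = [
    ["seal kit", "gasket"],
    ["gasket", "seal kit", "filter"],
    ["filter"],
]
work_sequences = [
    ["inspect", "replace", "test"],
    ["inspect", "test"],
]

print(count_part_pairs(part_records).most_common(1))
print(count_step_transitions(work_sequences))
```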

Tech Stack

Google Translate API, TextBlob, NLTK, Python, Pandas, OOP Design Patterns

Categories

NLP, TextBlob, NLTK, Python