Innodata solves your toughest data engineering challenges using artificial intelligence and human expertise.

Senior Language Data Scientist – Search Specialization

Full Time | Remote | Team size: 1,001-5,000 | No H-1B sponsorship

Location: New Jersey
Posted: 11 days ago
Salary: Not specified

Postgraduate Degree | 5 yrs exp | English | Pandas | Python

Job Description

• Lead long-term, highly complex, and ambiguous projects from the first client discussion through completion.
• Design and improve workflows that create data for AI/ML training and evaluation.
• Design and refine search data annotation frameworks, including relevance judging guidelines.
• Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers.
• Assess and optimize search-specific evaluation approaches, including A/B testing frameworks, ranking metrics, and human evaluation studies of search result quality.
• Quantitatively analyze large datasets, perform statistical analyses, calculate metrics, and make recommendations to improve accuracy and performance.
• Work closely with client stakeholders to understand goals, gather requirements, and propose and execute solutions.
• Contribute to establishing best practices and standards for generative AI development, both with customers and within the organization.

Job Requirements

MA in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related scientific or quantitative field; PhD strongly preferred.
Ability to collaborate directly with technical stakeholders, including senior project managers, data engineers, and research scientists.
Extensive experience working with search-specific language data (queries, documents, relevance judgments, intent labels) and designing human evaluation tasks, including multi-phase and complex workflows.
Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
Proficiency in Python for handling and transforming large datasets (e.g., pre- and post-processing data with pandas), performing quantitative analyses, and visualizing data (e.g., matplotlib, seaborn).
Deep understanding of data pipelines to support ML and NLP workflows.
Knowledge of efficient data collection, transformation, and storage.
Knowledge of data structures, algorithms, and data engineering principles.

Benefits

Health insurance
401(k) matching
Flexible work hours
Professional development opportunities