Level 05 - Software / Data Operations Engineer
Abyss Solutions
Our Company:
At Abyss, our mission is to create innovative AI and robotics solutions, removing the need to put people at risk in dangerous jobs. From deep-sea rovers to our recent Moon to Mars Australian Government grant and emerging technology in agriculture, there is no challenge that Abyss will shy away from.
Our global team, collaborating across Australia, America, and Pakistan, is made up of passionate problem solvers who love working on cutting-edge technology while keeping a focus on environmental impact, safety, and cost.
You’ll get to work on complex challenges with a team of experts in software engineering, machine learning, data processing, and robotics.
In short, you'll work with some absolute LEGENDS, have fun, and build cutting-edge software at a company that is rapidly growing and has a REAL impact on the world, not just pushing numbers around.
Position Summary
The Data Operations Engineer executes and supports daily data-processing workflows within the Software Operations team. This role investigates pipeline failures, identifies whether issues originate from the data or from system/flow bugs, works with senior developers to resolve them and learn debugging approaches, and performs rapid scripting and transformation tasks (CSV, JSON, point-cloud) in a cloud environment with an orchestration tool (e.g., Prefect).
TKS Statements
Below are the discrete Task, Knowledge, and Skill statements aligned with the role.
Tasks (T)
- Run daily data-processing workflows — Execute production workflows each day, ensuring processing jobs start, complete, and deliver on time.
- Monitor workflow execution and detect anomalies — Observe job statuses, alerts, and logs to detect failures or unusual patterns.
- Investigate pipeline failures to isolate root cause — Determine if a failure is due to data quality, metadata inconsistency, transformation bug, or orchestration error.
- Collaborate with senior developers on debugging sessions — Shadow or join senior devs in root-cause analysis of complex failures, and apply remediation.
- Re-run failed jobs and validate corrected output — After fixes, restart jobs, confirm correct outputs, and log results.
- Perform ad-hoc scripting tasks for rapid turnaround (“BlackOps”) — Script transformations, clean-up, or merging of CSV/JSON files, and handle point-cloud file preparation as needed.
- Maintain operational logs, trackers, and documentation — Update task-tracking systems (e.g., ClickUp), workflow execution logs, and document debugging learnings for SOP updates.
- Operate and manage cloud-based compute and orchestration infrastructure — Use and maintain compute/storage resources in the cloud and work within orchestration-tool flows (e.g., Prefect flows); see the illustrative sketch after this list.
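To give a concrete flavour of the orchestration work above, here is a minimal, illustrative sketch of a daily processing flow, assuming Prefect 2.x. The flow, task, column, and file names (daily_processing, asset_id, scan_metadata.csv) are hypothetical and are not Abyss's actual pipeline.

```python
# Illustrative sketch only -- not Abyss's production pipeline.
# Assumes Prefect 2.x (`pip install prefect`); names and paths are hypothetical.
import csv
import json
from pathlib import Path
from typing import Dict, List

from prefect import flow, task


@task(retries=2, retry_delay_seconds=30)
def ingest(csv_path: str) -> List[Dict[str, str]]:
    # Read raw rows; Prefect retries transient failures automatically.
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


@task
def validate(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Fail fast on data issues so they are not mistaken for flow bugs.
    missing = [i for i, r in enumerate(rows) if not r.get("asset_id")]
    if missing:
        raise ValueError(f"{len(missing)} rows missing asset_id, e.g. rows {missing[:5]}")
    return rows


@task
def transform_and_write(rows: List[Dict[str, str]], out_path: str) -> str:
    # Example transformation: CSV rows -> JSON records.
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text(json.dumps(rows, indent=2))
    return out_path


@flow(log_prints=True)
def daily_processing(csv_path: str = "input/scan_metadata.csv",
                     out_path: str = "output/scan_metadata.json") -> str:
    rows = ingest(csv_path)
    rows = validate(rows)
    result = transform_and_write(rows, out_path)
    print(f"Wrote {result}")
    return result


if __name__ == "__main__":
    daily_processing()
```

Running the script directly executes the flow once; in production such a flow would typically be deployed on a schedule and monitored through the orchestrator's UI, logs, and alerts.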
Knowledge (K)
- Knowledge of data-processing workflow concepts (ingestion, transformation, validation, output).
- Knowledge of pipeline orchestration tools and how tasks, triggers, and retries function (e.g., Prefect, Airflow).
- Knowledge of cloud compute and storage infrastructure (e.g., Google Cloud Platform, buckets, VMs, job scheduling).
- Knowledge of data formats and their characteristics (CSV, JSON, point-cloud files, metadata structures).
- Knowledge of debugging and root-cause-analysis methodologies (log inspection, stack trace interpretation, data vs code differentiation).
- Knowledge of version control workflows (e.g., Git) and how code changes affect production pipelines.
- Knowledge of operational workflow management (SOPs, monitoring, alerts, task-tracking tools).
Skills (S)
- Skill in executing and monitoring data-processing workflows via orchestration tools.
- Skill in interpreting logs, alerts, and job execution statuses to identify anomalies.
- Skill in diagnosing whether a failure is due to data issues (e.g., malformed input, missing metadata) or processing/flow bugs.
- Skill in writing and modifying Python scripts for data transformation (CSV/JSON) and point-cloud preparation; see the illustrative sketch after this list.
- Skill in operating cloud-based compute/storage resources (launching jobs, managing buckets, handling permissions).
- Skill in collaborating with development and operations teams, communicating clearly about debugging outcomes and next steps.
- Skill in documenting operational flows, troubleshooting steps, and updating SOPs or workflow trackers.
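As an example of the ad-hoc (“BlackOps”) scripting skill above, here is a short sketch of a quick CSV/JSON merge in plain Python. The file names and the asset_id key are invented for illustration and assume the JSON file holds a list of per-asset records.

```python
# Illustrative ad-hoc script only; file and column names are hypothetical.
import csv
import json
from pathlib import Path


def merge_metadata(csv_path: str, json_path: str, out_path: str) -> int:
    """Merge per-asset metadata from a JSON file into CSV rows, keyed on asset_id."""
    extra = {rec["asset_id"]: rec for rec in json.loads(Path(json_path).read_text())}

    merged = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Enrich each CSV row with any matching metadata record.
            row.update(extra.get(row["asset_id"], {}))
            merged.append(row)

    Path(out_path).write_text(json.dumps(merged, indent=2))
    return len(merged)


if __name__ == "__main__":
    count = merge_metadata("scans.csv", "asset_metadata.json", "merged.json")
    print(f"Merged {count} rows")
```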
Key Requirements
- Minimum of 4-5 years of experience in data operations, pipeline monitoring, or data engineering support.
- Proficiency in Python scripting for data manipulation.
- Experience working with CSV, JSON, and ideally point-cloud file formats.
- Familiarity with cloud platforms and data workflow orchestration tools (e.g., Prefect).
- Strong analytical and debugging mindset; able to distinguish data failures from code/flow failures.
- Excellent collaboration and communication skills; able to work with senior developers and operations teams.
- Comfortable in an operationally focused, fast-paced environment with daily delivery demands.
Nice to Have
- Prior experience in 3D scanning, CAD/point-cloud workflows.
- Experience with other workflow tools (Airflow, Dagster) and CI/CD pipelines.
- Familiarity with task-tracking and project management tools (e.g., ClickUp).
Qualifications
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.