Lead Software Engineer - DevOps / Production Support

JPMorganChase

1 day ago

Full-time

On-site

Houston, Texas, United States

Software / Technology / IT

Description

We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.

As a Lead Software Engineer at JPMorgan Chase within the Commercial & Investment Banking - Markets Tech - Trading / Derivatives Execution Tech team, you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.

This position will support the reliability, performance, and operational integrity of electronic and equities trading systems, with a specific focus on FIX protocol connectivity. This role is hands-on and operations-oriented, partnering closely with trading, technology, and development teams to ensure stable order flow, rapid incident response, and disciplined change execution. The position emphasizes Python automation, Linux troubleshooting, and Grafana-based observability, with C++ exposure used primarily to investigate issues and collaborate effectively with Application

Job responsibilities

Executes creative software solutions, design, development, and technical troubleshooting with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems
Provide daily production support for electronic trading platforms, including FIX sessions, connectivity health, and order/trade workflow stability
Monitor system health and trading-impacting signals using Grafana dashboards and alerting to improve visibility into latency, errors, throughput, and availability
Lead incident triage and restoration activities during service degradation, including structured troubleshooting, stakeholder communications, and post-incident follow-up
Perform root cause analysis on recurring issues and implement durable remediation, including runbook improvements, alert tuning, and operational automation
Develops secure, high-quality production code, and reviews and debugs code, using Python scripts and tools for health checks, operational workflows, reporting, and environment validation
Drives team adoption of enterprise-authorized AI-assisted engineering practices within the work environment to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review / refactoring, test strategy acceleration, incident/root-cause analysis support), while establishing consistent validation standards (secure coding, peer review, automated testing) and promoting reuse of effective patterns across the team
Applies knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation
Troubleshoot Linux-based systems using logs, process and resource diagnostics, and network-level checks relevant to connectivity and application behavior
Partner with development teams to investigate complex issues in trading components by reading logs, traces, and diagnostic output, and interpret and discuss findings in contexts where components are implemented in C++
Adds to team culture of diversity, opportunity, inclusion, and respect

Required qualifications, capabilities, and skills

Formal training or certification in software engineering concepts and 5+ years of applied experience
Advanced proficiency in one or more programming languages, frameworks, and tools (e.g., Python, C++, Linux, Grafana)
Demonstrated experience in DevOps, production support, SRE, or application support in a mission-critical environment, with accountability for uptime and incident execution
Practical understanding of the FIX protocol
Strong Linux troubleshooting capability, including log analysis, process/resource diagnostics, and command-line proficiency
Hands-on experience with AWS and Terraform (infrastructure as code), and familiarity/experience with Atlas and Copilot as part of the deployment and platform toolchain
Ability to collaborate effectively across trading, operations, and engineering teams, including clear incident communications under time pressure
Proficiency in automation and continuous delivery methods, with advanced understanding of agile methodologies such as CI/CD, Application Resiliency, and Security
Demonstrated experience leading effective use of approved AI-assisted software development tools (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security
Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; experience coaching engineers on safe, compliant adoption within delivery practices
Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile)

Preferred qualifications, capabilities, and skills

Knowledge of electronic trading or equities trading environments, including familiarity with order lifecycle concepts and trading-impacting incident patterns
Exposure to C++ sufficient to assist with investigation (e.g., reading stack traces, understanding logs and component behavior), without being a primary feature developer
Familiarity with incident management disciplines, including runbooks, post-incident reviews, alert quality management, and operational readiness practices
Basic networking knowledge relevant to troubleshooting connectivity and performance (e.g., TCP/IP behavior, port connectivity, latency sensitivity)
