Top 20 Questions for Data Analysts at JPMorgan Chase

  • Posted Date: 23 Dec 2025
  • Updated Date: 23 Dec 2025


Landing a data analyst role at JPMorgan Chase is no small feat. It's one of the biggest financial institutions globally, and they're looking for sharp analytical minds who can turn data into actionable insights. The interview process typically covers technical skills, business understanding, and how you think through problems.

 

JPMorgan's interview questions blend SQL and data manipulation skills with finance industry knowledge and behavioral scenarios. They want to see that you can handle the technical stuff while understanding the business context behind the data.

 

Let's dive into the most common questions you might face, with sample answers that'll help you prep effectively.

 

Technical SQL Questions

1. Write a SQL query to find the second highest salary from an Employee table.

Sample Answer: I'd use a subquery approach for this. Here's the query:

 

SELECT MAX(salary) AS second_highest_salary
FROM Employee
WHERE salary < (SELECT MAX(salary) FROM Employee);

 

Alternatively, I could use the DENSE_RANK window function, which generalizes to finding the nth highest:

 

SELECT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM Employee
) ranked
WHERE salary_rank = 2;

 

The DENSE_RANK approach is better if there are duplicate salaries or if the requirement changes to finding the third or fourth highest. (Note that RANK is a reserved word in some SQL dialects, so an alias like salary_rank is safer.)

 

2. How would you identify duplicate records in a customer transactions table?

Sample Answer: I'd approach this by grouping on the fields that should be unique and counting occurrences. Here's the query:

 

SELECT customer_id, transaction_date, amount, COUNT(*) as duplicate_count
FROM Transactions
GROUP BY customer_id, transaction_date, amount
HAVING COUNT(*) > 1;

 

This shows me which combinations appear multiple times. If I needed to see all the duplicate rows with their complete details, I'd use a window function:


SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id, transaction_date, amount ORDER BY transaction_id) as row_num
    FROM Transactions
) t
WHERE row_num > 1;

 

This way I can review the actual duplicate records and decide how to handle them.

 

3. Explain the difference between INNER JOIN and LEFT JOIN with an example relevant to banking.

Sample Answer: An INNER JOIN returns only matching records from both tables, while a LEFT JOIN returns all records from the left table and matching records from the right table.

 

In a banking context: imagine you have a Customers table and an Accounts table. With an INNER JOIN, you'll only see customers who have accounts. With a LEFT JOIN, you'll see all customers, including those without accounts - which is crucial for identifying potential sales opportunities.

 

-- This shows only customers with accounts
SELECT c.customer_id, c.name, a.account_number
FROM Customers c
INNER JOIN Accounts a ON c.customer_id = a.customer_id;

 

-- This shows all customers, marking those without accounts as NULL
SELECT c.customer_id, c.name, a.account_number
FROM Customers c
LEFT JOIN Accounts a ON c.customer_id = a.customer_id;

 

At JPMorgan, you'd often use LEFT JOIN to analyze customer coverage or identify cross-selling opportunities.
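
That coverage analysis can also be sketched in pandas; a minimal example with made-up customer and account tables, using the `indicator` flag to mark unmatched rows:

```python
import pandas as pd

# Hypothetical tables just to illustrate the join behavior
customers_df = pd.DataFrame({'customer_id': [1, 2, 3],
                             'name': ['Ana', 'Ben', 'Cy']})
accounts_df = pd.DataFrame({'customer_id': [1, 3],
                            'account_number': ['A-100', 'A-101']})

# A left merge with indicator=True tags each row; 'left_only' rows are
# customers with no account - the cross-sell candidates a LEFT JOIN surfaces
merged = customers_df.merge(accounts_df, on='customer_id',
                            how='left', indicator=True)
no_account = merged[merged['_merge'] == 'left_only']
print(no_account['name'].tolist())  # ['Ben']
```

In SQL, the same pattern is the classic anti-join: a LEFT JOIN followed by WHERE a.customer_id IS NULL.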

 

Python and Data Manipulation Questions

4. How would you handle missing values in a dataset using Python?

Sample Answer: It depends on the context and the amount of missing data. For numerical columns, I might use mean or median imputation:

 

import pandas as pd
df['column_name'] = df['column_name'].fillna(df['column_name'].median())

 

For categorical data, I'd use mode or create a 'Missing' category:

df['category_column'] = df['category_column'].fillna(df['category_column'].mode()[0])

 

If missingness is significant - say over 30% - I'd investigate why the data is missing first. Sometimes the absence of data is meaningful. In financial data, missing transaction amounts versus missing customer demographics require different approaches.

 

I might also use forward-fill or backward-fill for time-series financial data where values carry forward logically.
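
For instance, a minimal forward-fill sketch on a made-up daily series (dates and values are illustrative):

```python
import pandas as pd

# Hypothetical daily price series with gaps (e.g. days with no new quote)
prices = pd.Series([100.0, None, None, 103.0, None],
                   index=pd.date_range('2024-01-01', periods=5))

# ffill carries the last observed value forward - sensible when a missing
# day means "no new observation" rather than "value unknown"
filled = prices.ffill()
print(filled.tolist())  # [100.0, 100.0, 100.0, 103.0, 103.0]
```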

 

5. Explain how you'd detect outliers in transaction data.

Sample Answer: I typically use multiple methods. The IQR method is straightforward:

 

Q1 = df['amount'].quantile(0.25)
Q3 = df['amount'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

 

outliers = df[(df['amount'] < lower_bound) | (df['amount'] > upper_bound)]

 

For banking data, I'd also use Z-score for normally distributed data:

 

from scipy import stats
df['z_score'] = stats.zscore(df['amount'])
outliers = df[abs(df['z_score']) > 3]

 

But here's the thing - in financial data, outliers aren't always errors. A large wire transfer might be legitimate. So I'd also look at business context, flag suspicious patterns, and collaborate with fraud detection teams before removing anything.

 

6. How would you merge two dataframes with different structures?

Sample Answer: I'd first understand the relationship between the datasets. For banking data, I might merge customer info with transaction history:

 

merged_df = pd.merge(customers_df, transactions_df, on='customer_id', how='left')

 

If the structures are really different - like different column names for the same data - I'd standardize first:

 

# Rename columns to match
transactions_df.rename(columns={'cust_id': 'customer_id'}, inplace=True)

 

# Then merge
merged_df = pd.merge(customers_df, transactions_df, on='customer_id', how='left')

 

I'd also check for data type mismatches and handle them before merging to avoid silent failures. In banking, sometimes account numbers are stored as integers in one system and strings in another.
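
A quick sketch of that failure mode, with hypothetical tables where the key's dtype differs between systems:

```python
import pandas as pd

# Hypothetical: account numbers stored as integers in the core system
# but as strings in the CRM export
core_df = pd.DataFrame({'account_number': [1001, 1002],
                        'balance': [500.0, 750.0]})
crm_df = pd.DataFrame({'account_number': ['1001', '1002'],
                       'owner': ['Ana', 'Ben']})

# Merging int64 against object keys errors out in recent pandas
# (or silently matches nothing), so normalize the key dtype first
core_df['account_number'] = core_df['account_number'].astype(str)
merged = core_df.merge(crm_df, on='account_number', how='inner')
print(len(merged))  # 2
```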

 

Statistics and Analytics Questions

7. What's the difference between correlation and causation? Give a banking example.

Sample Answer: "Correlation means two variables move together, but causation means one actually causes the other to change.

 

Banking example: We might notice that customers with higher account balances also have more credit cards. There's correlation - but the high balance doesn't cause them to get more cards. More likely, both are caused by higher income or better financial management.

 

This distinction is crucial at JPMorgan. If we see that customers who use mobile banking have lower default rates, we can't assume mobile banking prevents defaults. Maybe financially responsible people just prefer digital tools. We'd need controlled experiments or more sophisticated causal inference methods to establish causation.

 

Acting on correlation alone could lead to bad business decisions, like over-investing in features that don't actually drive the outcomes we want."
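
A tiny simulation makes the point concrete: here a hidden confounder (income, all numbers synthetic) drives both variables, producing a strong correlation with no causal link between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic confounder: income drives BOTH balance and card count
income = rng.normal(size=5000)
balance = income + 0.5 * rng.normal(size=5000)
num_cards = income + 0.5 * rng.normal(size=5000)

# Strong correlation, yet neither variable causes the other
r = np.corrcoef(balance, num_cards)[0, 1]
print(round(r, 2))  # roughly 0.8
```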

 

8. How would you approach A/B testing for a new feature in the JPMorgan mobile app?

Sample Answer: First, I'd define clear success metrics - maybe monthly active users, feature adoption rate, or customer satisfaction scores.

 

Then I'd randomly split users into control and test groups, ensuring they're statistically similar. For a bank, I'd stratify by customer segment to ensure we're not accidentally putting all high-value customers in one group.

 

# Sample stratification approach
from sklearn.model_selection import train_test_split

control, test = train_test_split(users_df, test_size=0.5, 
                                  stratify=users_df['customer_segment'])

 

I'd run the test long enough to account for weekly patterns - maybe 2-4 weeks. Then I'd use statistical tests to compare results:

 

from scipy import stats
t_stat, p_value = stats.ttest_ind(control['metric'], test['metric'])

 

If the p-value is below 0.05, we have a statistically significant result. But I'd also look at practical significance - is the improvement meaningful business-wise, not just statistically?
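
One way to quantify practical significance is a standardized effect size such as Cohen's d; a sketch with synthetic data (numbers illustrative):

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference: how large the effect is,
    independent of sample size."""
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return (np.mean(a) - np.mean(b)) / pooled_sd

# With very large samples, a tiny lift can be statistically significant
# yet practically negligible
rng = np.random.default_rng(1)
control = rng.normal(loc=100.0, scale=10.0, size=100_000)
variant = rng.normal(loc=100.2, scale=10.0, size=100_000)
d = cohens_d(variant, control)
print(round(d, 2))  # tiny effect, well below the usual "small" threshold of 0.2
```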

 

9. Explain p-value in simple terms and why it matters in financial analysis.

Sample Answer: "A p-value tells you how likely you'd be to see a result at least this extreme if nothing real were going on - pure chance. If the p-value is below 0.05, a result this strong would show up by chance less than 5% of the time.

 

In banking, this matters hugely. Say we test a new credit scoring model that appears to reduce default rates by 2%. A low p-value (like 0.01) means we can be confident this improvement is real, not random noise. A high p-value (like 0.30) means we can't trust the result - it might disappear when we deploy it.

 

At JPMorgan, we're dealing with millions of dollars in decisions. We can't afford to act on findings that are just statistical flukes. P-values help us separate signal from noise, though I'd also look at effect size and business impact, not just statistical significance."

 

Business and Finance Questions

10. How would you analyze customer churn for credit card holders?

Sample Answer: I'd start by defining churn - is it account closure, zero activity for 90 days, or something else? Then I'd look at multiple angles:

 

First, cohort analysis to see if churn rates differ by acquisition period or customer segment. Then behavioral analysis - comparing transaction patterns, payment history, customer service interactions, and product usage between churned and retained customers.

 

# Feature engineering for churn analysis
df['avg_monthly_spend'] = df['total_spend'] / df['months_active']
df['missed_payments'] = df['payments_due'] - df['payments_made']
df['days_since_last_transaction'] = (pd.Timestamp.today() - df['last_transaction_date']).dt.days

 

I'd use logistic regression or a decision tree to identify churn drivers:

 

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
feature_importance = pd.DataFrame({
    'feature': X_train.columns,
    'coefficient': model.coef_[0]
}).sort_values('coefficient', key=abs, ascending=False)

 

Finally, I'd segment high-risk customers and recommend targeted retention strategies. At JPMorgan, maybe high-value customers get retention offers before they actually churn.

 

11. What metrics would you track to measure the success of a new loan product?

Sample Answer: "I'd track metrics across the entire customer journey:

 

Acquisition: Application volume, approval rate, time-to-decision, cost per acquisition.

 

Product performance: Loan origination volume, average loan amount, interest rate distribution, loan-to-value ratios.

 

Risk metrics: Default rate, delinquency rate at 30/60/90 days, charge-off rate, provision for credit losses.

 

Profitability: Net interest margin, return on assets, customer lifetime value, break-even timeline.

 

Customer experience: Application abandonment rate, NPS scores, cross-sell success rate, retention rate.

 

I'd create a dashboard tracking these weekly and comparing against targets and benchmarks. For JPMorgan, I'd also segment by customer type, geography, and loan purpose to understand what's working where. The key is balancing growth with risk—rapid loan growth means nothing if default rates spike."

 

12. How would you identify potential fraud in transaction data?

Sample Answer: I'd use multiple detection methods. First, rule-based flagging for obvious red flags:

 

  • Transactions over certain thresholds

  • Multiple transactions just under reporting limits

  • Unusual geographic patterns (card used in two distant locations within hours)

  • Velocity checks (many transactions in short time)
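
The last flag, a velocity check, can be sketched in pandas with a rolling time window (column names and thresholds hypothetical):

```python
import pandas as pd

# Hypothetical transaction log
tx = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 2],
    'timestamp': pd.to_datetime([
        '2024-01-01 10:00', '2024-01-01 10:02', '2024-01-01 10:04',
        '2024-01-01 10:05', '2024-01-01 12:00']),
})

# Count each customer's transactions in a rolling 10-minute window;
# more than 3 in the window trips the velocity flag
tx = tx.sort_values(['customer_id', 'timestamp'])
counts = (tx.set_index('timestamp')
            .assign(n=1)
            .groupby('customer_id')['n']
            .rolling('10min').sum())
tx['tx_in_window'] = counts.to_numpy()
flagged = tx[tx['tx_in_window'] > 3]
print(flagged['customer_id'].unique().tolist())  # [1]
```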

 

Then statistical anomaly detection:

 

# Unusual spending relative to each customer's own history;
# transform() keeps the result aligned row-by-row with df
customer_avg = df.groupby('customer_id')['amount'].transform('mean')
customer_std = df.groupby('customer_id')['amount'].transform('std')
df['z_score'] = (df['amount'] - customer_avg) / customer_std
suspicious = df[abs(df['z_score']) > 3]

 

I'd also use machine learning models trained on historical fraud cases:

 

from sklearn.ensemble import RandomForestClassifier

# Features like transaction amount, time, merchant category, location
model = RandomForestClassifier()
model.fit(X_train, y_train_fraud_labels)
fraud_probability = model.predict_proba(X_test)

 

The key is minimizing false positives - you don't want to freeze legitimate transactions. At JPMorgan scale, even a 1% false positive rate affects thousands of customers daily.

 

Problem-Solving and Case Questions

13. If you noticed a sudden 20% drop in mobile app logins, how would you investigate?

Sample Answer: I'd approach this systematically. First, verify the data is correct - check for tracking issues, data pipeline problems, or definition changes.

 

Then segment the drop: Is it across all users or specific groups? Which platforms (iOS vs Android)? Which geographic regions? New vs existing users?

 

# Segmentation analysis: total logins per segment for each week,
# then week-over-week change within each segment
login_by_segment = df.groupby(['week', 'user_type', 'platform', 'region'])['logins'].sum()
week_over_week = login_by_segment.groupby(['user_type', 'platform', 'region']).pct_change()

 

I'd check for external factors: Was there an app update? Marketing campaign changes? Competitor actions? Seasonal patterns? Technical issues?

 

I'd also look at related metrics: Did app downloads drop? Are users still active through web? Did customer service calls spike?

 

Finally, I'd formulate hypotheses and test them. Maybe the latest iOS update broke something. Maybe a new security feature is too cumbersome. At JPMorgan, I'd loop in product, tech, and UX teams to investigate further based on my findings.

 

14. How would you prioritize which customer segments to target for a new investment product?

Sample Answer: I'd use a scoring framework considering multiple factors:

 

Market size: How many customers fit the segment? What's the total potential revenue?

 

Profitability: What's the expected margin and lifetime value from this segment?

 

Fit: Does our product genuinely solve their problems? Do we have credibility in this space?

 

Accessibility: Can we reach them cost-effectively? Do we have existing relationships?

 

Competition: How saturated is this segment? Can we differentiate?

 

I'd create a scoring matrix:

segments = pd.DataFrame({
    'segment': ['Young Professionals', 'Affluent Retirees', 'Small Business Owners'],
    'market_size': [8, 6, 5],
    'profitability': [6, 9, 7],
    'product_fit': [7, 8, 6],
    'accessibility': [9, 5, 6],
    'competition': [5, 4, 6]
})

 

# All factors are scored so that higher is better (competition here
# means "room to differentiate", not saturation)
segments['total_score'] = segments.iloc[:, 1:].sum(axis=1)
segments = segments.sort_values('total_score', ascending=False)

 

At JPMorgan, I'd also consider strategic priorities—maybe we're underrepresented in small business banking and want to grow there, even if affluent retirees score higher on pure metrics.

 

15. Explain how you'd calculate customer lifetime value (CLV) for banking customers.

Sample Answer: CLV represents the total profit we expect from a customer over their entire relationship with us.

 

Basic formula: CLV = (Average Annual Revenue per Customer × Customer Lifespan) - Acquisition Cost

 

For banking, it's more nuanced:

# Simplified CLV calculation
def calculate_clv(customer_data):
    # Revenue streams
    interest_income = customer_data['avg_loan_balance'] * customer_data['interest_rate']
    fee_income = customer_data['annual_fees']
    interchange_income = customer_data['card_spend'] * 0.02  # rough interchange rate
    
    total_annual_revenue = interest_income + fee_income + interchange_income
    
    # Costs
    servicing_cost = 100  # annual cost to maintain customer
    default_cost = customer_data['default_probability'] * customer_data['avg_loan_balance']
    
    net_annual_profit = total_annual_revenue - servicing_cost - default_cost
    
    # Project over expected lifespan
    avg_customer_lifespan = 7  # years
    discount_rate = 0.10
    
    clv = sum([net_annual_profit / ((1 + discount_rate) ** year) 
               for year in range(1, avg_customer_lifespan + 1)])
    
    # Acquisition cost from the basic formula is omitted here for simplicity
    return clv

 

At JPMorgan, I'd segment CLV by customer type and use it to inform acquisition spending, retention investment, and product development priorities.

 

Behavioral and Situational Questions

16. Tell me about a time you found an error in data that others missed.

Sample Answer: "In my previous role, I was analyzing customer transaction patterns and noticed the monthly totals looked right, but something felt off about the daily distributions. Everyone else had approved the report.

 

I dug deeper and found that weekend transactions were being double-counted due to a timezone conversion issue in our ETL pipeline. The monthly totals appeared correct because the duplicates roughly balanced out, but our day-of-week analysis was completely wrong.

 

I documented the issue, quantified the impact, and worked with the data engineering team to fix the pipeline. We also implemented validation checks to catch similar issues automatically.

 

This taught me to trust my instincts when something doesn't look right, even if it passes surface-level checks. At a place like JPMorgan where decisions involve millions of dollars, data accuracy isn't just important - it's critical."

 

17. How do you explain complex data findings to non-technical stakeholders?

Sample Answer: "I focus on the 'so what' rather than the 'how.' Executives don't need to know about my SQL queries - they need to know what the data means for the business.

 

I use clear visualizations and avoid jargon. Instead of saying 'the p-value is 0.03,' I'd say 'it's very unlikely we'd see this result by chance alone.'

 

I also tell stories. When I presented credit risk analysis, I didn't just show default rates - I walked through a typical customer journey showing where risk signals appeared and when we could intervene.

 

At JPMorgan, I'd tailor my communication to the audience. A risk committee wants different details than a product team. I always lead with the recommendation, support it with data, and keep supporting details available for those who want to dig deeper."

 

18. Describe a situation where your analysis led to a business decision.

Sample Answer: "I analyzed customer complaint data and found that 40% of complaints came from just two specific issues that we could easily fix. The prevailing belief was that we had lots of small problems, but the data showed we had two big ones.

 

I presented this with specific examples and cost estimates for fixes versus the value of reducing complaints. Leadership allocated resources to address both issues within the quarter.

 

Three months later, complaint volume dropped 35%, and customer satisfaction scores improved significantly. The analysis didn't just identify problems - it prioritized them and made the business case for action.

 

At JPMorgan, I'd bring this same approach: use data to challenge assumptions, quantify business impact, and make clear recommendations that leaders can act on."

 

Technical Tools and Platforms Questions

19. What data visualization tools have you used, and how would you choose between them?

Sample Answer: I've worked with Tableau, Power BI, Python (matplotlib, seaborn, plotly), and Excel depending on the use case.

 

For executive dashboards that need to be interactive and updated automatically, I prefer Tableau or Power BI. They're user-friendly for stakeholders and handle large datasets well.

 

For exploratory analysis or custom visualizations, I use Python. It gives me more control and integrates well with my analysis pipeline:

 

import matplotlib.pyplot as plt
import seaborn as sns

# Quick distribution check
sns.histplot(data=df, x='transaction_amount', hue='customer_segment')
plt.title('Transaction Distribution by Customer Segment')
plt.show()

 

For JPMorgan specifically, I'd ask what's already in the tech stack. Adoption matters - the best tool is the one people will actually use. If the risk team lives in Tableau, I'd build their dashboards there rather than forcing them to learn something new.

 

20. How do you ensure data quality and accuracy in your analyses?

Sample Answer: I follow a consistent validation process. First, I profile the data - check for nulls, duplicates, outliers, and unexpected values:

 

# Data quality checks
print(df.info())
print(df.describe())
print(df.duplicated().sum())
print(df.isnull().sum())

 

Then I validate against known totals or control figures. If I'm analyzing transactions, do my totals match the general ledger? Do customer counts match CRM records?

 

I also do sanity checks - if average account balance suddenly doubled, that's probably a data issue, not a real trend.

 

I document my assumptions and transformations so others can review my logic. I also build in automated alerts for anomalies.
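
Those checks can be wrapped into a small gate function that runs before any report goes out; a sketch with illustrative thresholds and column names:

```python
import pandas as pd

def validate(df, control_total, tolerance=0.01):
    """Return a list of data quality issues; an empty list means all gates pass."""
    issues = []
    if df['transaction_id'].duplicated().any():
        issues.append('duplicate transaction ids')
    if df['amount'].isnull().mean() > 0.05:
        issues.append('more than 5% missing amounts')
    # Reconcile against a known control figure, e.g. the general ledger total
    if abs(df['amount'].sum() - control_total) > tolerance * control_total:
        issues.append('totals do not reconcile with control figure')
    return issues

df = pd.DataFrame({'transaction_id': [1, 2, 3],
                   'amount': [100.0, 250.0, 50.0]})
print(validate(df, control_total=400.0))  # [] - all checks pass
```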

 

At JPMorgan, where data quality directly impacts risk management and regulatory reporting, I'd advocate for strong data governance practices and wouldn't hesitate to flag quality issues before they reach decision-makers. Better to delay a report than present wrong information.

 

Final Tips for Your JPMorgan Interview

Prepare to explain your thinking process, not just give answers. Interviewers want to see how you approach problems, not whether you've memorized solutions.

 

Relate everything back to business impact. JPMorgan cares about data that drives decisions and improves outcomes, not analysis for analysis's sake.

 

Be ready to discuss the financial industry specifically. Show you understand banking operations, regulatory requirements, and customer needs.

 

Practice coding questions on platforms like LeetCode and StrataScratch. JPMorgan often includes live coding exercises.

 

Most importantly, be yourself. They're hiring humans, not robots. Show your enthusiasm for the role and your curiosity about the business. Good luck!

 

   

FAQs

What tools should a data analyst at JPMorgan Chase know?

Key tools include SQL, Python, R, and Tableau. Proficiency in these tools is essential for data manipulation, analysis, and visualization in a data analyst role at JPMorgan Chase.

What does a typical day look like for a data analyst there?

A typical day involves gathering, cleaning, and analyzing data, followed by presenting insights to business teams. Collaboration across departments is essential in making data-driven decisions at JPMorgan Chase.

How does JPMorgan Chase use data analysis?

JPMorgan Chase leverages data analysis to assess risks, identify trends, and optimize business strategies. Data-driven insights help guide financial forecasting, investment strategies, and operational improvements.

What should I highlight on my resume for this role?

Highlight technical skills like SQL, Python, R, and Tableau. Include successful projects where your data analysis led to actionable insights, demonstrating how you contributed to business outcomes.

How can I stand out during interviews?

Focus on building strong technical skills in SQL, Python, and data visualization tools. Also, improve your ability to communicate data insights clearly to non-technical stakeholders.
