Data analysis is more than just numbers - it’s about turning raw data into insights. As an analyst, mastering tools like Excel and SQL can make your workflow faster, smarter, and more accurate.
This guide is for students and early-career analysts who want to go from "I know the basics" to "I'm the person the team comes to with data problems." Let's get into it.
Advanced Excel
If VLOOKUP is still your go-to lookup formula, it's time for an upgrade. Here are the Excel functions that separate analysts who get promoted from those who stay stuck.
1. XLOOKUP
XLOOKUP was introduced in 2019 and is available in Microsoft 365 and Excel 2021 or later. It makes VLOOKUP look like a toy: it searches both left and right, returns multiple columns, takes a built-in fallback value for missing matches, and doesn't break when you insert columns.
Excel formula:
// Old way — brittle, column-order dependent
=VLOOKUP(A2, Sheet2!A:D, 3, FALSE)
// New way — flexible, readable, error-safe
=XLOOKUP(A2, Sheet2!A:A, Sheet2!C:C, "Not found")
// Return multiple columns at once
=XLOOKUP(A2, Products!A:A, Products!B:D, "N/A")
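For readers who think in code, the exact-match behavior described above can be modeled in a few lines of Python. This is only a rough sketch: the function name `xlookup`, the sample product names, and the SKUs are made up for illustration, and it ignores XLOOKUP's match modes, search modes, and case-insensitive text matching.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found="Not found"):
    """Rough model of exact-match XLOOKUP: first match wins, else the fallback."""
    for key, result in zip(lookup_array, return_array):
        if key == lookup_value:
            return result
    return if_not_found


# Hypothetical product table. The return column (names) could sit to the LEFT
# of the key column (SKUs) in the sheet; the lookup does not care.
names = ["Widget", "Gadget", "Gizmo"]
skus = ["W-1", "G-7", "Z-3"]

found = xlookup("G-7", skus, names)      # key exists
missing = xlookup("X-0", skus, names)    # key does not exist, fallback is used
print(found, "|", missing)
```

Because the lookup and return ranges are passed separately, inserting a column between them cannot shift a hard-coded column index, which is the failure mode of VLOOKUP.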
2. Dynamic array formulas
Dynamic arrays (Excel 365 and Excel 2021 or later) let a single formula spill results into multiple cells automatically. No more copying formulas down hundreds of rows.
Excel formula
// UNIQUE — get distinct values from a list
=UNIQUE(B2:B500)
// FILTER — conditional extraction (like SQL WHERE)
=FILTER(A2:C500, C2:C500>10000, "No results")
// SORT + FILTER combined
=SORT(FILTER(A2:C500, B2:B500="North"), 3, -1)
// SEQUENCE - generate number/date arrays (here: 12 dates spaced 30 days apart, not calendar months)
=SEQUENCE(12, 1, DATE(2025,1,1), 30)
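As a cross-check on what UNIQUE, FILTER, and SORT return, here is a minimal Python sketch of the same three operations. The rows (name, region, revenue) are invented sample data, not values from this guide.

```python
# Invented sample rows: (name, region, revenue), standing in for columns A:C.
rows = [
    ("Asha", "North", 12000),
    ("Ben", "South", 8000),
    ("Chen", "North", 15000),
    ("Dee", "East", 11000),
]

# UNIQUE on the region column: distinct values, in order of first appearance.
unique_regions = list(dict.fromkeys(region for _, region, _ in rows))

# FILTER(A:C, C>10000): keep whole rows where the condition holds.
over_10k = [row for row in rows if row[2] > 10000]

# SORT(FILTER(A:C, B="North"), 3, -1): filter first, then sort by the
# third column in descending order.
north_sorted = sorted(
    (row for row in rows if row[1] == "North"),
    key=lambda row: row[2],
    reverse=True,
)

print(unique_regions)
print(over_10k)
print(north_sorted)
```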
3. SUMPRODUCT
SUMPRODUCT is the formula that experienced analysts reach for when nothing else quite works. It handles multi-criteria calculations, weighted averages, and conditional counting all in one.
Excel formula:
// Count rows matching 2 conditions
=SUMPRODUCT((A2:A500="East")*(B2:B500="Q1"))
// Weighted average (avg score weighted by sample size)
=SUMPRODUCT(B2:B50, C2:C50) / SUM(C2:C50)
// Unique count (no helper column needed; returns #DIV/0! if the range contains blank cells)
=SUMPRODUCT(1/COUNTIF(A2:A100, A2:A100))
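The arithmetic behind these three formulas is easier to trust once you have seen it spelled out. The sketch below reproduces each one in plain Python on a five-row dataset; all values are illustrative assumptions.

```python
# Illustrative data only: each list plays the role of one worksheet column.
regions = ["East", "West", "East", "East", "North"]
quarters = ["Q1", "Q1", "Q2", "Q1", "Q1"]
scores = [80, 90, 70, 60, 100]
sample_sizes = [10, 30, 20, 40, 0]

# =SUMPRODUCT((A="East")*(B="Q1"))
# Each comparison yields TRUE/FALSE, which multiply as 1/0; the sum counts
# rows where both conditions hold.
both_match = sum((r == "East") * (q == "Q1") for r, q in zip(regions, quarters))

# =SUMPRODUCT(B, C) / SUM(C)
# Multiply each score by its weight, add up, divide by the total weight.
weighted_avg = sum(s * n for s, n in zip(scores, sample_sizes)) / sum(sample_sizes)

# =SUMPRODUCT(1/COUNTIF(A, A))
# A value that appears k times contributes k * (1/k) = 1 to the total.
unique_count = round(sum(1 / regions.count(r) for r in regions))

print(both_match, weighted_avg, unique_count)
```

The last line also shows why blank cells break the unique-count formula in Excel: COUNTIF returns 0 for them, and 1/0 is an error.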
Top Excel functions every analyst should know cold
- INDEX + MATCH - works in every Excel version and is still handy for two-way (row and column) lookups
- IFERROR / IFNA - clean error handling in any formula
- TEXT() - format dates and numbers as strings for reports
- LET() - define variables inside a formula to avoid repetition
- LAMBDA() - create your own custom reusable functions
4. Power Query
Power Query is Excel's built-in ETL tool. It connects to databases, web pages, folders, and APIs, transforms messy data, and reapplies every transformation step each time you refresh the query. If you're still copying and pasting data between sheets manually, Power Query will change your life.
Power Query M language
// Load, filter, and transform data pipeline
let
    Source = Excel.Workbook(File.Contents("sales_data.xlsx")),
    RawData = Source{[Name="Sheet1"]}[Data],
    Promoted = Table.PromoteHeaders(RawData),
    Filtered = Table.SelectRows(Promoted, each [Region] = "North" and [Revenue] > 5000),
    Cleaned = Table.RemoveColumns(Filtered, {"Temp_Col"})
in
    Cleaned
5. Pivot Tables
Most people use 20% of what Pivot Tables can do. The advanced moves: calculated fields, slicers connected to multiple pivots, grouping dates by quarter, and using GETPIVOTDATA for dynamic dashboard cells.
Pro Pivot Table tricks
- Right-click a date field → Group → select Month + Year for automatic date bucketing
- Add a Calculated Field for metrics like "Revenue per Unit" directly inside the pivot
- Connect 3–4 slicers to multiple pivot tables for an instant dashboard
- Use "Show Values As → % of Column Total" to instantly get distribution breakdowns
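To make the "% of Column Total" option concrete, the sketch below does the same calculation by hand: sum revenue per region (what the pivot does), then divide each subtotal by the grand total. The sales figures are invented for illustration.

```python
from collections import defaultdict

# Invented sample rows: (region, revenue).
sales = [("North", 400), ("South", 100), ("North", 200), ("East", 300)]

# Step 1: the pivot itself, i.e. revenue summed per region.
totals = defaultdict(int)
for region, revenue in sales:
    totals[region] += revenue

# Step 2: "Show Values As -> % of Column Total" divides each subtotal
# by the grand total of the column.
grand_total = sum(totals.values())
pct_of_total = {
    region: round(100 * value / grand_total, 1)
    for region, value in totals.items()
}

print(dict(totals))
print(pct_of_total)
```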
Advanced SQL
Basic SQL is SELECT, WHERE, and GROUP BY. Every bootcamp teaches that. What separates strong analysts is comfort with joins, subqueries, window functions, and CTEs - the stuff that actually gets the complex questions answered.
1. CTEs
Common Table Expressions (CTEs) are essentially named subqueries that you define at the top of your query. They make complex logic readable, reusable, and debuggable. Interviewers love seeing them.
SQL - PostgreSQL / BigQuery / MySQL 8+
-- Find customers who spent more than average in Q1
WITH q1_spend AS (
    SELECT customer_id, SUM(order_value) AS total_spend
    FROM orders
    -- Half-open range: also covers timestamps during 31 March
    WHERE order_date >= '2025-01-01' AND order_date < '2025-04-01'
    GROUP BY customer_id
),
avg_spend AS (
    SELECT AVG(total_spend) AS avg_q1
    FROM q1_spend
)
SELECT
    q.customer_id,
    q.total_spend,
    a.avg_q1,
    q.total_spend - a.avg_q1 AS above_average_by
FROM q1_spend AS q
CROSS JOIN avg_spend AS a  -- avg_spend has exactly one row
WHERE q.total_spend > a.avg_q1
ORDER BY q.total_spend DESC;
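If you want to try the CTE pattern without a database server, Python's built-in `sqlite3` module is enough. The sketch below runs essentially the query above against a tiny in-memory table; the five sample orders are made up, dates are stored as ISO text (a SQLite convention, assumed here), and the date filter is written as a half-open range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, order_value REAL)"
)
# Made-up orders. The last one falls in April and must be ignored by the query.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2025-01-15", 100),
        (1, "2025-02-10", 300),
        (2, "2025-03-05", 100),
        (3, "2025-03-20", 200),
        (3, "2025-04-02", 900),
    ],
)

query = """
WITH q1_spend AS (
    SELECT customer_id, SUM(order_value) AS total_spend
    FROM orders
    WHERE order_date >= '2025-01-01' AND order_date < '2025-04-01'
    GROUP BY customer_id
),
avg_spend AS (
    SELECT AVG(total_spend) AS avg_q1 FROM q1_spend
)
SELECT q.customer_id, q.total_spend, a.avg_q1
FROM q1_spend AS q
CROSS JOIN avg_spend AS a
WHERE q.total_spend > a.avg_q1
ORDER BY q.total_spend DESC
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```

Q1 totals here are 400, 100, and 200, so the average is about 233.33 and only customer 1 clears it. Note that `avg_spend` reads from `q1_spend`: a later CTE can build on an earlier one.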
2. Window functions
Window functions let you compute values across related rows (running totals, rankings, comparisons with the previous row) without collapsing your rows. Once you learn them, you'll wonder how you ever lived without them.
SQL - window functions
SELECT
    sale_date,
    region,
    revenue,
    -- Running total within each region
    SUM(revenue) OVER (PARTITION BY region ORDER BY sale_date) AS running_total,
    -- Rank by revenue within each region
    RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS revenue_rank,
    -- Compare to previous day's revenue
    LAG(revenue, 1) OVER (PARTITION BY region ORDER BY sale_date) AS prev_day_revenue,
    -- Day-over-day change %
    ROUND(
        (revenue - LAG(revenue, 1) OVER (PARTITION BY region ORDER BY sale_date)) * 100.0
        / NULLIF(LAG(revenue, 1) OVER (PARTITION BY region ORDER BY sale_date), 0),
        2
    ) AS pct_change
FROM daily_sales
ORDER BY region, sale_date;
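A quick way to see what the running total, LAG, and percentage-change columns produce is to run them on a handful of rows. The sketch below uses Python's `sqlite3` module (window functions need SQLite 3.25 or newer, which is an assumption about your install), five invented `daily_sales` rows, and a named `WINDOW` clause to avoid repeating the partition definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, region TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2025-01-01", "North", 100),
        ("2025-01-02", "North", 150),
        ("2025-01-03", "North", 120),
        ("2025-01-01", "South", 200),
        ("2025-01-02", "South", 100),
    ],
)

rows = conn.execute("""
    SELECT
        region,
        sale_date,
        revenue,
        SUM(revenue) OVER w AS running_total,
        LAG(revenue) OVER w AS prev_day_revenue,
        ROUND((revenue - LAG(revenue) OVER w) * 100.0
              / NULLIF(LAG(revenue) OVER w, 0), 2) AS pct_change
    FROM daily_sales
    WINDOW w AS (PARTITION BY region ORDER BY sale_date)
    ORDER BY region, sale_date
""").fetchall()

for row in rows:
    print(row)
```

The first row of each region has no previous day, so `prev_day_revenue` and `pct_change` come back as NULL (`None` in Python) rather than zero. Every input row is still present in the output, which is the difference from GROUP BY.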
6 window functions to know for any analytics interview
- ROW_NUMBER() - unique sequential number per partition
- RANK() / DENSE_RANK() - rankings with/without gaps for ties
- LAG() / LEAD() - access previous or next row's value
- SUM() OVER() - running totals and cumulative sums
- AVG() OVER() - rolling averages over a defined window
- NTILE(n) - bucket rows into n roughly equal groups (quartiles, deciles)
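The difference between the ranking functions only shows up when there are ties, so here is a small sketch with a deliberate tie. It uses Python's `sqlite3` module (SQLite 3.25+ assumed) and an invented `scores` table; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
# Ana and Bo are tied on purpose.
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ana", 90), ("Bo", 90), ("Cy", 80), ("Di", 70)],
)

ranked = conn.execute("""
    SELECT
        player,
        ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num,
        RANK()       OVER (ORDER BY points DESC)         AS rnk,
        DENSE_RANK() OVER (ORDER BY points DESC)         AS dense_rnk,
        NTILE(2)     OVER (ORDER BY points DESC, player) AS half
    FROM scores
    ORDER BY points DESC, player
""").fetchall()

for row in ranked:
    print(row)
```

ROW_NUMBER always counts 1, 2, 3, 4. RANK gives the tied players 1 and 1, then skips to 3. DENSE_RANK gives 1 and 1, then continues with 2. The player name is added to the ORDER BY for ROW_NUMBER and NTILE so that the tie is broken the same way on every run.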
3. Advanced JOINs and when to use each
| Join type | Returns | Classic use case |
| --- | --- | --- |
| INNER JOIN | Matching rows only | Orders with valid customers |
| LEFT JOIN | All left + matching right | All customers, even those with no orders |
| FULL OUTER JOIN | All rows from both sides | Reconciliation reports |
| CROSS JOIN | Cartesian product | Generating all date × product combos |
| SELF JOIN (a pattern, not a keyword: the same table under two aliases) | Table joined to itself | Employee → Manager hierarchy |
| ANTI JOIN (a pattern, not a keyword: LEFT JOIN ... WHERE right key IS NULL, or NOT EXISTS) | Left rows with no match | Customers who never ordered |
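Two rows of this table are worth running once: the LEFT JOIN that keeps customers without orders, and the anti-join pattern that returns only those customers. The sketch below uses Python's `sqlite3` module with invented `customers` and `orders` tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben"), (3, "Chen")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 3)])

# LEFT JOIN: every customer appears, even Ben, who has no orders.
# COUNT(o.order_id) skips NULLs, so his count is 0 rather than 1.
order_counts = conn.execute("""
    SELECT c.name, COUNT(o.order_id) AS order_count
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Anti-join: keep only the left rows where the join found nothing.
never_ordered = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.order_id IS NULL
""").fetchall()

print(order_counts)
print(never_ordered)
```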
4. Performance optimization tips
Writing a query that works is one thing. Writing one that runs in 2 seconds instead of 2 minutes is what makes you valuable in production environments.
SQL performance tips
-- BAD: SELECT * pulls all columns, which is slow on wide tables
SELECT * FROM orders WHERE status = 'shipped';

-- GOOD: Only select what you need
SELECT order_id, customer_id, order_date, total
FROM orders
WHERE status = 'shipped';

-- BAD: Function on indexed column usually prevents index usage
WHERE YEAR(order_date) = 2025

-- GOOD: Range filter uses the index
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'

-- Consider EXISTS instead of IN for large subqueries
-- (many engines optimize both the same way, so compare the query plans)
WHERE EXISTS (SELECT 1 FROM vip_customers v WHERE v.id = orders.customer_id)
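The index tip is easy to verify for yourself by asking the database for its query plan. The sketch below does this in SQLite via Python's `sqlite3` module, as a stand-in for a production engine. Several things here are assumptions: the table layout, the index name `idx_orders_date`, and the use of `strftime('%Y', ...)` in place of `YEAR()`, which SQLite does not have. Other engines use different commands (such as EXPLAIN) and different plan wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")


def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in plan_rows)


# Wrapping the column in a function hides it from the index.
plan_function = query_plan(
    "SELECT order_id, total FROM orders "
    "WHERE strftime('%Y', order_date) = '2025'"
)

# A plain range on the bare column lets the planner seek into the index.
plan_range = query_plan(
    "SELECT order_id, total FROM orders "
    "WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'"
)

print(plan_function)
print(plan_range)
```

The first plan should describe a full SCAN of the table and the second a SEARCH that names the index. The exact text varies between SQLite versions.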
Interview Questions You Will Actually Face
These are real questions from data analyst interviews at companies like Razorpay, Flipkart, Accenture, Deloitte, and mid-size startups. Know these cold.
SQL- What's the difference between WHERE and HAVING?
WHERE filters rows before aggregation. HAVING filters after. You can't use aggregate functions like SUM() in a WHERE clause — that's what HAVING is for.
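If the interviewer asks for an example, a query that uses both clauses makes the order of operations obvious. This is a minimal sketch using Python's `sqlite3` module and an invented `orders` table; the 1000 threshold is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, status TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", "shipped", 500),
        ("North", "shipped", 700),
        ("North", "cancelled", 900),
        ("South", "shipped", 300),
        ("East", "shipped", 1500),
    ],
)

big_regions = conn.execute("""
    SELECT region, SUM(total) AS shipped_total
    FROM orders
    WHERE status = 'shipped'     -- row filter: runs before grouping
    GROUP BY region
    HAVING SUM(total) > 1000     -- group filter: runs after aggregation
    ORDER BY region
""").fetchall()

print(big_regions)
```

The cancelled North order is removed by WHERE before any sums are computed, so North totals 1200 rather than 2100. South is then dropped by HAVING because its shipped total is only 300.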
SQL- Write a query to find the second highest salary in a table.
SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees). Alternatively, use DENSE_RANK(), which makes tie handling explicit and generalizes to the Nth highest salary.
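Both approaches can be checked side by side on a table with tied salaries. The sketch below uses Python's `sqlite3` module (SQLite 3.25+ assumed for DENSE_RANK) and five invented employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
# Ties on purpose: two people share the top salary, two share the second.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 300), ("Bo", 300), ("Cy", 200), ("Di", 200), ("Ed", 100)],
)

# Approach 1: the largest salary strictly below the maximum.
second_via_max = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

# Approach 2: DENSE_RANK gives tied salaries the same rank with no gaps,
# so rank 2 is the second distinct salary. Change the 2 to get the Nth.
second_via_rank = conn.execute("""
    SELECT DISTINCT salary
    FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
        FROM employees
    ) AS ranked
    WHERE salary_rank = 2
""").fetchone()[0]

print(second_via_max, second_via_rank)
```

Using RANK() instead of DENSE_RANK() here would return nothing, because the two top earners both get rank 1 and the next rank issued is 3.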
SQL- What is a CTE and when would you use it over a subquery?
A CTE (WITH clause) makes complex queries readable and reusable within the same query. Use CTEs when you need to reference the same derived table more than once, or when nesting subqueries becomes unreadable.
Excel- How is INDEX MATCH better than VLOOKUP?
INDEX MATCH can look to the left (VLOOKUP can't), doesn't break when you insert columns, and can be faster on large datasets. INDEX also returns a reference, not just a value, which makes it more flexible inside complex formulas.
Excel- What is Power Query and how have you used it?
Power Query is Excel's data transformation tool. It is used for cleaning, reshaping, and combining data from multiple sources without formulas. Each query stores its steps, so refreshing it (manually, when the file opens, or on a timer) reapplies them to the latest source data.
FAQs
Excel handles small datasets and quick visualizations, while SQL is designed for large databases and complex queries. Learning both ensures analysts can clean, analyze, and visualize data efficiently, making them versatile in any business environment.
Formulas like INDEX-MATCH, IFERROR, TEXTJOIN, and dynamic arrays, along with pivot tables and Power Query, are essential. They help analysts summarize data, automate repetitive tasks, and generate insights quickly.
Window functions like ROW_NUMBER(), RANK(), SUM() OVER() allow analysts to calculate running totals, rank data, and detect trends directly in the database, reducing manual work and improving accuracy in reporting.
Yes. Mastering advanced Excel and SQL enhances efficiency and decision-making skills. Analysts with these skills are more likely to secure higher-paying roles, lead projects, and advance quickly in data-driven industries.
Tools like Power BI, Tableau, and Excel add-ins such as Power Pivot and Power Query help visualize data and automate workflows. Combining these tools boosts analysis speed, dashboard creation, and actionable reporting.


