Analyzing Market Concentration in Manhattan Post-COVID
Shortly after COVID-19, I saw a tweet that really stuck with me. It explained how someone used PPP loan data to spot local businesses ripe for acquisition. By digging into the payroll numbers filed with the SBA, he was able to piece together the financial health of nearby private companies. It was a clever and unexpected approach to uncovering business opportunities.
Motivated by that idea, I downloaded several gigabytes of PPP loan data. My plan was to tackle a technical challenge in Excel and get a clearer picture of Manhattan’s business landscape—a market I’ve always found fascinating. However, like many good ideas, it sat on my hard drive for a while until I finally decided to revisit it.
I decided to zero in on how crowded various industries are in Manhattan by using the Herfindahl-Hirschman Index (HHI). This index, a standard measure of market concentration, tells you whether a market is dominated by just a few firms or if it’s more competitive. Instead of using estimated revenue figures, I used self-reported payroll data from the PPP records to compute the HHI by NAICS code.
Methodology
For this analysis, I worked in a Jupyter notebook using Python. Here are the main steps I followed, along with some code snippets.
1. Data Loading and Cleaning
I started by importing the necessary libraries and loading only the essential columns from the PPP loan dataset. After that, I cleaned the data by dropping rows with missing values and filtering it to include only businesses based in New York City.
import pandas as pd
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
# Columns to use
columns = ['BorrowerName','NAICSCode', 'CurrentApprovalAmount', 'JobsReported', 'BorrowerCity']
df = pd.read_csv('public_150k_plus_230930.csv', usecols=columns)
# Drop missing values and filter for New York City
df = df.dropna()
df = df[df['BorrowerCity'] == 'New York'].reset_index(drop=True)
2. Data Transformation
Next, I converted the NAICS codes to strings so I could group them using their first two digits, and I calculated the average monthly payroll based on the SBA's guidelines.
# Convert NAICS to string and remove trailing decimals
df['NAICSCode'] = df['NAICSCode'].astype(str).str.strip('.0')
# Calculate average monthly payroll
df['Avg_Monthly_Payroll'] = df['CurrentApprovalAmount'] / 2.5
3. Calculating Market Share and HHI
To gauge market concentration, I calculated each business’s market share within its NAICS category and then computed the HHI by summing the squares of these market shares.
# Count firms per NAICS and merge
total_firm_per_naics = df.groupby('NAICSCode').size().reset_index(name='TotalFirm')
df = pd.merge(df, total_firm_per_naics, on='NAICSCode')
# Calculate total payroll per NAICS
total_payroll_per_naics = df.groupby('NAICSCode')['Avg_Monthly_Payroll'].sum().reset_index()
total_payroll_per_naics.columns = ['NAICSCode', 'TotalPayroll']
df = pd.merge(df, total_payroll_per_naics, on='NAICSCode')
# Calculate market share
df['MarketShare'] = (df['Avg_Monthly_Payroll'] / df['TotalPayroll']) * 100
# Calculate HHI
df['MarketShareSquared'] = df['MarketShare'] ** 2
hhi_per_naics = df.groupby('NAICSCode')['MarketShareSquared'].sum().reset_index()
hhi_per_naics.columns = ['NAICSCode', 'HHI']
df = pd.merge(df, hhi_per_naics, on='NAICSCode')
4. Categorizing Market Concentration
Based on the HHI values, I divided the industries into three groups: competitive, moderately concentrated, and highly concentrated.
# Categorize markets based on HHI
competitive = df[(df['HHI'] < 1500) & (df['TotalFirm'] > 10)]
moderate_concentration = df[(df['HHI'] >= 1500) & (df['HHI'] < 2500) & (df['TotalFirm'] > 10)]
highly_concentrated = df[(df['HHI'] >= 2500) & (df['TotalFirm'] > 10)]
Key Findings
Highly Concentrated Markets (HHI ≥ 2500)
- All Other Telecommunications
- Veterinary Services
- Services for the Elderly and Persons with Disabilities
These sectors are dominated by a small number of firms, which means there's little room for new players.
Competitive Markets (HHI < 1500)
- Full-Service Restaurants
- Offices of Lawyers
- Offices of Physicians (except Mental Health Specialists)
In these industries, there are many competitors, resulting in crowded markets.
Insights
Digging into the data uncovered some unexpected insights. I never really considered opening a restaurant before, and the numbers confirmed that it might not be the best idea given the fierce competition and slim profit margins. On the flip side, sectors like drywall contracting appeared promising due to lower competition and steady demand. While I’m not planning to become a drywall contractor, the idea of buying and optimizing an existing drywall business has crossed my mind.
Lessons Learned
This project wasn’t just about number crunching—it also helped me sharpen my technical and research skills while tackling a real-world problem. Entering a market is a complicated challenge, and there’s rarely a perfect formula. The data helped highlight which sectors are more likely to succeed and which might struggle.
It’s important to note that the dataset has its limitations. Some businesses have closed, and not every eligible business applied for a PPP loan. Still, the analysis provides a detailed look at market concentration in NYC and reveals emerging trends and opportunities.
For a deeper dive, you can check out my full analysis, including the Jupyter notebook and data files, on GitHub.