MVP Parsing job descriptions and resume comparison¶

Using Selenium to scrape, BeautifulSoup to clean, Scapy to identify keywords / skills / experience / title, then compare between job description and a given resume. I turned the work below into a full pipeline for analyzing jobs in this repo if you would like to see an evolution of the code itself.

In [663]:
pip install beautifulsoup4 selenium spacy
Requirement already satisfied: beautifulsoup4 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (4.14.3)
Requirement already satisfied: selenium in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (4.39.0)
Requirement already satisfied: spacy in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (3.8.11)
Requirement already satisfied: soupsieve>=1.6.1 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from beautifulsoup4) (2.8)
Requirement already satisfied: typing-extensions>=4.0.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from beautifulsoup4) (4.15.0)
Requirement already satisfied: urllib3<3.0,>=2.5.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from urllib3[socks]<3.0,>=2.5.0->selenium) (2.5.0)
Requirement already satisfied: trio<1.0,>=0.31.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from selenium) (0.32.0)
Requirement already satisfied: trio-websocket<1.0,>=0.12.2 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from selenium) (0.12.2)
Requirement already satisfied: certifi>=2025.10.5 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from selenium) (2025.11.12)
Requirement already satisfied: websocket-client<2.0,>=1.8.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from selenium) (1.9.0)
Requirement already satisfied: attrs>=23.2.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from trio<1.0,>=0.31.0->selenium) (25.4.0)
Requirement already satisfied: sortedcontainers in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from trio<1.0,>=0.31.0->selenium) (2.4.0)
Requirement already satisfied: idna in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from trio<1.0,>=0.31.0->selenium) (3.11)
Requirement already satisfied: outcome in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from trio<1.0,>=0.31.0->selenium) (1.3.0.post0)
Requirement already satisfied: sniffio>=1.3.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from trio<1.0,>=0.31.0->selenium) (1.3.1)
Requirement already satisfied: exceptiongroup in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from trio<1.0,>=0.31.0->selenium) (1.3.1)
Requirement already satisfied: wsproto>=0.14 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from trio-websocket<1.0,>=0.12.2->selenium) (1.3.2)
Requirement already satisfied: pysocks!=1.5.7,<2.0,>=1.5.6 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from urllib3[socks]<3.0,>=2.5.0->selenium) (1.7.1)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (1.0.15)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (2.0.13)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (3.0.12)
Requirement already satisfied: thinc<8.4.0,>=8.3.4 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (8.3.10)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (1.1.3)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (2.5.2)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (2.0.10)
Requirement already satisfied: weasel<0.5.0,>=0.4.2 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (0.4.3)
Requirement already satisfied: typer-slim<1.0.0,>=0.3.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (0.20.0)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (4.67.1)
Requirement already satisfied: numpy>=1.19.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (2.2.6)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (2.32.5)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (2.12.5)
Requirement already satisfied: jinja2 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (3.1.6)
Requirement already satisfied: setuptools in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (80.9.0)
Requirement already satisfied: packaging>=20.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from spacy) (25.0)
Requirement already satisfied: annotated-types>=0.6.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (2.41.5)
Requirement already satisfied: typing-inspection>=0.4.2 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.4.2)
Requirement already satisfied: charset_normalizer<4,>=2 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from requests<3.0.0,>=2.13.0->spacy) (3.4.4)
Requirement already satisfied: blis<1.4.0,>=1.3.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from thinc<8.4.0,>=8.3.4->spacy) (1.3.3)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from thinc<8.4.0,>=8.3.4->spacy) (0.1.5)
Requirement already satisfied: click>=8.0.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from typer-slim<1.0.0,>=0.3.0->spacy) (8.3.1)
Requirement already satisfied: cloudpathlib<1.0.0,>=0.7.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from weasel<0.5.0,>=0.4.2->spacy) (0.23.0)
Requirement already satisfied: smart-open<8.0.0,>=5.2.1 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from weasel<0.5.0,>=0.4.2->spacy) (7.5.0)
Requirement already satisfied: wrapt in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from smart-open<8.0.0,>=5.2.1->weasel<0.5.0,>=0.4.2->spacy) (2.0.1)
Requirement already satisfied: h11<1,>=0.16.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from wsproto>=0.14->trio-websocket<1.0,>=0.12.2->selenium) (0.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/samueldowds/.pyenv/versions/3.10.19/lib/python3.10/site-packages (from jinja2->spacy) (3.0.3)
Note: you may need to restart the kernel to use updated packages.
In [664]:
# sample jobs
job = "https://job-boards.greenhouse.io/webflow/jobs/7166510"

Fetching¶

I am going to use bs and selenium to fetch and parse the text. Selenium because some sites will 403 you without JS enabled.

In [665]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
driver = webdriver.Chrome(options=options)
driver.get(job)

html = driver.page_source
driver.quit()

Parsing¶

In [666]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
for tag in soup(["nav", "header", "footer", "script", "style", "a", "input", "label", "button"]):
    tag.decompose()  # removes the tag completely

text = soup.get_text(separator="\n")
lines = [line.strip() for line in text.splitlines() if line.strip()]
job_text = "\n".join(lines)

Processing¶

In [667]:
# pip install spacy
# !python -m spacy download en_core_web_lg
import spacy
import os
nlp = spacy.load("en_core_web_lg")

Skills¶

For identifying skills I use the EntityRuler from spacy. I pass through a jsonl file that has JOB_TITLE and SKILLS objects for patterns.

In [668]:
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.from_disk("/Users/samueldowds/Desktop/patterns/patterns.jsonl") 
Out[668]:
<spacy.pipeline.entityruler.EntityRuler at 0x82176e7c0>
In [669]:
doc = nlp(job_text)
In [670]:
from spacy import displacy
displacy.render(doc, style="ent")
Job Application for ORG Senior Software Engineer JOB_TITLE , Webflow Cloud PERSON at Webflow ORG Senior Software Engineer JOB_TITLE , Webflow Cloud ORG
CA Remote ORG ( BC & ON ORG only); U.S. Remote GPE
At Webflow ORG , we’re building the world’s leading AI SKILL -native Digital ORG Experience Platform, and we’re doing it as a remote-first company built on trust, transparency, and a whole lot of creativity. This work takes grit, because we move fast, without ever sacrificing craft or quality. Our mission is to bring development superpowers to everyone. From entrepreneurs launching their first ORDINAL idea to global enterprises scaling their digital presence, we empower teams to design SKILL , launch, and optimize for the web without barriers. We believe the future of the web, and work, is more open, more creative, and more equitable. And we’re here to build it together.
We’re looking for a
Senior Software Engineer JOB_TITLE
to join our
Webflow Cloud ORG team
in our mission to enable Webflow ORG customers to design SKILL and build powerful websites. In this role, you'll be a key member of our Webflow Cloud ORG team, which is responsible for a new product that allows customers to deploy fully operational web apps, mounted on their Webflow ORG -hosted domains. You’ll be working in Webflow ORG ’s Designer Dashboard PERSON to integrate the new features, and in our infrastructure that enables these features.
About the role:
Location: Remote-first ( United States GPE ; BC & ON ORG , Canada GPE )
Full-time
Permanent
Exempt
The cash compensation for this role is tailored to align with the cost of labor in different geographic markets. We've structured the base pay ranges for this role into zones for our geographic markets, and the specific base pay within the range will be determined by the candidate’s geographic location, job-related experience, knowledge, qualifications, and skills.
United States GPE   (all figures cited below are in USD GPE and pertain to workers in the United States GPE )
Zone A: $150,100 - $ MONEY 207,100 MONEY
Zone B: $ 141,600 MONEY - $ 194,800 MONEY
Zone C SKILL : $ 132,100 MONEY - 182,400 CARDINAL
Canada GPE (figures cited below are in CAD ORG and pertain to workers in ON & BC ORG , Canada GPE )
Zone A: $171,000 - $ MONEY 235,600 MONEY
This role is also eligible to participate in Webflow ORG 's company-wide bonus program. Target amounts are a percentage of base salary and vary by career level. Payouts are based on company performance against established financial and operational goals.
Please visit our
for more information on which locations are included in each of our geographic pay zones. However, please confirm the zone for your specific location with your recruiter.
Reporting to the Engineering Manager JOB_TITLE
As a
Senior Software Engineer JOB_TITLE , Webflow Cloud PERSON
, you’ll …
Collaborate with designers, PMs ORG , and engineers to plan and build product capabilities that enable our ambitious visual development goals
Build, document, and test production code that impacts all Webflow ORG customers
Participate in all engineering SKILL activities including technical designing and implementation, testing SKILL and validation, releasing new capabilities, incident response, and interviewing
Solve problems in a highly technical platform that empowers hundreds of thousands CARDINAL of people
Tackle complex technical challenges on a collaborative and geographically distributed team
In addition to the responsibilities outlined above, at Webflow ORG we will support SKILL you in identifying where your interests and development opportunities lie and we'll help you incorporate them into your role.
About you:
Requirements:
BA/BS degree or equivalent experience
You’ll thrive as a
Senior Software Engineer JOB_TITLE , Webflow Cloud PERSON
if you:
Have 5+ years of experience EXPERIENCE shipping features and products, with a focus on complex full-stack applications
Possess an innate interest in web development tools, and their intersection with infrastructure engineering SKILL
Have experience using cloud platforms to build and deploy web applications
Have experimented with and shipped production-scaled code in several different frameworks
Love WORK_OF_ART thinking through large technical problems and working through that complexity on a collaborative, distributed team
Are comfortable building up a mental model of a product and architecture through reading code and debugging SKILL existing software SKILL
Can debug production issues across services and multiple levels of the stack
Take pride in taking ownership and driving projects to business SKILL impact
Deeply understand data design SKILL and modeling
Are familiar with Node.js SKILL , React SKILL , TypeScript SKILL , Pulumi GPE , GraphQL SKILL , Postgres, AWS ORG , Cloudflare SKILL Workers for Platforms, and server SKILL -rendering complex React SKILL applications at scale
Have consistently communicated trade-offs throughout a project to meet both technical and business SKILL requirements
Are comfortable working in an agile, safe-to-fail environment
Stay curious and open to growth — actively building fluency in emerging technologies like AI SKILL to unlock creativity, accelerate progress, and amplify impact.
Our Core Behaviors:
Build lasting customer trust.
We build trust by taking action that puts customer trust first ORDINAL .
Win together.
We play SKILL to win, and we win as one CARDINAL team. Success at Webflow ORG isn't a solo act.
Reinvent ourselves.
We don't just improve what exists, we imagine what's possible.
Deliver with speed, quality, and craft.
We move fast because the moment demands it, and we do so without lowering the bar.
Benefits
Ownership in what you help build.
Every permanent Webflower PERSON receives equity (RSUs) in our growing, privately held company.
Health coverage that actually covers you.
Comprehensive medical, dental, and vision plans for full-time employees and their dependents, with Webflow ORG covering most premiums.
Support SKILL for every stage of family life
. 12 weeks DATE of paid parental leave for all parents and 6+ weeks DATE of additional paid leave for birthing parents. Plus inclusive care for family planning, menopause, and midlife transitions.
Time ORG off that’s actually off.
Flexible vacation, paid holidays, and a sabbatical program to help you recharge and come back inspired.
Wellness for the whole you.
Access to mental health resources, therapy and coaching.
Invest in your future.
A 401(k) with 100% PERCENT employer match ( up to $6,000 MONEY /year) in the U.S. GPE , and support SKILL for retirement savings globally.
Monthly DATE stipends that flex with your life.
Localized support SKILL for work and wellness expenses — from Wi-Fi ORG to workouts.
Bonus for building together.
All full-time, permanent, non-commission employees are eligible for our annual DATE WIN bonus program.
Temporary employees may be eligible for paid holiday and time off, statutory leaves of absence, and company-sponsored medical benefits depending on their Fixed Term Contract ORG and their country/state of employment.
Remote, together
At Webflow ORG , equality is a core tenet of our culture. We are an Equal Opportunity (EEO)/Veterans/Disabled Employer and are
to building an inclusive global team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Employment decisions are made on the basis of job-related criteria without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by applicable law. Pursuant to the San Francisco Fair Chance Ordinance ORG , Webflow ORG will consider for employment qualified applicants with arrest and conviction records.
Stay connected
Not ready to apply, but want to be part of the Webflow ORG community? Consider following our story on our
,
,
, and/or
.
Please note:
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Upon interview scheduling, instructions for confidential accommodation requests will be administered.
To join Webflow ORG , you'll need a valid right to work authorization depending on the country of employment.
If you are extended an offer, that offer may be contingent upon your successful completion of a background check, which will be conducted in accordance with applicable laws. We may obtain one CARDINAL or more background screening reports about you, solely for employment purposes.
For information about how Webflow ORG processes your personal information, please review
.
Create a Job Alert
Interested in building your career at Webflow ORG ? Get future opportunities sent straight to your email.
Apply for this job
*
indicates a required field
Phone
Resume/CV
*
Accepted file types: pdf, doc, docx, txt, rtf
Cover Letter
Accepted file types: pdf, doc, docx, txt, rtf
In [671]:
from collections import Counter

# verbs
verb_counts = Counter([token.lemma_ for token in doc if token.pos_ == "VERB"])
verbs_sorted = [verb for verb, count in verb_counts.most_common()]

# skills
skill_counts = Counter([ent.text for ent in doc.ents if ent.label_ == "SKILL"])
skills_sorted = [skill for skill, count in skill_counts.most_common()]

# salary
money_counts = Counter([ent.text for ent in doc.ents if ent.label_ == "MONEY"])
money_sorted = [amount for amount, count in money_counts.most_common()]

# job title
job_title_counts = Counter([ent.text for ent in doc.ents if ent.label_ == "JOB_TITLE"])
job_titles_sorted = [title for title, count in job_title_counts.most_common()]

# exprience
experience_count = Counter([ent.text for ent in doc.ents if ent.label_ == "EXPERIENCE"])
experience_sorted = [title for title, count in experience_count.most_common()]

Output¶

In [672]:
print(verbs_sorted)
['build', 'take', 'work', 'pay', 'enable', 'participate', 'help', 'win', 'do', 'move', 'scale', 'empower', 'design', 'join', 'deploy', 'relate', 'cite', 'include', 'distribute', 'debug', 'exist', 'receive', 'cover', 'depend', 'consider', 'apply', 'accept', 'lead', 'sacrifice', 'bring', 'launch', 'optimize', 'believe', '’re', 'look', 'allow', 'mount', 'host', 'integrate', 'tailor', 'align', 'structure', 'range', 'determine', 'pertain', 'vary', 'base', 'establish', 'visit', 'confirm', 'report', 'collaborate', 'plan', 'impact', 'release', 'interview', 'outline', 'support', 'identify', 'lie', 'incorporate', 'thrive', 'possess', 'have', 'use', 'experiment', 'ship', 'think', 'read', 'drive', 'understand', 'render', 'communicate', 'meet', 'fail', 'stay', 'emerge', 'unlock', 'accelerate', 'amplify', 'last', 'put', 'play', 'reinvent', 'improve', 'imagine', 'deliver', 'demand', 'lower', 'grow', 'hold', 'birth', '’', 'recharge', 'come', 'inspire', 'flex', 'localize', 'sponsor', 'represent', 'make', 'protect', 'connect', 'want', 'follow', 'note', 'ensure', 'provide', 'perform', 'administer', 'need', 'extend', 'conduct', 'obtain', 'process', 'review', 'create', 'get', 'send', 'indicate', 'require']
In [673]:
print(skills_sorted)
['support', 'AI', 'design', 'engineering', 'business', 'React', 'C', 'testing', 'debugging', 'software', 'data design', 'Node.js', 'TypeScript', 'GraphQL', 'Cloudflare', 'server', 'play', 'Support']
In [674]:
print(job_titles_sorted)
['Senior Software Engineer', 'Engineering Manager']
In [675]:
print(money_sorted)
['$150,100 - $', '207,100', '141,600', '194,800', '132,100', '$171,000 - $', '235,600', 'up to $6,000']
In [676]:
print(experience_sorted)
['Have 5+ years of experience']

I want to create skill clusters, to better catch outliers

In [677]:
# note: yes this was written by AI
def get_skill_clusters(doc, skill_label="SKILL", window=5):
    skills_positions = [(ent.text, ent.start) for ent in doc.ents if ent.label_ == skill_label]
    skills_positions.sort(key=lambda x: x[1])  # Sort by token position
    
    clusters = []
    current_cluster = []
    
    for i, (skill, pos) in enumerate(skills_positions):
        if not current_cluster:
            current_cluster.append((skill, pos))
        else:
            # Compare with last skill in current cluster
            last_pos = current_cluster[-1][1]
            if pos - last_pos <= window:
                current_cluster.append((skill, pos))
            else:
                # Only add cluster if it has more than 1 skill
                if len(current_cluster) > 1:
                    clusters.append([s for s, _ in current_cluster])
                current_cluster = [(skill, pos)]
    
    # Add last cluster if valid
    if len(current_cluster) > 1:
        clusters.append([s for s, _ in current_cluster])
    
    return clusters
In [678]:
clusters = get_skill_clusters(doc, window=10)
print(clusters)
[['engineering', 'testing'], ['debugging', 'software'], ['business', 'data design', 'Node.js', 'React', 'TypeScript', 'GraphQL', 'Cloudflare', 'server', 'React']]

Hmmm. Lets get verbs close to those skills to focus on!

In [679]:
# note: written by AI
def get_keyword_clusters(doc, clusters, window=5):
    cluster_verbs = []

    for cluster in clusters:
        # Get token positions of skills in this cluster
        positions = [ent.start for ent in doc.ents if ent.text in cluster]
        verbs_nearby = set()
        for token in doc:
            if token.pos_ == "VERB":
                # Check if verb is within window of any skill in cluster
                if any(abs(token.i - pos) <= window for pos in positions):
                    verbs_nearby.add(token.lemma_)  # use lemma for normalization
        cluster_verbs.append({"cluster": cluster, "verbs": list(verbs_nearby)})

    return cluster_verbs
In [680]:
keywords = get_keyword_clusters(doc, clusters, window=10)
print(keywords)
[{'cluster': ['engineering', 'testing'], 'verbs': ['deploy', 'build', 'participate', 'release', 'include', 'use', 'impact', 'have']}, {'cluster': ['debugging', 'software'], 'verbs': ['read', 'exist', 'debug']}, {'cluster': ['business', 'data design', 'Node.js', 'React', 'TypeScript', 'GraphQL', 'Cloudflare', 'server', 'React'], 'verbs': ['take', 'meet', 'understand', 'communicate', 'drive', 'work', 'render']}]
In [681]:
current_resume = """

John Doe
[email protected] ❖ github.com/johndoe-dev

WORK EXPERIENCE 

Google (Data Experience Group) 					          	      May – June 2025
Senior Software Engineer								     	      Remote
Architected a TypeScript MVP for an AI-driven data analysis tool integrated with Google BigQuery
Integrated a dbt MCP server written in Python into an LLM tool-calling system
Built a chat interface with streaming responses using Next.js and Vercel’s AI SDK
Developed a pixel-perfect dashboard from frontend to backend for custom KPI metrics
Created a design component library using react-charts, Radix UI, and Tailwind CSS
Designed data schemas for conversation history, KPI data, and custom model usage
Deployed database tables leveraging an existing Prisma schema and Supabase

Amazon Web Services (AWS Innovation Studio)				        December 2024 – April 2025
Freelance Software Engineer									              Remote
Designed and developed custom landing pages for early-stage internal launch initiatives
Implemented analytics and A/B testing with PostHog
Deployed static Next.js builds on Netlify for rapid prototyping
Built email-capture funnels using serverless functions
Collaborated with internal AWS teams and non-technical product stakeholders

Meta (Marketplace & Rentals Team) 					             Jan 2022 – December 2023
Full-Stack Engineer										            Seattle, WA
Built and launched a new property-management workflow for landlord tools
Won an internal hackathon by building a bidding interface for high-traffic search results using Chakra UI
Architected a rent-data normalization pipeline using internal data warehouses, SQS, and Python workers
Improved frontend deployment workflows via automatic preview branches with GitHub Actions
Shipped ingestion pipelines for cost-of-living data with FastAPI
Created CRON jobs to regularly refresh XML sitemaps for SEO improvement
Added caching logic to Express middleware for Fastly-equivalent internal infra

Meta (Marketplace) 								    	        Oct 2020 – Dec 2021
Junior Full-Stack Engineer									              Seattle, WA
Integrated a new design system into existing React applications using Chakra UI
Led frontend on-call handoff meetings to improve deployment and reliability practices
Migrated Node servers and React apps to a self-hosted Sentry instance
Implemented SSR-friendly features for pages receiving over 70k daily visits
Improved accessibility across multiple user flows

SKILLS 

Languages: TypeScript, Python, Bash  
Frameworks: React, Node, FastAPI, Django, Express, Redux, TanStack  
Platforms: AWS, Netlify, DigitalOcean, Render, Vercel  
Strengths: pixel-perfect implementation, API design, database schema design, frontend data management, relational databases, product & feature development, CRON jobs, data pipelines, SSR for SEO, scripting, communication, estimation

"""
In [682]:
prompt = f"""
You are a resume expert. Please tweak my resume to match the job_description and keywords

<job_description>
{doc.text.strip()}
</job_description>

<current_resume>
{current_resume}
</current_resume>

<keywords>
{keywords}
</keywords>

"""

Creating a more systematic analysis tool. Lets create a class that can both analyze resumes and job descriptions and compare.

In [683]:
import spacy
from spacy import displacy

class Extractor: 
    """Run text through NLP pipline and extract important data. Designed for job descriptions and resume text."""
    
    def __init__(self, text, patterns_path = "/Users/samueldowds/Desktop/patterns/patterns.jsonl", pipeline ="en_core_web_lg"):
        self.text = text
        self.skills = []
        self.clusters = []
        self.experiences = []
        self.job_titles = []
        self.salaries = []
        self.verbs = []

        # setup and run pipeline
        self.nlp = spacy.load(pipeline)
        ruler = self.nlp.add_pipe("entity_ruler", before="ner")
        ruler.from_disk(patterns_path)
        self.doc = self.nlp(text)

        # abstract
        self.abstract()
        
    def abstract(self):
        self.populate_skills()
        self.populate_verbs()
        self.populate_salaries()
        self.populate_skill_clusters()
        self.populate_experience()
        self.populate_job_titles()
        
    def populate_skills(self):
        skill_counts = Counter([ent.text for ent in self.doc.ents if ent.label_ == "SKILL"])
        self.skills = [skill for skill, count in skill_counts.most_common()]

    def populate_verbs(self):
        verb_counts = Counter([token.lemma_ for token in self.doc if token.pos_ == "VERB"])
        self.verbs = [verb for verb, count in verb_counts.most_common()]

    def populate_salaries(self):
        money_counts = Counter([ent.text for ent in self.doc.ents if ent.label_ == "MONEY"])
        self.salaries = [amount for amount, count in money_counts.most_common()]

    def populate_experience(self):
        experience_count = Counter([ent.text for ent in self.doc.ents if ent.label_ == "EXPERIENCE"])
        self.experiences = [title for title, count in experience_count.most_common()]
        
    def populate_job_titles(self):
        job_titles = Counter([ent.text for ent in self.doc.ents if ent.label_ == "JOB_TITLE"])
        self.job_titles = [title for title, count in job_titles.most_common()]
        
    def populate_skill_clusters(self, skill_label="SKILL", window=15):
        
        # cluster skills
        skills_positions = [(ent.text, ent.start) for ent in self.doc.ents if ent.label_ == skill_label]
        skills_positions.sort(key=lambda x: x[1])
        clusters = []
        current_cluster = []
        
        for i, (skill, pos) in enumerate(skills_positions):
            if not current_cluster:
                current_cluster.append((skill, pos))
            else:
                last_pos = current_cluster[-1][1]
                if pos - last_pos <= window:
                    current_cluster.append((skill, pos))
                else:
                    if len(current_cluster) > 1:
                        clusters.append([s for s, _ in current_cluster])
                    current_cluster = [(skill, pos)]
        
        if len(current_cluster) > 1:
            clusters.append([s for s, _ in current_cluster])

        # cluster verbs
        cluster_verbs = []
        for cluster in clusters:
            # Get token positions of skills in this cluster
            positions = [ent.start for ent in self.doc.ents if ent.text in cluster]
            verbs_nearby = set()
            for token in doc:
                if token.pos_ == "VERB":
                    # Check if verb is within window of any skill in cluster
                    if any(abs(token.i - pos) <= window for pos in positions):
                        verbs_nearby.add(token.lemma_)  # use lemma for normalization
            cluster_verbs.append({"cluster": cluster, "verbs": list(verbs_nearby)})

        self.clusters = cluster_verbs 

    def render(self):
        if self.doc:
            displacy.render(self.doc, style="ent")
        else:
            print("Please run process.")
In [684]:
job_data = Extractor(job_text)
In [685]:
my_resume = Extractor(current_resume)
In [686]:
class Compare:
    def __init__(self, resume, job):
        self.resume = resume
        self.job = job
        self.skills_missing = []
        self.skills_matching = []
        self.additional_skills = []
        self.skills_percent_match = None
        self.has_matching_title = False
        self.verbs_matching = []
        self.verbs_missing = []
        self.additional_verbs = []
        self.verbs_percent_match = []

        self.compare()

    def compare(self):
        self.compare_skills()
        self.compare_job_titles()
        self.compare_verbiage()
        self.summarize()

    def compare_skills(self):
        job_skills = set(self.job.skills)
        resume_skills = set(self.resume.skills)

        additional_skills = list(resume_skills - job_skills)
        skills_missing = list(job_skills - resume_skills)
        skills_matching = list(job_skills - set(skills_missing))

        self.skills_missing = skills_missing
        self.additional_skills = additional_skills
        self.skills_matching = skills_matching
        self.skills_percent_match = round(len(self.skills_matching) / len(self.job.skills), 2)
    
    def compare_verbiage(self):
        job_verbs = set(self.job.verbs)
        resume_verbs = set(self.resume.verbs)

        additional_verbs = list(resume_verbs - job_verbs)
        verbs_missing = list(job_verbs - resume_verbs)
        verbs_matching = list(job_verbs - set(verbs_missing))

        self.verbs_missing = verbs_missing
        self.additional_verbs = additional_verbs
        self.verbs_matching = verbs_matching
        self.verbs_percent_match = round(len(self.verbs_matching) / len(self.job.verbs), 2)

    def summarize(self):
        matched_text = "has" if self.has_matching_title else "has not"
        matching_skills_text = f"Matching skills: {', '.join(self.skills_matching)}." if self.skills_missing else "The candidate has no matching skills."
        verbiage_percent = f"Verbiage matched: {round(self.verbs_percent_match * 100)}%"
        
        match_summary = f"""
        Candidate matches {round(self.skills_percent_match * 100)}% of the required job skills and {matched_text} worked in this role before. {matching_skills_text} {verbiage_percent}
        """

        print(match_summary.strip())

    def compare_job_titles(self):
        if len([title for title in self.resume.job_titles if title in self.job.job_titles]):
            self.has_matching_title = True
In [687]:
comparison = Compare(my_resume, job_data)
Candidate matches 33% of the required job skills and has worked in this role before. Matching skills: TypeScript, React, server, design, AI, testing. Verbiage matched: 12%

Further ideas¶

  • Proximity (location)
  • Data recon on candidate (via email / name / external links)
  • Sentiment analysis
  • LLM summary for resume writer / candidate
  • Classes for parsing web page and resume
  • Overall match score
  • Pass Fail system
In [688]:
comparison.job.clusters
Out[688]:
[{'cluster': ['engineering', 'testing'],
  'verbs': ['deploy',
   'experiment',
   'interview',
   'build',
   'participate',
   'release',
   'include',
   'possess',
   'use',
   'impact',
   'have']},
 {'cluster': ['debugging', 'software'],
  'verbs': ['take', 'build', 'exist', 'debug', 'read']},
 {'cluster': ['business',
   'data design',
   'Node.js',
   'React',
   'TypeScript',
   'GraphQL',
   'Cloudflare',
   'server',
   'React'],
  'verbs': ['take',
   'meet',
   'fail',
   'understand',
   'communicate',
   'drive',
   'work',
   'render']}]
In [689]:
comparison.resume.clusters
Out[689]:
[{'cluster': ['TypeScript',
   'AI',
   'data analysis',
   'Google BigQuery',
   'server',
   'Python'],
  'verbs': ['take',
   'lead',
   'tailor',
   'build',
   'do',
   'sacrifice',
   'cite',
   'move']},
 {'cluster': ['Next.js',
   'AI',
   'pixel',
   'design',
   'component',
   'library',
   'react',
   'Tailwind CSS'],
  'verbs': ['establish',
   'relate',
   'cite',
   'join',
   'move',
   'lead',
   'visit',
   'pertain',
   'sacrifice',
   'design',
   'enable',
   'vary',
   'empower',
   'launch',
   'take',
   'build',
   'base',
   'scale',
   'bring']},
 {'cluster': ['database', 'Prisma', 'Amazon Web Services'],
  'verbs': ['establish',
   'design',
   'vary',
   'empower',
   'base',
   'believe',
   'visit',
   'optimize']},
 {'cluster': ['landing pages',
   'analytics',
   'testing',
   'Next.js',
   'Netlify',
   'serverless'],
  'verbs': ['take',
   'enable',
   'build',
   'design',
   '’re',
   'vary',
   'participate',
   'look',
   'sacrifice',
   'join',
   'move']},
 {'cluster': ['Python', 'deployment', 'GitHub'],
  'verbs': ['align', 'lead', 'tailor', 'build', 'do', 'pertain', 'cite']},
 {'cluster': ['XML', 'middleware', 'Fastly'],
  'verbs': ['range', 'determine', 'structure']},
 {'cluster': ['design', 'React', 'deployment', 'React', 'Sentry'],
  'verbs': ['align',
   'establish',
   'tailor',
   'vary',
   'relate',
   'participate',
   'base',
   'visit',
   'pertain',
   'scale',
   'sacrifice',
   'cite',
   'launch',
   'move',
   'bring']},
 {'cluster': ['Languages',
   'TypeScript',
   'Python',
   'Bash',
   'React',
   'Django',
   'Redux',
   'Netlify',
   'DigitalOcean',
   'pixel',
   'API',
   'design',
   'database',
   'design',
   'data management',
   'databases'],
  'verbs': ['establish',
   'do',
   'relate',
   'participate',
   'cite',
   'join',
   'move',
   'lead',
   'tailor',
   'visit',
   'include',
   'pertain',
   'sacrifice',
   'design',
   'enable',
   'vary',
   'empower',
   'believe',
   'launch',
   'take',
   'build',
   'base',
   'scale',
   'optimize',
   'bring']}]
In [690]:
comparison.skills_missing
Out[690]:
['Cloudflare',
 'Support',
 'debugging',
 'GraphQL',
 'C',
 'software',
 'support',
 'play',
 'business',
 'engineering',
 'data design',
 'Node.js']
In [691]:
comparison.verbs_missing
Out[691]:
['align',
 'ensure',
 'interview',
 'meet',
 'make',
 'localize',
 'relate',
 'participate',
 'debug',
 'note',
 'grow',
 'communicate',
 'require',
 'conduct',
 'move',
 'reinvent',
 'have',
 'amplify',
 'administer',
 'provide',
 'confirm',
 'deliver',
 'stay',
 'indicate',
 'come',
 'report',
 'protect',
 'perform',
 'last',
 'enable',
 'vary',
 'fail',
 'incorporate',
 'empower',
 'plan',
 'obtain',
 'believe',
 'accept',
 'flex',
 'unlock',
 '’',
 'extend',
 'emerge',
 'determine',
 'hold',
 'review',
 'put',
 'lower',
 'understand',
 'connect',
 'scale',
 'optimize',
 'play',
 'establish',
 'experiment',
 'cover',
 'do',
 'outline',
 'depend',
 'sponsor',
 'follow',
 'identify',
 'process',
 'cite',
 'join',
 'win',
 'get',
 'need',
 'apply',
 'imagine',
 'think',
 'tailor',
 'accelerate',
 'look',
 'visit',
 'include',
 'pertain',
 'work',
 'sacrifice',
 'render',
 'range',
 'pay',
 'birth',
 'want',
 'structure',
 'release',
 'demand',
 'distribute',
 'support',
 'lie',
 'allow',
 'represent',
 'impact',
 'inspire',
 'take',
 'help',
 'consider',
 'send',
 '’re',
 'mount',
 'thrive',
 'base',
 'possess',
 'recharge',
 'read',
 'bring']
In [692]:
comparison.job.text
Out[692]:
"Job Application for Senior Software Engineer, Webflow Cloud at Webflow\nSenior Software Engineer, Webflow Cloud\nCA Remote (BC & ON only); U.S. Remote\nAt Webflow, we’re building the world’s leading AI-native Digital Experience Platform, and we’re doing it as a remote-first company built on trust, transparency, and a whole lot of creativity. This work takes grit, because we move fast, without ever sacrificing craft or quality. Our mission is to bring development superpowers to everyone. From entrepreneurs launching their first idea to global enterprises scaling their digital presence, we empower teams to design, launch, and optimize for the web without barriers. We believe the future of the web, and work, is more open, more creative, and more equitable. And we’re here to build it together.\nWe’re looking for a\nSenior Software Engineer\nto join our\nWebflow Cloud team\nin our mission to enable Webflow customers to design and build powerful websites. In this role, you'll be a key member of our Webflow Cloud team, which is responsible for a new product that allows customers to deploy fully operational web apps, mounted on their Webflow-hosted domains. You’ll be working in Webflow’s Designer Dashboard to integrate the new features, and in our infrastructure that enables these features.\nAbout the role:\nLocation: Remote-first (United States; BC & ON, Canada)\nFull-time\nPermanent\nExempt\nThe cash compensation for this role is tailored to align with the cost of labor in different geographic markets. We've structured the base pay ranges for this role into zones for our geographic markets, and the specific base pay within the range will be determined by the candidate’s geographic location, job-related experience, knowledge, qualifications, and skills.\nUnited States\xa0 (all figures cited below are in USD and pertain to workers in the United States)\nZone A: $150,100 - $207,100\nZone B: $141,600 - $194,800\nZone C: $132,100 - 182,400\nCanada (figures cited below are in CAD and pertain to workers in ON & BC, Canada)\nZone A: $171,000 - $235,600\nThis role is also eligible to participate in Webflow's company-wide bonus program. Target amounts are a percentage of base salary and vary by career level. Payouts are based on company performance against established financial and operational goals.\nPlease visit our\nfor more information on which locations are included in each of our geographic pay zones. However, please confirm the zone for your specific location with your recruiter.\nReporting to the Engineering Manager\nAs a\nSenior Software Engineer, Webflow Cloud\n, you’ll …\nCollaborate with designers, PMs, and engineers to plan and build product capabilities that enable our ambitious visual development goals\nBuild, document, and test production code that impacts all Webflow customers\nParticipate in all engineering activities including technical designing and implementation, testing and validation, releasing new capabilities, incident response, and interviewing\nSolve problems in a highly technical platform that empowers hundreds of thousands of people\nTackle complex technical challenges on a collaborative and geographically distributed team\nIn addition to the responsibilities outlined above, at Webflow we will support you in identifying where your interests and development opportunities lie and we'll help you incorporate them into your role.\nAbout you:\nRequirements:\nBA/BS degree or equivalent experience\nYou’ll thrive as a\nSenior Software Engineer, Webflow Cloud\nif you:\nHave 5+ years of experience shipping features and products, with a focus on complex full-stack applications\nPossess an innate interest in web development tools, and their intersection with infrastructure engineering\nHave experience using cloud platforms to build and deploy web applications\nHave experimented with and shipped production-scaled code in several different frameworks\nLove thinking through large technical problems and working through that complexity on a collaborative, distributed team\nAre comfortable building up a mental model of a product and architecture through reading code and debugging existing software\nCan debug production issues across services and multiple levels of the stack\nTake pride in taking ownership and driving projects to business impact\nDeeply understand data design and modeling\nAre familiar with Node.js, React, TypeScript, Pulumi, GraphQL, Postgres, AWS, Cloudflare Workers for Platforms, and server-rendering complex React applications at scale\nHave consistently communicated trade-offs throughout a project to meet both technical and business requirements\nAre comfortable working in an agile, safe-to-fail environment\nStay curious and open to growth — actively building fluency in emerging technologies like AI to unlock creativity, accelerate progress, and amplify impact.\nOur Core Behaviors:\nBuild lasting customer trust.\nWe build trust by taking action that puts customer trust first.\nWin together.\nWe play to win, and we win as one team. Success at Webflow isn't a solo act.\nReinvent ourselves.\nWe don't just improve what exists, we imagine what's possible.\nDeliver with speed, quality, and craft.\nWe move fast because the moment demands it, and we do so without lowering the bar.\nBenefits\nOwnership in what you help build.\nEvery permanent Webflower receives equity (RSUs) in our growing, privately held company.\nHealth coverage that actually covers you.\nComprehensive medical, dental, and vision plans for full-time employees and their dependents, with Webflow covering most premiums.\nSupport for every stage of family life\n. 12 weeks of paid parental leave for all parents and 6+ weeks of additional paid leave for birthing parents. Plus inclusive care for family planning, menopause, and midlife transitions.\nTime off that’s actually off.\nFlexible vacation, paid holidays, and a sabbatical program to help you recharge and come back inspired.\nWellness for the whole you.\nAccess to mental health resources, therapy and coaching.\nInvest in your future.\nA 401(k) with 100% employer match (up to $6,000/year) in the U.S., and support for retirement savings globally.\nMonthly stipends that flex with your life.\nLocalized support for work and wellness expenses — from Wi-Fi to workouts.\nBonus for building together.\nAll full-time, permanent, non-commission employees are eligible for our annual WIN bonus program.\nTemporary employees may be eligible for paid holiday and time off, statutory leaves of absence, and company-sponsored medical benefits depending on their Fixed Term Contract and their country/state of employment.\nRemote, together\nAt Webflow, equality is a core tenet of our culture. We are an Equal Opportunity (EEO)/Veterans/Disabled Employer and are\nto building an inclusive global team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Employment decisions are made on the basis of job-related criteria without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by applicable law. Pursuant to the San Francisco Fair Chance Ordinance, Webflow will consider for employment qualified applicants with arrest and conviction records.\nStay connected\nNot ready to apply, but want to be part of the Webflow community? Consider following our story on our\n,\n,\n, and/or\n.\nPlease note:\nWe will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Upon interview scheduling, instructions for confidential accommodation requests will be administered.\nTo join Webflow, you'll need a valid right to work authorization depending on the country of employment.\nIf you are extended an offer, that offer may be contingent upon your successful completion of a background check, which will be conducted in accordance with applicable laws. We may obtain one or more background screening reports about you, solely for employment purposes.\nFor information about how Webflow processes your personal information, please review\n.\nCreate a Job Alert\nInterested in building your career at Webflow? Get future opportunities sent straight to your email.\nApply for this job\n*\nindicates a required field\nPhone\nResume/CV\n*\nAccepted file types: pdf, doc, docx, txt, rtf\nCover Letter\nAccepted file types: pdf, doc, docx, txt, rtf"
In [ ]: