The Problem
Getting a construction permit should not require a law degree. Every city has different fee schedules, different required paperwork, different approval workflows. Contractors spend hours on the phone with city offices trying to figure out how much a permit will cost, what documents they need, and how long the process will take. The information exists, but it is scattered across PDF fee schedules, city websites, and phone-only offices that are open from 9 to 4 on weekdays.
We built this to centralize permit fee lookup across multiple cities, handle the messy reality of inconsistent data formats, and provide a clean interface for contractors and project managers to estimate costs and track applications.
The Approach
The system has three layers. The scraping layer handles OCR extraction from city fee schedule PDFs — these are the official documents cities publish, often as scanned PDFs that are not machine-readable. The scrapers extract fee tables, parse them into structured data, and validate the results against known constraints (a maximum fee should not be lower than a minimum fee, for instance). Ten named cities have dedicated scrapers, and seven regional defaults cover areas without city-specific data.
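The constraint check described above can be sketched like this — a minimal sketch, assuming a scraped entry with `minFee`/`maxFee` fields (the function and field names are illustrative, not the project's actual API):

```javascript
// Illustrative sanity check on scraped fee entries. An entry that fails
// is rejected and the previous known-good data is kept.
function validateFeeEntry(entry) {
  if (entry == null || typeof entry !== "object") return false;
  // A scraped minimum fee must not exceed the scraped maximum fee.
  if (entry.minFee != null && entry.maxFee != null && entry.minFee > entry.maxFee) {
    return false;
  }
  // Fees are dollar amounts; negative values are always OCR noise.
  if (entry.minFee < 0 || entry.maxFee < 0) return false;
  return true;
}

validateFeeEntry({ minFee: 3550, maxFee: 2400 }); // rejected: false
validateFeeEntry({ minFee: 100, maxFee: 2400 });  // accepted: true
```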
The data layer uses a Proxy-based architecture for merging scraper results into the static fee database. database-loader.js wraps the static permit-fee-database.js and overlays any newer scraper results from scrape-history.json. This means the app always has baseline data even if a scraper fails, but benefits from fresher data when scrapes succeed. The pricing calculator imports from the database loader, never from the static database directly — this indirection is what makes the auto-update mechanism work.
The application layer is a vanilla JavaScript single-page app with an Express.js backend. Twelve API endpoints handle fee lookups, requirements generation, user management, and permit application tracking. Role-based access controls separate contractors, reviewers, and administrators. The requirements generator produces structured markdown checklists of documents needed for each permit type — and notably, this is entirely static data, not GPT-generated. We removed the OpenAI dependency early because the output was unreliable for something as specific as municipal permit requirements.
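The role gate can be sketched as ordinary middleware — a hypothetical sketch, since the real route paths and middleware names are not in the source; in the actual app this would be wired into the Express routes:

```javascript
// Hypothetical role-based access middleware. The real app would read the
// role from an authenticated session rather than trusting the request.
function requireRole(...allowed) {
  return (req, res, next) => {
    const role = req.user && req.user.role;
    if (!allowed.includes(role)) {
      res.statusCode = 403;
      return res.end("forbidden");
    }
    next();
  };
}

// Contractors can look up fees; only administrators can edit fee data.
const canLookup = requireRole("contractor", "reviewer", "admin");
const canEdit = requireRole("admin");
```

Each endpoint then composes one of these guards ahead of its handler, so the contractor/reviewer/administrator separation lives in one place.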
Removing the AI dependency was the right call. Permit requirements are a domain where being wrong has real consequences — a contractor who shows up at city hall with the wrong documents loses a day. Static data verified against official sources is more trustworthy than LLM-generated checklists, even if it requires more manual maintenance.
Technical Decisions
OCR pipeline for PDF fee schedules. City fee schedules come in every format imaginable: properly tagged PDFs, scanned images, Excel files saved as PDF, and occasionally hand-typed HTML tables from the early 2000s. The scraper pipeline uses pdfjs-dist for extraction with fallback to OCR for scanned documents. Results go through validation before being merged — if a scraper returns a minimum fee higher than the maximum fee (which happens more often than you would think with OCR), the result is rejected and the previous known-good data is kept.
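The text-versus-OCR routing can be sketched simply, assuming a scanned PDF shows up as an empty or near-empty text layer when pdfjs-dist extracts it (the function name and threshold are illustrative):

```javascript
// Illustrative fallback decision: if pdfjs-dist extraction yields almost
// no text, the document is almost certainly a scan, so route it to OCR.
function needsOcr(textItems) {
  const text = textItems.join(" ").trim();
  // Scanned PDFs typically yield zero or a handful of stray glyphs.
  return text.length < 20;
}

needsOcr([]);                                    // true  -> OCR path
needsOcr(["Permit Type", "Min Fee", "Max Fee"]); // false -> text extraction
```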
Proxy objects for database merging. This is a pattern we particularly like. The database loader creates JavaScript Proxy objects that intercept property access. When you ask for Chicago's electrical permit fees, the Proxy checks scrape-history.json first, falls back to the static database if no recent scrape exists, and handles the null-value edge cases that are common in fee structures (some cities use flat fees with no valuation rate; a null rate coerces to zero in JavaScript arithmetic, where an undefined one would yield NaN — a subtle but important distinction).
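A minimal sketch of the pattern, with illustrative data shapes (the real loader reads permit-fee-database.js and scrape-history.json from disk; here they are inlined):

```javascript
// Static baseline: always present, even if every scraper fails.
const staticDb = {
  chicago: { electrical: { flatFee: 100, valuationRate: 0.02 } },
  springfield: { electrical: { flatFee: 75, valuationRate: null } }, // flat-fee city
};
// Fresher scraper results, overlaid per city when available.
const scrapeHistory = {
  chicago: { electrical: { flatFee: 125, valuationRate: 0.02 } },
};

const feeDb = new Proxy(staticDb, {
  get(target, city) {
    // Prefer a recent scrape; fall back to the static baseline.
    return scrapeHistory[city] ?? target[city];
  },
});

function estimate(city, permitType, valuation) {
  const entry = feeDb[city][permitType];
  // A null valuationRate coerces to 0 in arithmetic (null * x === 0),
  // so flat-fee cities come out right without a special case.
  return entry.flatFee + entry.valuationRate * valuation;
}

estimate("chicago", "electrical", 10000);     // scraped: 125 + 0.02 * 10000 = 325
estimate("springfield", "electrical", 10000); // flat fee only: 75
```

Because consumers import the Proxy rather than the static file, the overlay is invisible to them — which is exactly why the pricing calculator never imports permit-fee-database.js directly.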
Static requirements data over LLM generation. The original plan was to use GPT-4 to generate permit requirements based on city and permit type. After testing, the output was plausible but not reliable enough for professional use. We replaced it with a static database of requirements compiled from actual city websites and verified by hand. It covers fewer edge cases, but every requirement it shows is accurate. The requirements-generator.js produces markdown from this static data.
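The generator's shape can be sketched as a static lookup plus a markdown template (the data layout and key format are assumptions, not the actual structure of requirements-generator.js):

```javascript
// Illustrative static requirements data, keyed by city and permit type.
const requirements = {
  "chicago:electrical": [
    "Completed permit application form",
    "Licensed electrician's certificate",
    "Scope-of-work description",
  ],
};

function generateChecklist(city, permitType) {
  const items = requirements[`${city}:${permitType}`];
  // Unknown combination: say nothing rather than something plausible.
  if (!items) return null;
  return [
    `## ${city}: ${permitType} permit requirements`,
    ...items.map((item) => `- [ ] ${item}`),
  ].join("\n");
}
```

Returning null for unknown combinations is the whole trade: fewer covered cases, but never an invented requirement.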
1,015 tests in 3 seconds. The suite is organized into seven areas: pricing calculation, requirements generation, database loading, API endpoints, user management, and scraper validation, plus scraper end-to-end tests that are kept separate because they depend on network access and pdfjs-dist compatibility. The core suite runs in about 3 seconds — fast enough to run on every commit without friction. Jest handles everything, and the tests are comprehensive enough that we can change the database merging logic and immediately know if any downstream pricing calculations broke.
What We Learned
The biggest challenge was data quality. OCR is imperfect, city websites change without notice, and fee structures have edge cases that no schema can fully capture. Chicago's fee schedule has a minimum fee of $3,550 that is higher than its listed maximum fee of $2,400 for certain permit types — that is not a bug in our scraper, it is an actual inconsistency in the city's published data. The system needs to handle these gracefully rather than crashing or returning nonsensical results.
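One way to handle a known inconsistency in the city's own published numbers — as opposed to an OCR misread, which validation rejects — is to surface both figures with a warning instead of crashing or silently picking one. A sketch, with illustrative names:

```javascript
// Illustrative graceful handling: when the published minimum exceeds the
// published maximum, present the full range and flag it for the user.
function describeFeeRange(entry) {
  if (entry.minFee != null && entry.maxFee != null && entry.minFee > entry.maxFee) {
    return {
      low: entry.maxFee,
      high: entry.minFee,
      note: "City's published minimum exceeds its published maximum; verify with the city.",
    };
  }
  return { low: entry.minFee, high: entry.maxFee, note: null };
}

describeFeeRange({ minFee: 3550, maxFee: 2400 }); // flagged, not rejected
```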
Region detection was another lesson in real-world messiness. Our initial implementation mapped ZIP codes and cities to regions for applying default fee schedules. It works for most of the continental US, but Alaska, Hawaii, and Oklahoma fall through to a generic default. Expanding coverage is a data entry problem, not a code problem — but it is a reminder that software covering geographic data is only as complete as the data it is built on.
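The lookup itself is trivial; the coverage lives in the data. A sketch with a toy prefix table (the real mapping and region names are not shown in the source):

```javascript
// Illustrative ZIP-prefix-to-region table. Expanding coverage means
// adding rows here, not changing code.
const zipPrefixToRegion = {
  "606": "midwest-metro",    // Chicago
  "900": "west-coast-metro", // Los Angeles
};

function regionForZip(zip) {
  const region = zipPrefixToRegion[zip.slice(0, 3)];
  // Unmapped areas (Alaska, Hawaii, etc.) fall through here.
  return region ?? "generic-default";
}

regionForZip("60614"); // "midwest-metro"
regionForZip("99501"); // Anchorage: "generic-default"
```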
This project reinforced something we keep learning: the hard part of full-stack applications is rarely the framework or the database. It is the domain logic — the fee calculations, the edge cases, the data validation. The 1,015 tests are not testing Express route handling (those are trivial). They are testing that the pricing calculator handles null valuation rates correctly, that the database merger resolves conflicts the right way, and that the requirements generator produces the right checklist for each city-permit combination. Domain correctness is where bugs cost real money.
Need a workflow automation app? Let's talk about your use case.