

A high-performance Apify Actor that bypasses Zillow's strict anti-scraping protections to extract rental market data at scale. Features map-based scraping, residential proxy integration, and automated data normalization.
In the competitive world of real estate technology (PropTech), data is the ultimate currency. Investors, property management companies, and market analysts rely on accurate, granular, and real-time rental listings to spot emerging trends, price properties competitively, and identify lucrative investment opportunities. Zillow, as the dominant marketplace in the US, holds the most comprehensive dataset of active listings. However, for developers and data scientists, accessing this data programmatically is a massive, often insurmountable challenge.
Zillow employs some of the most sophisticated anti-scraping technologies on the modern web. From aggressive IP banning and behavioral analysis to strict Captchas and complex TLS fingerprinting, their defenses are designed to stop automated data collection in its tracks. Most notably, they impose a hard cap on search results—showing only 500 listings per query regardless of the actual inventory size—which artificially fragments the data landscape.
This project, the Scalable Zillow Rental Data Scraper, was born out of a necessity to break through these barriers. My goal was to architect a robust tool that could reliably harvest rental data at scale, bypassing these technical limitations to provide a clear, uninterrupted view of the market.
Building a production-grade Zillow scraper isn't just about parsing HTML; it's a constant game of digital cat-and-mouse. During the initial research and development phase, I encountered two specific, critical roadblocks that defeated standard scraping methodologies:
Zillow's frontend is designed for human browsers, not data pipelines. No matter how broad your search is (e.g., "Rentals in Texas"), the API curtails the results list at 20 pages of roughly 25 listings each. This means that if a city has 2,000 active rentals, a standard scraper extracting data via the primary search endpoint will miss 75% of the available data. For a market analyst, 25% coverage is statistically useless. I needed a strategy to "unlock" the hidden 1,500 listings without making thousands of disconnected, blind requests.
The second hurdle was even more technical. Initial attempts to query Zillow's internal APIs using standard Python libraries like requests or aiohttp resulted in immediate 403 Forbidden errors. This wasn't a simple IP ban; it was TLS Fingerprinting.
Modern anti-bot systems like Akamai and Cloudflare analyze the TLS Client Hello packet sent during the initial handshake. Standard Python libraries have a distinct fingerprint (cipher suites, extensions order, etc.) that clearly identifies them as "automated scripts" rather than "human users using Chrome." If the cryptographic handshake didn't perfectly match a commercially available browser version, the connection was dropped before a single byte of application data was exchanged.
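To make the failure mode concrete, here is a hypothetical minimal reproduction (using the site root as a stand-in for any protected endpoint):

# Hypothetical illustration: browser-perfect headers alone are not enough
import requests

response = requests.get(
    "https://www.zillow.com/",
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36"
    },
)
# The TLS Client Hello still carries python-requests' cipher suites and
# extension order, so the anti-bot layer flags the non-browser fingerprint
# and returns 403 (or drops the connection outright)
print(response.status_code)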
To solve these problems, I designed a cloud-native architecture using Python for its rich data processing ecosystem and Apify for its serverless orchestration capabilities. The solution relies on three core technical pillars:
To overcome the access problem, I moved away from standard HTTP clients and implemented curl_cffi. This library is a Python binding for curl-impersonate, a specialized build of curl that can perform a TLS handshake identical to a real browser's.
By configuring the scraper to identify as Chrome 120 (or the latest stable version) at the networking layer, I effectively became invisible to Zillow's primary bot detection filters. The request headers, HTTP/2 pseudo-headers, and TLS cipher suites were all aligned to perfectly mimic a legitimate user session.
# Hypothetical example of the networking logic
from curl_cffi import requests

def fetch_zillow_map_data(url, params):
    # Impersonate a real Chrome browser to bypass TLS fingerprinting
    response = requests.get(
        url,
        params=params,
        impersonate="chrome120",
        headers={...},  # Standard browser headers (placeholder)
    )
    return response.json()
To solve the 500-limit issue, I reverse-engineered the behavior of Zillow's map view. I discovered that while the "List View" is heavily paginated, the "Map View" API endpoints are more flexible if queried correctly.
Instead of asking for "Rentals in Austin, TX" (a broad text search), the Actor was designed to accept a specific Geographic Bounding Box defined by North-East and South-West latitude/longitude coordinates. This approach allows users to slice a large city into smaller, custom grids. By targeting a specific 10-block radius or a single zip code, the result count for each individual query stays well under the 500 limit. This ensures 100% data capture coverage across an entire metro area by aggregating multiple granular "grid" scrapes.
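As an illustration, a minimal sketch of the grid-slicing step might look like this (the function name and tile counts are hypothetical, not the Actor's actual code):

def split_bounding_box(north, east, south, west, rows=4, cols=4):
    """Slice one large bounding box into rows x cols smaller tiles."""
    lat_step = (north - south) / rows
    lon_step = (east - west) / cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append({
                "north": north - r * lat_step,
                "south": north - (r + 1) * lat_step,
                "west": west + c * lon_step,
                "east": west + (c + 1) * lon_step,
            })
    return tiles

# Each tile becomes its own map-view query; if a tile still bumps into
# the ~500-result cap, it can be subdivided again until every query fits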
Speed is critical when scraping real-time data. I structured the application using Python's asyncio so that high-throughput I/O operations run concurrently without blocking. The main actor script acts as an orchestrator that manages the lifecycle of the scrape:
The scraping engine (pyzill_module) is decoupled from the actor logic (main.py), allowing for easier testing and maintenance; the orchestration pattern itself is sketched below.
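A simplified sketch of that orchestration pattern, assuming the blocking fetch_zillow_map_data helper shown earlier and an assumed "listResults" response key:

import asyncio

async def scrape_all_tiles(url, tiles, concurrency=5):
    # Cap concurrency so the residential proxy pool is not exhausted
    semaphore = asyncio.Semaphore(concurrency)

    async def scrape_tile(tile):
        async with semaphore:
            # fetch_zillow_map_data is blocking, so it runs in a worker
            # thread to keep the event loop free for other tiles
            return await asyncio.to_thread(fetch_zillow_map_data, url, tile)

    pages = await asyncio.gather(*(scrape_tile(tile) for tile in tiles))
    # Flatten the per-tile responses into one combined list of listings
    return [item for page in pages for item in page.get("listResults", [])]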
Getting the data is only half the battle; making it usable is the other. One specific breakthrough during development was dealing with the raw, messy JSON returned by Zillow's internal APIs. Examples of the data chaos included relative detailUrl paths that had to be hydrated into absolute links, and photoData buried inside complex carousel objects instead of a simple array of image URLs.
I implemented a robust transformation layer (transform_listing_data) that intercepts the raw JSON before it reaches the final dataset.
ZILLOW_BASE_URL = "https://www.zillow.com"

def transform_listing_data(listing: dict) -> dict:
    # Automatically hydrate relative URLs into absolute ones
    detail_url = listing.get('detailUrl')
    if detail_url and not detail_url.startswith('http'):
        listing['detailUrl'] = ZILLOW_BASE_URL + detail_url
    # Transform complex photo objects into a simple array of high-res strings
    # ... logic to parse carouselPhotosComposable ...
    return listing
This step automatically converts relative paths into absolute, click-ready URLs and patches the photoData to provide a clean array of high-resolution image links. This attention to detail turned a raw scraping tool into a polished product that developers could integrate directly into their applications without needing their own post-processing scripts.
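For the elided photo step, the flattening is conceptually similar to the sketch below; the exact shape of carouselPhotosComposable shown here is an assumption for illustration:

def extract_photo_urls(listing: dict) -> list:
    # Assumed shape: carouselPhotosComposable is a list of photo objects,
    # each carrying a direct image URL under a "url" key
    photos = listing.get('carouselPhotosComposable') or []
    urls = []
    for photo in photos:
        if isinstance(photo, dict) and photo.get('url'):
            urls.append(photo['url'])
    return urls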
Since its deployment on the Apify Store, the Zillow Rental Data Scraper has achieved significant milestones. In particular, the combination of curl_cffi and Residential Proxies has resulted in a stable, reliable actor that consistently bypasses Captchas.
Software is never finished. To make this tool even more powerful, I am currently exploring several key enhancements.
Real estate data shouldn't be a black box locked behind walled gardens. Whether you are building a competitive analysis dashboard, training a pricing model, or just looking for your next apartment, this tool gives you the raw, unfiltered data you need.
