This project is a web scraper built with Playwright and FastAPI. It opens a Chrome browser, scrapes all links from a given webpage, and displays them. The purpose of this project is to educate users on how to deploy a web-scraping service on Leapcell.
- Uses Playwright for browser automation
- FastAPI for building the web service
- Extracts and displays all links from a webpage
.
├── main.py # FastAPI application entry point
└── prepare_playwright_env.sh # Script to set up Playwright and dependencies
This script ensures all necessary dependencies are installed before running the scraper. It performs the following actions:
- Installs required Python packages:
fastapi
,uvicorn
,pytest-playwright
,python-multipart
, andjinja2
. - Installs Playwright along with Chromium and its dependencies for proper browser automation.
#!/bin/sh
set -e
# Install FastAPI, Uvicorn, Pytest, and Playwright
pip install fastapi uvicorn pytest-playwright python-multipart jinja2
# Install Playwright and its dependencies
playwright install --with-deps chromium
This guide walks you through setting up and deploying the project on Leapcell.
Ensure you have the following:
- A Leapcell account
- Python installed (recommended: Python 3.9+)
- Playwright dependencies installed
- Clone the repository:
git clone https://github.com/leapcell/playwright-crawler-py cd playwright-crawler-py
- Install dependencies and set up Playwright:
chmod +x prepare_playwright_env.sh ./prepare_playwright_env.sh
To start the FastAPI service, run:
uvicorn main:app --host 0.0.0.0 --port 8000
The application will be accessible at http://localhost:8000
.
- Push your code to a GitHub repository.
- Log in to Leapcell and connect your repository.
- Ensure
prepare_playwright_env.sh
is executed before running the service. - Deploy your application.
Once deployed, your application will be accessible via the Leapcell-generated domain.
Feel free to submit issues or pull requests to improve this project.
For support, reach out via the Leapcell Discord community or email [email protected].