# Lubimyczytac Scraper to Goodreads Pipeline
A Selenium-based data pipeline that reads a public Lubimyczytac profile library, enriches book records with per-book metadata, and exports a Goodreads-compatible CSV.
## Scope
- Source: public Lubimyczytac profile library pages
- Output: normalized local CSV files, including Goodreads import format
## Stack
- Python 3
- Selenium + ChromeDriver
- uv (environment and dependency management)
## Project Layout

```text
.
|-- scraper/
|   |-- profile_scraper.py   # phase 1: list scraping from profile pages
|   |-- enrichment.py        # phase 2: per-book enrichment orchestration
|   |-- book_details.py      # phase 2: ISBN/original title extraction
|   `-- __init__.py
|-- models/
|   |-- book.py              # Book dataclass and CSV schema
|   `-- __init__.py
|-- data_io/
|   |-- csv_utils.py         # CSV read/write and Goodreads export mapping
|   `-- __init__.py
|-- dane/
|   |-- books.csv            # phase 1 output
|   |-- books_enriched.csv   # phase 2 output
|   `-- goodreads.csv        # phase 3 output
|-- tests/
|-- main.py                  # pipeline entry point
|-- config.example.ini
|-- pyproject.toml
`-- LICENSE
```
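Per the layout comments, `models/book.py` holds the shared `Book` dataclass and CSV schema. A minimal sketch, with fields inferred from the CSV columns mentioned later in this README (the real field set may differ):

```python
# models/book.py -- hypothetical sketch; the real field set may differ.
from dataclasses import dataclass


@dataclass
class Book:
    title: str                # Polish title as listed on the profile page
    author: str
    link: str                 # CSV column "Link"; phase 2 visits this URL
    rating: str = ""
    main_shelf: str = ""      # CSV column "Na półkach Główne"
    other_shelves: str = ""   # CSV column "Na półkach Pozostałe"
    isbn: str = ""            # filled during phase 2 enrichment
    original_title: str = ""  # phase 2; falls back to the Polish title
```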
## Setup (uv)
- Install `uv` (if missing):

  ```powershell
  powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
  ```
- Sync dependencies:

  ```
  uv sync
  ```
- Ensure Google Chrome and a compatible ChromeDriver are available on your `PATH`.
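A quick sanity check that the driver is on `PATH`:

```
chromedriver --version
```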
## Configuration & Running
Create your local `config.ini` from `config.example.ini`:
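For example (`cp` is also a built-in alias in PowerShell):

```
cp config.example.ini config.ini
```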
Then edit `config.ini`:

```ini
[settings]
profile_url = https://lubimyczytac.pl/profil/YOUR_PROFILE_ID/YOUR_PROFILE_NAME
```
Run the pipeline entry point:

```
uv run python main.py
```
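For reference, `main.py` presumably chains the three phases documented below. A minimal sketch, assuming the function names quoted in this README; the signatures, argument order, and in-place mutation in phase 2 are assumptions:

```python
# Hypothetical glue code; the function names appear in this README,
# everything else (signatures, in-place mutation) is assumed.
import configparser

from scraper.profile_scraper import scrape_books
from scraper.enrichment import fill_isbn_and_original_titles
from data_io.csv_utils import (
    load_books_from_csv,
    save_books_to_csv,
    convert_books_to_goodreads,
)

config = configparser.ConfigParser()
config.read("config.ini", encoding="utf-8")
profile_url = config["settings"]["profile_url"]

books = scrape_books(profile_url)                      # phase 1
save_books_to_csv(books, "dane/books.csv")

books = load_books_from_csv("dane/books.csv")
fill_isbn_and_original_titles(books)                   # phase 2, assumed in-place
save_books_to_csv(books, "dane/books_enriched.csv")

convert_books_to_goodreads("dane/books_enriched.csv",  # phase 3
                           "dane/goodreads.csv")
```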
## Phase Artifacts Summary

- `dane/books.csv`: raw list scrape from profile pages (phase 1)
- `dane/books_enriched.csv`: per-book ISBN and original title enrichment (phase 2)
- `dane/goodreads.csv`: Goodreads import-ready export (phase 3)
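The `save_books_to_csv(...)` and `load_books_from_csv(...)` helpers referenced in the phases below presumably wrap the standard-library `csv` module. A minimal dict-based sketch (the real helpers serialize `Book` objects, and the delimiter and column details are assumptions):

```python
# Hypothetical data_io/csv_utils.py helpers; the real ones use Book objects.
import csv


def save_books_to_csv(rows: list[dict], path: str) -> None:
    # One CSV row per book; header order follows the first row's keys.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def load_books_from_csv(path: str) -> list[dict]:
    # Read the file back into one dict per book, keyed by column name.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```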
## Pipeline Phases
### Phase 1: Profile Scraping

- Module: `scraper/profile_scraper.py`
- Entry function: `scrape_books(profile_url)`
- Input: `profile_url` from `config.ini` (expanded in `main.py` with list query parameters)
- Processing (see the sketch after this list):
  - Opens profile library pages in Selenium
  - Iterates pagination
  - Extracts row-level metadata (title, author, ratings, shelves, link, etc.)
  - Produces `Book` objects (domain model) before CSV serialization
- Output file: `dane/books.csv` via `save_books_to_csv(...)`
- Shelf fields in this phase:
  - `Na półkach Główne`: primary state shelf (e.g. `Przeczytane`, `Teraz czytam`, `Chcę przeczytać`)
  - `Na półkach Pozostałe`: custom user shelves/tags (optional)
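A condensed sketch of this loop. The CSS selectors, the `?page=` pagination parameter, and the trimmed `Book` stand-in are placeholders, not the site's real markup or the project's actual model:

```python
# Hypothetical phase-1 loop; all selectors and the pagination scheme are
# placeholders, not lubimyczytac.pl's real markup.
from dataclasses import dataclass

from selenium import webdriver
from selenium.webdriver.common.by import By


@dataclass
class Book:          # trimmed stand-in for models/book.py
    title: str
    author: str
    link: str


def scrape_books(profile_url: str) -> list[Book]:
    driver = webdriver.Chrome()  # needs Chrome + ChromeDriver on PATH
    books: list[Book] = []
    page = 1
    try:
        while True:
            driver.get(f"{profile_url}?page={page}")  # assumed pagination
            rows = driver.find_elements(By.CSS_SELECTOR, ".book-row")  # placeholder
            if not rows:  # an empty page ends the pagination loop
                break
            for row in rows:
                books.append(Book(
                    title=row.find_element(By.CSS_SELECTOR, ".title").text,
                    author=row.find_element(By.CSS_SELECTOR, ".author").text,
                    link=row.find_element(By.TAG_NAME, "a").get_attribute("href"),
                ))
            page += 1
    finally:
        driver.quit()
    return books
```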
### Phase 2: Record Enrichment

- Modules: `scraper/enrichment.py`, `scraper/book_details.py`
- Entry function: `fill_isbn_and_original_titles(books)`
- Input file: `dane/books.csv`, loaded by `load_books_from_csv(...)`
- Processing (see the sketch after this list):
  - Visits each book URL from the `Link` column
  - Extracts the ISBN and original title from the book detail page
  - Falls back to the Polish title when the original title is missing
- Output file: `dane/books_enriched.csv` via `save_books_to_csv(...)`
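A sketch of the enrichment pass. The `Link` column and the title-fallback rule come from this README; the detail-page selectors and the remaining column names are placeholders:

```python
# Hypothetical phase-2 enrichment; detail-page selectors are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


def _text_or_empty(driver, selector: str) -> str:
    # Return the element's text, or "" when the page lacks that field.
    try:
        return driver.find_element(By.CSS_SELECTOR, selector).text.strip()
    except NoSuchElementException:
        return ""


def fill_isbn_and_original_titles(books: list[dict]) -> None:
    driver = webdriver.Chrome()
    try:
        for book in books:
            driver.get(book["Link"])  # column produced in phase 1
            book["ISBN"] = _text_or_empty(driver, ".isbn")            # placeholder
            original = _text_or_empty(driver, ".original-title")      # placeholder
            # Documented fallback: reuse the Polish title when none exists.
            book["Original Title"] = original or book.get("Title", "")
    finally:
        driver.quit()
```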
### Phase 3: Goodreads Conversion

- Module: `data_io/csv_utils.py`
- Entry function: `convert_books_to_goodreads(input_file, output_file)`
- Input file: `dane/books_enriched.csv`
- Processing (see the sketch after this list):
  - Maps Lubimyczytac columns to the Goodreads import schema:
    - `Na półkach Główne` -> Goodreads `Shelves`
    - `Na półkach Pozostałe` -> Goodreads `Bookshelves`
  - Writes the Goodreads-required headers and transformed rows
- Output file: `dane/goodreads.csv`
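A sketch of the conversion. The two shelf mappings come from this README; the remaining source column names and the exact Goodreads header set are assumptions:

```python
# Hypothetical phase-3 conversion; only the two shelf mappings below are
# documented in this README -- the other column names are assumptions.
import csv

GOODREADS_FIELDS = ["Title", "Author", "ISBN", "My Rating", "Shelves", "Bookshelves"]


def convert_books_to_goodreads(input_file: str, output_file: str) -> None:
    with open(input_file, newline="", encoding="utf-8") as src, \
         open(output_file, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=GOODREADS_FIELDS)
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({
                # Assumed: prefer the phase-2 original title for matching.
                "Title": row.get("Original Title") or row.get("Title", ""),
                "Author": row.get("Author", ""),
                "ISBN": row.get("ISBN", ""),
                "My Rating": row.get("Rating", ""),
                # Documented mappings; the real exporter may also translate
                # shelf names (e.g. Przeczytane -> read).
                "Shelves": row.get("Na półkach Główne", ""),
                "Bookshelves": row.get("Na półkach Pozostałe", ""),
            })
```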
## Educational Purpose
This project is intended for educational use only. It is designed to demonstrate web scraping workflow design, CSV data processing, and multi-phase data transformation in Python.
## License
This project is licensed under the MIT License. See LICENSE for details.