Website Scraper & Analyzer
A CLI tool that collects and comprehensively analyzes website information just by entering a URL
Created on: May 19, 2025

This content has been translated by AI from the original Japanese version.
I want to develop a CLI tool that retrieves and lists various information about a website just by entering its URL.
Main Features
Recursive Link Collection
- Collect links with the same origin from a page URL and crawl them recursively (see the sketch after this list)
- Ensure URLs with the same pathname are fetched only once
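As a rough illustration of the crawling loop, here is a minimal TypeScript sketch using Puppeteer. The function names (`crawl`, `main`) and the `networkidle2` wait condition are assumptions for the example, not a fixed design.

```typescript
import puppeteer, { Browser } from "puppeteer";

// Minimal sketch: crawl same-origin links recursively, visiting each
// pathname only once. Function and option names are illustrative.
async function crawl(
  browser: Browser,
  url: string,
  origin: string,
  visited: Set<string>
): Promise<void> {
  const { pathname } = new URL(url);
  if (visited.has(pathname)) return; // each pathname is fetched only once
  visited.add(pathname);

  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle2" });

  // Collect absolute hrefs from every anchor on the page
  const hrefs = await page.$$eval("a[href]", (anchors) =>
    anchors.map((a) => (a as HTMLAnchorElement).href)
  );
  await page.close();

  for (const href of hrefs) {
    const link = new URL(href);
    if (link.origin === origin) {
      await crawl(browser, link.href, origin, visited);
    }
  }
}

async function main(startUrl: string): Promise<void> {
  const browser = await puppeteer.launch();
  const visited = new Set<string>();
  await crawl(browser, startUrl, new URL(startUrl).origin, visited);
  await browser.close();
  console.log([...visited]); // unique pathnames discovered
}
```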
SEO Information Analysis
- Retrieve SEO-related information for each page (extraction sketch after this list):
- Title, description, keywords
- OGP information (og:title, og:description, og:image)
- Twitter Card information
- Heading structure analysis (h1-h6)
- Image alt attribute checking
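A possible extraction step, run inside Puppeteer's `page.evaluate` once a page has loaded. The returned shape and field names are only an assumption for illustration:

```typescript
import type { Page } from "puppeteer";

// Illustrative sketch: pull SEO-related fields from an already-loaded page.
async function extractSeo(page: Page) {
  return page.evaluate(() => {
    const meta = (name: string, attr = "name") =>
      document
        .querySelector(`meta[${attr}="${name}"]`)
        ?.getAttribute("content") ?? null;

    return {
      title: document.title,
      description: meta("description"),
      keywords: meta("keywords"),
      og: {
        title: meta("og:title", "property"),
        description: meta("og:description", "property"),
        image: meta("og:image", "property"),
      },
      twitterCard: meta("twitter:card"),
      // Heading structure: tag name and text of every h1-h6, in document order
      headings: Array.from(
        document.querySelectorAll("h1, h2, h3, h4, h5, h6")
      ).map((h) => ({
        tag: h.tagName.toLowerCase(),
        text: h.textContent?.trim() ?? "",
      })),
      // Images with a missing or empty alt attribute
      imagesMissingAlt: Array.from(
        document.querySelectorAll("img:not([alt]), img[alt='']")
      ).map((img) => (img as HTMLImageElement).src),
    };
  });
}
```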
Visual Analysis
- Take screenshots of pages (sketch after this list)
- Desktop (PC) viewport support, with a mobile version planned
- Full-page capture
- Highlighting of important sections
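For the screenshot step, something along these lines with Puppeteer should be enough; the 1280x800 viewport and the file-writing approach are placeholder choices:

```typescript
import { writeFile } from "node:fs/promises";
import type { Page } from "puppeteer";

// Sketch of a full-page desktop capture. A mobile variant could later reuse
// this with a different viewport or a device descriptor via page.emulate().
async function captureScreenshot(page: Page, outPath: string): Promise<void> {
  await page.setViewport({ width: 1280, height: 800 }); // desktop (PC) size
  const image = await page.screenshot({ fullPage: true }); // entire page
  await writeFile(outPath, image);
}
```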
Data Output
- Automatic sitemap.xml generation (see the sketch after this list)
- SEO analysis report output
- Link structure visualization
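Sitemap generation can likely be a thin mapping over the crawled URL list. The sketch below hardcodes the output name and omits lastmod, changefreq, and priority:

```typescript
import { writeFileSync } from "node:fs";

// Minimal sitemap.xml writer; URLs are assumed to be absolute and XML-safe.
function writeSitemap(urls: string[], outPath = "sitemap.xml"): void {
  const entries = urls.map((u) => `  <url><loc>${u}</loc></url>`).join("\n");
  const xml =
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n` +
    `</urlset>\n`;
  writeFileSync(outPath, xml);
}
```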
Technical Implementation
CLI Application
- Runs from the command line (entry-point sketch after this list)
- Utilizes Node.js environment
- Headless browsing using Puppeteer
- Output in various formats like JSON and CSV
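The entry point could stay dependency-free by using Node's built-in `parseArgs` (available since Node 18). The flag names below (`--format`, `--out`) are illustrative only:

```typescript
#!/usr/bin/env node
import { parseArgs } from "node:util";

// Hypothetical CLI entry; only the argument parsing is shown here.
const { values, positionals } = parseArgs({
  allowPositionals: true,
  options: {
    format: { type: "string", default: "json" }, // "json" or "csv"
    out: { type: "string" },                     // optional output file
  },
});

const url = positionals[0];
if (!url) {
  console.error("Usage: analyzer <url> [--format json|csv] [--out file]");
  process.exit(1);
}

// ...run the crawl/analysis for `url` and write results in values.format...
```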
Use Cases
- Competitor website analysis
- SEO audit
- Broken link checking
- Content inventory creation
- Automatic sitemap generation
Security Considerations
- Respect robots.txt directives (a simplified check is sketched after this list)
- Recommended primarily for analyzing your own sites
- Preserve copyright notices
- Exclude personal and private information from collected output
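As a first pass at the robots.txt check, here is a deliberately simplified sketch: it only reads the `User-agent: *` group's Disallow rules and ignores Allow precedence, wildcards, and crawl-delay, which a real crawler would need to handle (or delegate to a dedicated parser):

```typescript
// Simplified robots.txt check: only the "User-agent: *" group's Disallow
// rules are considered in this sketch.
async function isDisallowed(url: string): Promise<boolean> {
  const { origin, pathname } = new URL(url);
  const res = await fetch(`${origin}/robots.txt`);
  if (!res.ok) return false; // no robots.txt: treat everything as allowed

  const lines = (await res.text()).split("\n");
  let inStarGroup = false;
  const disallowed: string[] = [];
  for (const line of lines) {
    const [key, ...rest] = line.split(":");
    const value = rest.join(":").trim();
    if (/^user-agent$/i.test(key.trim())) inStarGroup = value === "*";
    else if (inStarGroup && /^disallow$/i.test(key.trim()) && value) {
      disallowed.push(value);
    }
  }
  return disallowed.some((prefix) => pathname.startsWith(prefix));
}
```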
Architecture
Monorepo Structure
- CLI package
- Account management site (with API)
- Shared libraries (an example layout follows this list)
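One possible workspace layout (directory and package names are placeholders):

```
packages/
  cli/      # the command-line tool
  web/      # account management site and its API
  shared/   # crawling and analysis logic used by both
```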
AI Features (Paid)
- Automatic content analysis
- SEO optimization suggestions
- Competitive analysis reports
- Available only when logged in via CLI
Future Expansion Plans
- Performance measurement features
- Accessibility checking
- Multi-site comparison analysis