WebSite Organization and Search Engine Implementation
- Element: Models either directories or webpages.
- WebSite: Represents a website and provides methods for managing its structure.
  - WebSite(host): Creates a new WebSite object for saving the website hosted at host.
  - getHomePage(): Returns the home page of the website.
  - getSiteString(): Returns a string showing the structure of the website.
  - insertPage(url, content): Saves and returns a new page of the website.
  - getSiteFromPage(page): Given a page, returns the WebSite object it belongs to.
  - __hasDir(ndir, cdir): Checks if a directory exists in the current directory.
  - __newDir(ndir, cdir): Creates a new directory if it doesn't exist.
  - __hasPage(npag, cdir): Checks if a webpage exists in the current directory.
  - __newPage(npag, cdir): Creates a new webpage if it doesn't exist.
  - __isDir(elem): Checks if an element is a directory.
  - __isPage(elem): Checks if an element is a webpage.
- InvertedIndex: Represents the core data structure of the search engine.
  - InvertedIndex(): Creates a new empty InvertedIndex.
  - addWord(keyword): Adds a keyword to the InvertedIndex.
  - addPage(page): Processes a webpage and updates the inverted index.
  - getList(keyword): Retrieves the occurrence list for a given keyword.
- SearchEngine: Indexes a directory of webpage files and answers keyword queries.
  - SearchEngine(namedir): Initializes the SearchEngine with a directory containing webpage files.
  - search(keyword, k): Returns the k web pages with the most occurrences of the keyword.
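
The inverted index and the top-k query described above can be sketched as follows. This is a minimal illustration, not the repository's code: the class name InvertedIndexSketch, the snake_case method names, the (page name, count) layout of an occurrence list, and the use of heapq.nlargest are all assumptions made for the example.

```python
import heapq
import re
from collections import Counter, defaultdict


class InvertedIndexSketch:
    """Hypothetical index: keyword -> list of (page_name, count) pairs."""

    def __init__(self):
        self._index = defaultdict(list)

    def add_page(self, page_name, content):
        # Split the page into lowercase alphanumeric words, count them,
        # and append one entry per distinct word to its occurrence list.
        words = re.findall(r"[a-z0-9]+", content.lower())
        for word, count in Counter(words).items():
            self._index[word].append((page_name, count))

    def get_list(self, keyword):
        # Copy of the occurrence list; empty for an unknown keyword.
        return list(self._index.get(keyword.lower(), []))


def search(index, keyword, k):
    """Return the names of the k pages with the most occurrences of keyword."""
    occurrences = index.get_list(keyword)
    # nlargest keeps only k candidates instead of sorting the whole list.
    top = heapq.nlargest(k, occurrences, key=lambda entry: entry[1])
    return [page_name for page_name, _ in top]


index = InvertedIndexSketch()
index.add_page("a.html", "tree graph tree")
index.add_page("b.html", "tree")
index.add_page("c.html", "graph graph graph")
print(search(index, "tree", 2))  # ['a.html', 'b.html']
```

Selecting with a heap costs O(n log k) for an occurrence list of length n, which helps when k is much smaller than n; sorting the full list would also work for small inputs.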
- Constant time complexity for various operations.
- Linear time complexity for generating site structure.
- Logarithmic time complexity for directory and page existence checks.
- Linear time complexity for adding keywords and retrieving occurrence lists.
- The implementation aims to keep both website organization and search queries efficient.
- A test dataset is provided for evaluating the correctness and performance of the code.
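
One way to obtain the logarithmic existence checks and the linear structure listing mentioned above is to keep each directory's children sorted by name and look them up with binary search. The sketch below assumes that design; the repository may do it differently. Directory, Page, insert_page, site_string, and the host www.example.com are hypothetical names chosen for the example.

```python
import bisect


class Page:
    def __init__(self, name, content):
        self.name = name
        self.content = content


class Directory:
    def __init__(self, name):
        self.name = name
        self.names = []     # child names, kept sorted
        self.children = []  # child nodes, in the same order as names

    def find(self, name):
        # Binary search over the sorted names: O(log n) comparisons.
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.children[i]
        return None

    def add(self, name, node):
        # Locating the slot is logarithmic; list.insert itself is linear.
        i = bisect.bisect_left(self.names, name)
        self.names.insert(i, name)
        self.children.insert(i, node)
        return node


def insert_page(root, url, content):
    """Create missing directories along url, then store the page.

    Assumes a directory and a page never share a name in one directory.
    """
    *dirs, page_name = url.strip("/").split("/")
    current = root
    for dir_name in dirs:
        child = current.find(dir_name)
        if child is None:
            child = current.add(dir_name, Directory(dir_name))
        current = child
    page = current.find(page_name)
    if page is None:
        page = current.add(page_name, Page(page_name, content))
    return page


def site_string(node, depth=0):
    """Visit every element once and indent it by its depth."""
    lines = ["  " * depth + node.name]
    if isinstance(node, Directory):
        for child in node.children:
            lines.append(site_string(child, depth + 1))
    return "\n".join(lines)


root = Directory("www.example.com")
insert_page(root, "docs/intro.html", "intro text")
insert_page(root, "index.html", "home text")
insert_page(root, "docs/api/ref.html", "reference text")
print(site_string(root))
```

In this sketch only the lookup is logarithmic. Inserting into a Python list shifts elements, so adding a child stays linear in the number of siblings; a balanced tree would be needed to make insertion logarithmic as well.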
About
Midterm homework for the Design and Analysis of Algorithms course