PDF Analyzer

Objective & Background:

A full-stack application that lets users upload PDFs and have them analyzed digitally. The API is implemented with Python Flask, and MongoDB provides backend storage for the documents and their associated information.

APIs:

Built separate APIs to upload a PDF file, obtain the sentiment of a file, scrape text from a file, and call the OpenAI ChatGPT API.

Upload:

- uploads a PDF and checks that the input file type is correct; the check is covered by Pytest unit tests

- saving to MongoDB happens here as well

- uses the allowed_file function to check the file format
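
The project's own allowed_file is not shown here, so the following is only a minimal sketch of how such a check is commonly written in Flask upload handlers. The ALLOWED_EXTENSIONS name and the two test functions are assumptions for illustration, not the project's actual code.

```python
# Hypothetical set of accepted extensions; the real project may differ.
ALLOWED_EXTENSIONS = {"pdf"}


def allowed_file(filename: str) -> bool:
    """Return True if the filename ends in an allowed extension."""
    if "." not in filename:
        return False
    # Take the text after the last dot and compare case-insensitively.
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS


# Pytest-style unit tests (pytest collects functions named test_*).
def test_allowed_file_accepts_pdf():
    assert allowed_file("report.pdf")
    assert allowed_file("REPORT.PDF")


def test_allowed_file_rejects_other_types():
    assert not allowed_file("notes.txt")
    assert not allowed_file("no_extension")
```

Checking only the extension is cheap but does not prove the upload is really a PDF; a stricter check would also inspect the file contents.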

Extract Text:

- scrapes the PDF, extracts all of the text it contains, and returns it

Get Summary:

- uses the OpenAI API to prompt the model to summarize the text of the respective document

- a credit card must be added to the OpenAI account for this endpoint to work (an OpenAI account requirement)

- cuts the summary down to three sentences; this limit is easily changed
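
One possible way to cut a summary down to a fixed number of sentences is sketched below. The function name truncate_sentences and its max_sentences parameter are hypothetical, and the sentence splitting is deliberately naive (it would also split after abbreviations such as "e.g.").

```python
import re


def truncate_sentences(text: str, max_sentences: int = 3) -> str:
    """Keep at most max_sentences sentences from text."""
    # Split on whitespace that follows ., !, or ? (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


summary = "First point. Second point! Third point? Fourth point."
print(truncate_sentences(summary))
```

Changing the limit is then a matter of passing a different max_sentences value.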

Get Sentiment:

- Uses the TextBlob Python library to obtain sentiment from the text
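
As a rough sketch of this step: TextBlob exposes a sentiment property with a polarity in [-1.0, 1.0] and a subjectivity in [0.0, 1.0]. The helper names get_sentiment and describe_polarity, and the 0.1 threshold, are assumptions made for illustration rather than the project's actual code.

```python
from typing import Tuple


def get_sentiment(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) for text using TextBlob."""
    # Third-party dependency: pip install textblob
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def describe_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score to a coarse label (threshold is illustrative)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"
```

A caller could run get_sentiment on the extracted text and pass the polarity to describe_polarity to get a label for display.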

Document List:

- displays all uploaded documents

Document View:

- renders a page that shows a link to the NLP page and the text that was extracted

NLP:

- renders a page that shows the sentiment analysis and the GPT summary, and provides a search feature that counts how many times a given word appears in the respective text
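
The word search could be implemented along these lines. This is a sketch under two assumptions that the source does not state: matching is case-insensitive, and only whole words count (so "cat" does not match inside "concatenate"). The function name count_word is hypothetical.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of word in text."""
    # re.escape keeps any special characters in the query literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "The cat sat. The Cat ran; concatenate."
print(count_word(sample, "cat"))  # prints 2
```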

Sentiment Def:

- renders a page that shows the definition from the TextBlob website for the given sentiment value, giving the user insight into what the analysis means

MongoDB:

- Allows more flexibility in the sentiment analysis section, potentially letting users select the portions of text they want to analyze

- Because the main entity of the project is a document, storing this information in a structured, table-based format such as a SQL database would be very difficult

- Wanted a highly flexible format that could easily be changed, because new analysis tools are constantly being developed and their results need to be stored in the database
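
To illustrate the flexibility argument, here is a sketch of what a stored document record might look like. All field names (filename, text, uploaded_at, analyses) and both helper functions are hypothetical; the project's real schema is not shown in this README.

```python
from datetime import datetime, timezone


def build_document_record(filename, text, analyses=None):
    """Build a dict shaped like a MongoDB document for one uploaded PDF."""
    return {
        "filename": filename,
        "text": text,
        "uploaded_at": datetime.now(timezone.utc),
        # Results are keyed by tool name, so new tools need no schema change.
        "analyses": dict(analyses or {}),
    }


def add_analysis(record, name, result):
    """Attach the result of a new analysis tool to an existing record."""
    record["analyses"][name] = result
    return record


record = build_document_record("report.pdf", "Extracted text goes here.")
add_analysis(record, "sentiment", {"polarity": 0.2, "subjectivity": 0.5})
add_analysis(record, "summary", "A short summary.")
# With pymongo, a record like this could be saved via collection.insert_one(record).
```

Because each analysis sits under its own key, a record saved before a new tool existed remains valid alongside newer records that include that tool's output.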