Introduction: Defining the Problem
In today’s digital world, misinformation and fake news spread faster than ever. To tackle this problem, I built a machine learning model that analyzes the credibility of text-based content. I named this project “CheckMate” — it combines a text classification model trained from scratch with a REST API that exposes the model’s predictions.
Project Goal and Scope
The core goal of CheckMate is to classify a given text input as “real” or “fake” with high accuracy. The main objectives were:
- Find a reliable dataset and prepare it for model training.
- Train a task-specific text classification model from scratch instead of using off-the-shelf models.
- Turn the trained model into a publicly accessible REST API service.
- Deploy the service in a scalable and portable setup using Docker and Hugging Face.
Technical Details: Model Development
Dataset and Preprocessing
For training, I used the Kaggle Fake and Real News Dataset. I applied standard preprocessing steps such as tokenization, stop-word removal, and punctuation stripping.
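A minimal sketch of that preprocessing step, using only the standard library. The stop-word list here is a tiny illustrative subset (the actual project would use a full list such as NLTK’s English stop words), and the function name is my own:

```python
import re
import string

# Small illustrative stop-word set; the real pipeline would use a full list.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize, and drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The markets ARE crashing, experts say!"))
# ['markets', 'crashing', 'experts', 'say']
```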
Training the Model from Scratch
The classifier is a Random Forest built with scikit-learn; pandas and NumPy handled data loading and processing. On the held-out test set, the model achieved high accuracy.
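The training setup can be sketched as a TF-IDF vectorizer feeding a Random Forest. The toy texts and labels below are stand-ins for the Kaggle dataset, and the TF-IDF step is an assumption about how the text was turned into features:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy stand-in for the Kaggle Fake and Real News Dataset (illustrative only).
texts = [
    "scientists publish peer reviewed climate study",
    "government report confirms economic growth figures",
    "shocking miracle cure doctors don't want you to know",
    "you won't believe this one weird secret celebrity trick",
]
labels = ["real", "real", "fake", "fake"]

# TF-IDF features feeding a Random Forest classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(texts, labels)

print(model.predict(["miracle secret trick you won't believe"]))
```

In practice the real dataset would be split with `train_test_split` and the held-out portion used to report accuracy.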
REST API Creation
The trained model was turned into a REST API using FastAPI. The API accepts text input and returns a prediction in JSON format.
Deployment (Release)
All dependencies were packaged into a Docker container. The Docker image was deployed to Hugging Face Spaces to make the project accessible live.
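A containerization sketch along these lines might look as follows. The file names (`app.py`, `requirements.txt`) are assumptions, not taken from the project; the port matters because Hugging Face Spaces expects Docker apps to listen on 7860:

```dockerfile
# Minimal sketch; file names are illustrative assumptions.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Hugging Face Spaces routes traffic to port 7860 by default.
EXPOSE 7860
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```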
Challenges and Learnings
Through this project, I learned the nuances of training a machine learning model from scratch, the fundamentals of REST API development, and how to containerize a model with Docker and publish it on cloud platforms like Hugging Face Spaces.