Drag and drop an image from your favorite social media, or click to upload
Detection performance deteriorates with higher-resolution photos. To minimize scope, this app was built and tested on lower-resolution photos, like those on LinkedIn.
--
--
--
--
Hello, I am a software engineer currently studying at the University of Alberta.
This is a machine learning project that I built to develop my skills in deploying, serving and scaling machine learning models.
There is zero caching of results in this application. Every inference request is run fresh; what you see are results computed and delivered to you in real time.
Let's go over the flow of how requests are processed, and why I am using certain technologies.
These are proto stubs; they define the endpoints my services use to communicate with each other.
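To give a feel for what such a stub defines, here is a minimal proto3 sketch. The service, message, and field names below are illustrative assumptions, not the actual definitions used in this project.

```proto
// Hypothetical sketch of one service definition; all names here are
// assumptions for illustration only.
syntax = "proto3";

service Preprocessor {
  // Accepts a raw image, returns where the cropped face was stored
  // plus the geometry drawn on the frontend.
  rpc Preprocess (ImageRequest) returns (PreprocessResponse);
}

message ImageRequest {
  bytes image_data = 1;
}

message PreprocessResponse {
  string s3_key = 1;             // location of the cropped face in S3
  repeated float bbox = 2;       // face bounding box [x1, y1, x2, y2]
  repeated float eye_coords = 3; // eye coordinates for the frontend
}
```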
1. Preprocessor
The preprocessor service rescales and pads input images to the desired shape of 800x800. It then extracts the face and uploads it to S3 so the other model servers can use it. Byproducts of face extraction are the bounding box of the face and the coordinates of the eyes, which are drawn on the frontend. This runs on an inf1.xlarge instance and is accelerated.
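The rescale-and-pad step can be sketched as a letterbox calculation. This is a minimal illustration assuming the longer side is scaled to 800 and the shorter side is padded symmetrically; the function name and exact strategy are assumptions, not the service's actual code.

```python
# Sketch of a letterbox-style resize-and-pad, assuming the longer side
# is scaled to the 800x800 target and the remainder is padded evenly.
TARGET = 800

def letterbox_params(width, height, target=TARGET):
    """Return (scale, pad_x, pad_y) that map an image into a
    target x target canvas while preserving aspect ratio."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) // 2  # left/right padding
    pad_y = (target - new_h) // 2  # top/bottom padding
    return scale, pad_x, pad_y

# e.g. a 400x300 LinkedIn-sized photo scales by 2x and gets
# 100px of padding on the top and bottom:
scale, pad_x, pad_y = letterbox_params(400, 300)
```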
2. Embedder
The embedder model server pulls down the preprocessed image of the face and transforms it into a 512-dimensional vector. This vector is used by the backend to query the vector database. This runs on an inf1.xlarge instance and is accelerated.
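The idea behind the vector-database query can be shown with a small similarity search. The vector database handles this internally at scale; the sketch below only illustrates the underlying comparison, and the function names are my own.

```python
import math

# Illustrative sketch of comparing a face embedding against stored
# vectors via cosine similarity; the real vector database does this
# internally with indexing, so this is only the underlying idea.

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, candidates):
    """Return the index of the stored vector most similar to the query."""
    return max(range(len(candidates)),
               key=lambda i: cosine_similarity(query, candidates[i]))
```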
3. Analyzer
The 'analyzers' are a group of four models — age, gender, race, and emotion — grouped together because they all do similar work. They run on CPU instances because they are smaller and don't need as much compute to respond to requests in a reasonable amount of time.
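Because the four analyzers are independent, one preprocessed face can be fanned out to all of them concurrently. This is a minimal sketch of that pattern; the analyzer callables here are stand-ins for the real model-server RPC calls, not this app's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of fanning one preprocessed face out to several analyzer
# models in parallel; the analyzers here are stand-in functions.

def fan_out(face, analyzers):
    """Run each analyzer on the same input concurrently and collect
    the results keyed by analyzer name."""
    with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
        futures = {name: pool.submit(fn, face)
                   for name, fn in analyzers.items()}
        return {name: f.result() for name, f in futures.items()}

# Stand-in analyzers for illustration:
results = fan_out("face.jpg", {
    "age": lambda f: 25,
    "emotion": lambda f: "neutral",
})
```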
The distributed architecture of this app was built with scaling in mind. Because each service in the pipeline is (relatively) isolated, I am able to set up autoscaling node groups in EKS which will spin up new backends, analyzers, or preprocessors whenever there is continuous load on my services. My Kubernetes deployments and services will then adjust to these node groups, create replicas, and route requests to them when they are available. (I don't do this right now because I don't want Bezos to take all of my tuition money.)
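That per-service scaling would look something like the HorizontalPodAutoscaler below. This is a hypothetical sketch; the names, replica counts, and CPU threshold are illustrative assumptions, not this cluster's actual configuration.

```yaml
# Hypothetical HPA for the preprocessor deployment; names and
# thresholds are illustrative, not this cluster's real config.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: preprocessor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: preprocessor
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # add replicas under sustained load
```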