datatalksclub

LLM Zoomcamp Week 4 - Monitoring and Evaluation Notes by waleed

August 6, 2024 | Reading Time: 18 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 4 Notes. In this section the focus is on the following: extending the evaluation work we did in section 3 to monitor answer quality over time; how to look at answer quality with user feedback and interaction; and how to store all this data and visualize it. But before all that, let’s do a quick recap of where we are. Table of Contents: Recap, 4.1 Intro, 4.


LLM Zoomcamp Week 3 - Vector Search with Elasticsearch Notes by waleed

July 19, 2024 | Reading Time: 55 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 3 Notes. In this section the focus is on the following: applying vector databases as an alternative to Elasticsearch as used in the previous two modules (note that Elasticsearch can itself operate as a vector DB, as an alternative to Lucene, and this will be covered as well); vector embeddings and their role in building RAG applications; and evaluation methods for search / query retrieval performance. Table of Contents: 3.


LLM Zoomcamp Week 2 - Open Source LLMs Notes by waleed

July 10, 2024 | Reading Time: 23 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 2 Notes. In the second week, we set up cloud-based GPU options like Saturn Cloud and explore open source alternatives to OpenAI platforms and models. Platforms: HuggingFace, Ollama, Saturn Cloud. Models: Google FLAN T5, Phi 3 Mini, Mistral 7B. Finally, we put the RAG we built in week 1 into a Streamlit UI. A few important callouts for this section: for the most part, I will be taking these notes in a Saturn Cloud notebook, which means that before starting each note section I will be restarting the kernel to free up RAM from the GPU I’m using. So if I ever decide to revisit these notes in the future, I won’t be able to just load this notebook and run things as is. Table of Contents: 2.


LLM Zoomcamp Week 1 - Intro Notes by waleed

June 29, 2024 | Reading Time: 23 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 1 Notes This is the first week of the new LLM Zoomcamp hosted by DataTalksClub. I’ve found their content really helpful in the past, having completed the Data Engineering Zoomcamp a year ago and successively attempted (but never finished!) the other two zoomcamps they offer. In order to give me the best chance of completing this one, I’ve decided to publish my notes on my blog. Hope that helps me see this one through!


Prefect.io POC - Building ETL Pipeline for Toronto Bicycle Data by waleed

January 3, 2024 | Reading Time: 4 min

datatalksclub
prefect
python
sql
terraform
gcp
bigquery

Toronto Bicycle Data Engineering
You can find all the code for this project here: https://github.com/waleedayoub/toronto-bicycle-data
This was a project I explored as part of the final project of the DataTalksClub Data Engineering Zoomcamp.
Project Description: The goal of this project is to examine historical bike share ridership going as far back as 2016 in the city of Toronto, Ontario. The city of Toronto has an open data sharing mandate, and all bike share data can be found here: https://open.


Git set up for datatalksclub zoomcamp by waleed

September 5, 2023 | Reading Time: 2 min

datatalksclub
git

Requirements: (1) the original course repo as a read-only source that gets updated throughout the duration of the course; (2) my own repo in my GitHub account, used to pull updates from the original course repo and to hold my own assignments / work. Solution: What I’m looking to do is essentially maintain a fork of the original course repository, while also adding my own work to it. Here’s a step-by-step guide to set this up: