python

LLM Zoomcamp Week 4 - Monitoring and Evaluation Notes by waleed

August 6, 2024 | Reading Time: 18 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 4 Notes. In this section the focus is on the following: extending the evaluation work we did in section 3 to monitor answer quality over time; how to look at answer quality with user feedback and interaction; and how to store all this data and visualize it. But before all that, let’s do a quick recap of where we are. Table of Contents: Recap; 4.1 Intro; 4.


LLM Zoomcamp Week 3 - Vector Search with Elasticsearch Notes by waleed

July 19, 2024 | Reading Time: 55 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 3 Notes. In this section the focus is on the following: applying vector databases as an alternative to the Elasticsearch setup used in the previous two modules (note that Elasticsearch can itself operate as a vector DB, as an alternative to its Lucene-based keyword search, and that will be covered as well); vector embeddings and their role in building RAG applications; and evaluation methods for search / query retrieval performance. Table of Contents: 3.


LLM Zoomcamp Week 2 - Open Source LLMs Notes by waleed

July 10, 2024 | Reading Time: 23 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 2 Notes. In the second week, we set up cloud-based GPU options like SaturnCloud and explore open source alternatives to OpenAI platforms and models. Platforms: HuggingFace, Ollama, SaturnCloud. Models: Google FLAN T5, Phi 3 Mini, Mistral 7-B. And finally, we put the RAG we built in week 1 into a Streamlit UI. A few important callouts for this section: for the most part, I will be taking these notes in a Saturn Cloud notebook, which means that before starting each note section, I will be restarting the kernel to free up RAM from the GPU I’m using. So if I ever decide to revisit these notes in the future, I won’t be able to just load this notebook and run things as is. Table of Contents: 2.


Managing Dev Environments - Local vs Codespaces by waleed

July 4, 2024 | Reading Time: 4 min

python
github
codespaces
secrets

Managing Development Environments: Local Machine vs. GitHub Codespaces. Since I don’t really do development on a regular basis, especially not for my day-to-day at work, I find that one of the most annoying parts of picking up projects is getting environments set up, making sure libraries are up to date where needed, and managing environment variables and secrets for third-party services. So this post is really me capturing the patterns I’ve used and found most useful, so I can refer to them in the future (instead of fumbling through half-written notes in Obsidian).


LLM Zoomcamp Week 1 - Intro Notes by waleed

June 29, 2024 | Reading Time: 23 min

datatalksclub
LLM
python

LLM Zoomcamp - Week 1 Notes. This is the first week of the new LLM Zoomcamp hosted by DataTalksClub. I’ve found their content really helpful in the past, having completed the Data Engineering Zoomcamp a year ago and successively attempted (but never finished!) the other two zoomcamps they offer. To give myself the best chance of completing this one, I’ve decided to publish my notes on my blog. I hope that helps me see this one through!


Prefect.io POC - Building ETL Pipeline for Toronto Bicycle Data by waleed

January 3, 2024 | Reading Time: 4 min

datatalksclub
prefect
python
sql
terraform
gcp
bigquery

Toronto Bicycle Data Engineering. You can find all the code for this project here: https://github.com/waleedayoub/toronto-bicycle-data. I explored this as my final project for the DataTalksClub Data Engineering Zoomcamp. Project Description: The goal of this project is to examine historical bike share ridership going as far back as 2016 in the city of Toronto, Ontario. The city of Toronto has an open data sharing mandate, and all bike share data can be found here: https://open.