WHEN: 13th October 2020, 14:00h-16:00h
REGISTRATION: Register for free by clicking here.
The Southwest constituency is pleased to present part III of the Python webinar series. In this third and final webinar we will take our Python knowledge one step further and build a basic recommendation system for academic papers. Specifically, we will process the CORD-19 corpus, a growing collection of papers on COVID-19 made available by the Allen Institute for AI. We will apply basic Natural Language Processing (NLP) techniques to these papers so that, by the end of the session, we can retrieve the papers most similar to a given one by comparing the content of their abstracts.
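To give a flavour of the idea, here is a minimal sketch of comparing abstracts by their content. It is not the material used in the session: the webinar relies on scikit-learn, whereas this sketch uses only the standard library, with plain word counts and cosine similarity. The function names (vectorize, cosine_similarity, most_similar) and the three toy abstracts are made up for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is not similar to anything
    return dot / (norm_a * norm_b)


def most_similar(query_id, abstracts, top_n=2):
    """Rank all other papers by similarity to the query paper."""
    vectors = {pid: vectorize(text) for pid, text in abstracts.items()}
    query = vectors[query_id]
    scores = [
        (pid, cosine_similarity(query, vec))
        for pid, vec in vectors.items()
        if pid != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy abstracts (hypothetical, not taken from CORD-19).
abstracts = {
    "paper_a": "Transmission of the coronavirus in hospital settings",
    "paper_b": "Hospital transmission of the novel coronavirus among staff",
    "paper_c": "Deep learning methods for protein structure prediction",
}

ranking = most_similar("paper_a", abstracts)
print(ranking)  # paper_b ranks first; paper_c shares no words with paper_a
```

Raw word counts give common words such as "of" and "the" the same weight as informative ones. Weighting schemes such as TF-IDF are a usual way to address this.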
A prerequisite for this session is familiarity with basic Python data structures (dictionaries, sets, lists, etc.) and with file handling. These topics are covered in Sessions I and II of this webinar series. You can find the links to the videos at the end of this web page.
TABLE OF CONTENTS
- The JSON format
- Modules and packages
- List and dictionary methods
- List comprehensions
- Text processing with scikit-learn
This session will combine live coding with support material (text and images) for solidifying important concepts. There will also be time for hands-on exercises and a Q&A at the end.
AFTER THE WORKSHOP YOU WILL BE ABLE TO…
- Traverse a directory
- Handle libraries
- Transform text into numbers
- Compare (numerical representations of) documents
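As a rough illustration of the first two outcomes, the sketch below walks a directory tree, loads every JSON file it finds and collects the abstracts into a dictionary. It is an assumption-laden example rather than the session's code: the field names ("paper_id", and "abstract" as a list of paragraphs each with a "text" key) are assumed here, the helper name load_abstracts is hypothetical, and the files are toy data written to a temporary directory so the example runs on its own.

```python
import json
import os
import tempfile


def load_abstracts(root_dir):
    """Walk root_dir and map each paper id to its abstract text."""
    abstracts = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.endswith(".json"):
                continue  # skip anything that is not a JSON file
            with open(os.path.join(dirpath, name), encoding="utf-8") as handle:
                paper = json.load(handle)
            # Assumed layout: "abstract" is a list of {"text": ...} paragraphs.
            paragraphs = [part["text"] for part in paper.get("abstract", [])]
            if paragraphs:  # ignore papers without an abstract
                abstracts[paper["paper_id"]] = " ".join(paragraphs)
    return abstracts


# Build a tiny directory tree with toy files (hypothetical data).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "subset"))
    toy_files = {
        os.path.join(root, "p1.json"): {
            "paper_id": "p1",
            "abstract": [{"text": "First part."}, {"text": "Second part."}],
        },
        os.path.join(root, "subset", "p2.json"): {
            "paper_id": "p2",
            "abstract": [],
        },
    }
    for path, content in toy_files.items():
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(content, handle)
    # A non-JSON file that the loader should ignore.
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as handle:
        handle.write("not a paper")

    abstracts = load_abstracts(root)

print(abstracts)
```

The resulting dictionary of abstracts is the kind of input that can then be turned into numerical vectors and compared.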
Level: This seminar series is intended as an introduction to Python from scratch. However, this session covers beginner-to-intermediate topics. While you are not expected to have programming experience, we recommend that you are familiar with the topics covered in the two previous SRUK SouthWest Python From Scratch webinars.
Logistics: The course will be delivered using Google Colab. You do not have to install anything on your machine. You only need a computer and a Google account.
INSTRUCTOR: Luis Espinosa-Anke, PhD www.luisespinosa.net