Download Files from the Web Using Scala

March 18, 2020
Recently I have been practicing functional programming with Scala, and for practice I thought I would do a small project that automagically downloads the Covid-19 reports from the World Health Organization.

A bit about Scala

Scala (short for “scalable language”) is a statically typed language that combines functional and object-oriented programming, and it also lets you write imperative code if need be. It compiles to bytecode that runs on the Java Virtual Machine and is popular for big data workloads; Apache Spark, the distributed big data processing engine, is written in Scala. What all of that means is that you can use existing Java libraries (and the many Stack Overflow answers!) and write elegant, succinct code to get the job done. The sucky part is that you’ll have to learn functional programming. Trust me, though, you won’t regret it. Onwards!

Step 0: Dependencies and Imports

For this tutorial, I started a new project in IntelliJ using sbt, the Scala build tool. I added the JSoup library as a dependency in the project’s build.sbt file. We’ll be using JSoup to parse HTML from the WHO website.

The imports are used throughout the project. My knowledge of the Scala standard library is quite limited, so I won’t go into detail about them at this point.

Step 1: Get Links from a Webpage

First, let’s write a function called getLinks that scrapes links off a web page. getLinks takes two parameters: url and selector.

Step 2: Clean our URLs

The links on the World Health Organization website use relative URLs: the hrefs drop the root URL. To handle this, we need to add logic that prepends the root URL to each of our links.

Step 3: Download Files

Lastly, we’ll write our downloadFiles function, which downloads files and writes them to a specified path.
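As a rough sketch of Steps 2 and 3 (not the original code from this post), the URL cleanup and the download can be done with nothing but the JDK standard library. The helper names toAbsolute, fileNameOf and downloadFile are made up for illustration, as are any example paths; only downloadFiles is a name used in this post. The sketch assumes Scala 2.13 or later, since it relies on scala.util.Using, and it leaves out the JSoup-based getLinks step so that it stays free of dependencies.

```scala
import java.net.URI
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.util.{Try, Using}

object Downloader {

  /** Resolve a possibly relative href against the site's root URL.
    * Absolute hrefs come back unchanged.
    * Assumption: root ends with "/" or the href starts with "/". */
  def toAbsolute(root: String, href: String): String =
    URI.create(root).resolve(href).toString

  /** Hypothetical helper: use the last path segment of the URL as the file name. */
  def fileNameOf(url: String): String = {
    val path = URI.create(url).getPath
    path.substring(path.lastIndexOf('/') + 1)
  }

  /** Stream a single URL into targetDir, overwriting any existing file.
    * Errors are captured in the returned Try instead of being thrown. */
  def downloadFile(url: String, targetDir: Path): Try[Path] =
    Try(Files.createDirectories(targetDir)).flatMap { dir =>
      val target = dir.resolve(fileNameOf(url))
      // Using closes the input stream whether or not the copy succeeds.
      Using(URI.create(url).toURL.openStream()) { in =>
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
        target
      }
    }

  /** Download every URL and return one Try per file, so a single
    * failure does not stop the remaining downloads. */
  def downloadFiles(urls: Seq[String], targetDir: Path): Seq[Try[Path]] =
    urls.map(downloadFile(_, targetDir))
}
```

Assuming links holds the hrefs scraped in Step 1, a call along the lines of `Downloader.downloadFiles(links.map(Downloader.toAbsolute("https://www.who.int/", _)), java.nio.file.Paths.get("reports"))` would save everything into a hypothetical reports directory. Returning a Try per file is a design choice for this sketch: it keeps the function free of thrown exceptions and lets the caller decide what to do with failed downloads.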
Step 4: Let’s Put It All Together

Step 5: Extract Data from PDF Files Using Python

I haven’t solved this problem with Scala, but I have previously written about how you can extract data from a PDF file using Python. You can refer to that article for a general idea of how to go about extracting the data.
