Download Files from the Web Using Scala

Rizwan Qaiser · March 18, 2020 · 2 min read

Recently I have been practicing functional programming with Scala, and for practice I thought I would do a small project that automagically downloads the Covid-19 reports from the World Health Organization.

A bit about Scala

Scala (scalable language) is a statically typed, functional, object-oriented language that also lets you write imperative code if need be. It compiles to bytecode that runs on the Java Virtual Machine and is widely used for big data workloads; the distributed big data processing engine Apache Spark is written in Scala.

What all of that means is that you can use existing Java libraries (and the many Stack Overflow answers!) and write elegant, succinct code to get the job done. The sucky part is that you'll have to learn functional programming. Trust me, though, you won't regret it. Onwards!

Step 0: Dependencies and Imports

For the purpose of this tutorial, I started a new project in IntelliJ using sbt (the Scala build tool). I've added the JSoup library as a dependency in the project's build.sbt file. We'll be using JSoup as the library for parsing HTML from the WHO website.

The project also needs a handful of imports. Personally, my knowledge of the Scala standard library is quite limited, so I won't go into detail about them at this point.

Step 1: Get Links from a Webpage

First, let's write a function called getLinks that scrapes links off a web page. getLinks takes two parameters, url and selector.

Step 2: Clean our URLs

In the World Health Organization website's link structure, the hrefs drop the root URL and use relative URLs. To handle this, we need to add logic that prepends the root URL to each of our links.

Step 3: Download Files

Lastly, we'll write our downloadFiles function, which downloads files and writes them to a specified path.
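Steps 0 to 3 can be sketched roughly as follows. This is a minimal sketch rather than the original listing: getLinks and downloadFiles use the names from the steps above, while the ReportDownloader object, the toAbsolute and fileNameOf helpers, the return types, and the JSoup version in the build.sbt comment are assumptions made for illustration. It also assumes Scala 2.13 or later, since it relies on scala.jdk.CollectionConverters and scala.util.Using.

```scala
// build.sbt (the version number is an assumption; pick whichever release you prefer):
//   libraryDependencies += "org.jsoup" % "jsoup" % "1.15.3"

import java.net.URL
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import scala.jdk.CollectionConverters._
import scala.util.{Try, Using}
import org.jsoup.Jsoup

// Hypothetical container object; the name is not from the article.
object ReportDownloader {

  // Step 1: fetch the page at `url` and collect the href of every
  // element matched by the CSS `selector`, skipping empty hrefs.
  def getLinks(url: String, selector: String): List[String] = {
    val doc = Jsoup.connect(url).get()
    doc.select(selector).asScala.toList.map(_.attr("href")).filter(_.nonEmpty)
  }

  // Step 2 (hypothetical helper): resolve a relative href against the root URL.
  def toAbsolute(rootUrl: String, href: String): String =
    new URL(new URL(rootUrl), href).toString

  // Hypothetical helper: use the last path segment as the local file name,
  // ignoring any query string.
  def fileNameOf(url: String): String = {
    val path = new URL(url).getPath
    path.substring(path.lastIndexOf('/') + 1)
  }

  // Step 3: download every URL into `targetDir`. Each download is wrapped
  // in a Try so that one failed file does not abort the rest.
  def downloadFiles(urls: List[String], targetDir: String): List[Try[Path]] = {
    val dir = Files.createDirectories(Paths.get(targetDir))
    urls.map { u =>
      Try {
        val target = dir.resolve(fileNameOf(u))
        // Using.resource closes the stream even if the copy fails.
        Using.resource(new URL(u).openStream()) { in =>
          Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
        }
        target
      }
    }
  }
}
```

Chained together, a call might look like `downloadFiles(getLinks(pageUrl, "a[href*=.pdf]").map(toAbsolute(rootUrl, _)), "reports")`, where pageUrl, rootUrl, the selector, and the target directory are all placeholders to adapt to the page being scraped. JSoup can also resolve links itself via `absUrl("href")`; the manual version is kept here because it mirrors Step 2.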

Step 4: Let's Put It All Together

With the three functions in place, the program gets the links from the reports page, turns them into absolute URLs, and downloads each file to the chosen path.

Step 5: Extract Data from PDF Files Using Python

I haven't solved this problem with Scala, but I have previously written about how you can extract data from a PDF file using Python. You can refer to that article for a general idea of how to go about extracting the data.