Data Visualization Portfolio

Climate, Sustainability, & Policy Analysis

This portfolio showcases work from LSE's PP434 Automated Data Visualization course, taught by Professor Richard Davies. Through ten weekly coding challenges, I progressively built expertise in data visualization techniques, from basic chart embedding to advanced analytics and machine learning, deliberately focusing all work on climate and sustainability themes.

Week 1: Hosting

Task: Setting up GitHub Pages and embedding charts using vegaEmbed.
Source: Richard Davies Library
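
One way to embed a chart, sketched in Python that writes the HTML page. It assumes the Vega-Lite spec is saved as a JSON file served alongside the page. The helper `embed_page`, the file name `chart.json` and the div id `vis` are hypothetical. The CDN URLs pin major versions of vega, vega-lite and vega-embed on jsDelivr. `vegaEmbed(element, spec)` accepts either a spec object or a URL pointing to one, and returns a Promise.

```python
# Hypothetical helper: builds a standalone page that embeds one Vega-Lite chart.
CDN_SCRIPTS = [
    "https://cdn.jsdelivr.net/npm/vega@5",
    "https://cdn.jsdelivr.net/npm/vega-lite@5",
    "https://cdn.jsdelivr.net/npm/vega-embed@6",
]


def embed_page(spec_url: str, div_id: str = "vis", title: str = "Chart") -> str:
    """Return HTML that loads the Vega libraries and calls vegaEmbed once."""
    scripts = "\n".join(f'  <script src="{src}"></script>' for src in CDN_SCRIPTS)
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{title}</title>
{scripts}
</head>
<body>
  <div id="{div_id}"></div>
  <script>
    // vegaEmbed loads the spec from its URL and renders it into the div.
    vegaEmbed("#{div_id}", "{spec_url}").catch(console.error);
  </script>
</body>
</html>
"""
```

Writing the result to `index.html` in a repository with GitHub Pages enabled should be enough to publish the chart.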

Week 2: Building

Task: Creating custom charts using the Economics Observatory Data Hub and embedding them.

Indonesia produces 2.2 times as much renewable energy as Thailand, and both countries show growth over the period.

Source: Economics Observatory

Week 3: Debate

Task: Producing two charts that support or refute a policy debate topic.

Policy topic: Media coverage emphasizes China, India, the US, and the EU as the top emitters. Does per-capita analysis change which countries bear primary responsibility for addressing climate change?

Finding: Per-capita data identifies Qatar, the US, and Australia as the top emitters rather than China or India, revealing how aggregate metrics distort climate accountability narratives.

Source: Our World in Data
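
The per-capita reframing rests on one division: total emissions divided by population. Here is a minimal sketch. It assumes two dictionaries keyed by country name. The function `rank_emitters` and its inputs are illustrative and do not reflect Our World in Data's column layout.

```python
def rank_emitters(emissions, population, per_capita=False, top_n=3):
    """Return the top_n countries by total or per-capita emissions.

    emissions and population are dicts keyed by country name (illustrative
    inputs). Countries with a missing or zero population are skipped when
    ranking per capita.
    """
    if per_capita:
        # Per-capita emissions = total emissions / population.
        score = {c: e / population[c] for c, e in emissions.items() if population.get(c)}
    else:
        score = dict(emissions)
    # Highest score first.
    return sorted(score, key=score.get, reverse=True)[:top_n]
```

Suppose a small country has modest total emissions but a tiny population. The two rankings then diverge, which is the effect behind the finding above.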

Week 4: Replication

Task: Finding a chart that a policy organization, journalist, think tank, television channel or company has used, replicating it, and then improving on it.

Original chart: Carbon Brief produced this visualization tracking gender representation at COP climate conferences. The yellow-purple contrast distinguishes female and male delegates, with select meetings emphasized. Though the progression toward parity is evident, the visualization could be improved.

Original Carbon Brief Chart

Replication: My version retains the original style, including the yellow-purple palette, highlighted COP meetings, and vertical bar structure. The underlying dataset was manually constructed by tracing values from Carbon Brief's published image, as raw data wasn't accessible.

Key improvements: I normalized the gender shares to a 100% stacked scale, added a 50% parity line to show progress, and revised the title to emphasize the improvement in gender balance.

Source: Carbon Brief
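
The normalization and parity line can be sketched as follows. The first step converts each COP's female and male values into percentage shares. The second layers a 50% rule over the stacked bars in Vega-Lite. The sketch assumes input rows shaped like `{"cop": ..., "female": ..., "male": ...}`. The helper names and the two hex colors are placeholders, not values taken from Carbon Brief.

```python
def to_shares(rows):
    """Turn per-COP female/male values into long-format percentage shares."""
    tidy = []
    for row in rows:
        total = row["female"] + row["male"]
        for gender in ("female", "male"):
            tidy.append({"cop": row["cop"], "gender": gender,
                         "share": 100 * row[gender] / total})
    return tidy


def parity_spec(values):
    """Vega-Lite spec: stacked 100% bars with a dashed rule at 50% parity."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "layer": [
            {
                "mark": "bar",
                "encoding": {
                    # sort=None keeps COPs in data order instead of alphabetical.
                    "x": {"field": "cop", "type": "ordinal", "sort": None},
                    # Shares already sum to 100, so default stacking fills 0-100.
                    "y": {"field": "share", "type": "quantitative",
                          "title": "Share of delegates (%)"},
                    # Placeholder yellow/purple palette, not Carbon Brief's exact colors.
                    "color": {"field": "gender", "type": "nominal",
                              "scale": {"range": ["#f2c500", "#6a3d9a"]}},
                },
            },
            {
                # Reference line at parity.
                "mark": {"type": "rule", "strokeDash": [4, 4]},
                "encoding": {"y": {"datum": 50}},
            },
        ],
    }
```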

Week 5: Accessing Data (Scraper & API)

Task (API): Creating a chart using live API integration.

This visualization showcases live API integration using Open-Meteo's Historical Weather Archive. The chart references the API endpoint directly in the Vega-Lite specification, eliminating manual data downloads.

API structure:
• Base endpoint: https://archive-api.open-meteo.com/v1/archive
• Location: latitude=14.6042&longitude=120.9822
• Time period: start_date=2014-01-01&end_date=2024-12-31
• Weather variable: daily=temperature_2m_max
• Timezone: timezone=Asia/Manila
• Complete URL: https://archive-api.open-meteo.com/v1/archive?latitude=14.6042&longitude=120.9822&start_date=2014-01-01&end_date=2024-12-31&daily=temperature_2m_max&timezone=Asia/Manila

This approach enables dynamic data loading: the visualization fetches the JSON data from the API each time the page loads, so the chart stays current without manual updates.

Source: Open-Meteo
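
Here is a sketch of how the request URL, and a Vega-Lite spec that reads it at render time, might be built in Python. The helper names are hypothetical. Open-Meteo returns daily values as parallel arrays (`time`, `temperature_2m_max`) under a `daily` key. The spec therefore selects that key with `format.property` and uses Vega-Lite's `flatten` transform to turn the arrays into one row per day. Treat this data/transform pairing as an assumption about how such a spec could be written, not as the original spec.

```python
from urllib.parse import urlencode

BASE = "https://archive-api.open-meteo.com/v1/archive"


def archive_url(lat, lon, start, end, daily, timezone):
    """Build an Open-Meteo archive request URL from its query parameters."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,
        "end_date": end,
        "daily": daily,
        "timezone": timezone,
    }
    # safe="/" keeps "Asia/Manila" readable instead of encoding it as %2F.
    return f"{BASE}?{urlencode(params, safe='/')}"


def live_spec(url, variable):
    """Vega-Lite line chart that loads the API response when the page renders."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        # The response stores daily values as parallel arrays under "daily".
        "data": {"url": url, "format": {"type": "json", "property": "daily"}},
        # flatten expands the parallel arrays into one row per day.
        "transform": [{"flatten": ["time", variable]}],
        "mark": "line",
        "encoding": {
            "x": {"field": "time", "type": "temporal"},
            "y": {"field": variable, "type": "quantitative"},
        },
    }
```

Serializing the dict with `json.dumps` produces the JSON that `vegaEmbed` consumes.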

Task (Web Scraping): Scraping a website, cleaning and normalizing the data, and exporting it in tidy format.

I scraped Wikipedia's renewable energy table using BeautifulSoup and pandas, filtered it to six Southeast Asian countries, and reshaped it into long format for visualization.

Source: Wikipedia
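
The original pipeline used BeautifulSoup and pandas, where `pd.read_html` and `DataFrame.melt` cover extraction and reshaping. The stdlib-only sketch below mirrors the same three steps:

1. Extract the table rows.
2. Clean numeric cells by removing footnote markers such as `[1]` and thousands separators.
3. Reshape the wide year columns into long records.

The class and function names are hypothetical. Because the six countries are not listed here, the country filter is a parameter.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

FOOTNOTE = re.compile(r"\[[^\]]*\]")  # e.g. "[1]", "[a]", "[note 2]"


class TableParser(HTMLParser):
    """Collect the text of each <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def clean_number(text: str) -> float | None:
    """Drop footnote markers and thousands separators; None if not numeric."""
    text = FOOTNOTE.sub("", text).replace(",", "").strip()
    try:
        return float(text)
    except ValueError:
        return None


def to_tidy(rows: list[list[str]], countries: set[str], id_col: int = 0) -> list[dict]:
    """Keep the chosen countries and melt wide columns into long records."""
    header, *body = rows
    tidy = []
    for row in body:
        country = FOOTNOTE.sub("", row[id_col]).strip()
        if country not in countries:
            continue
        for i, (column, cell) in enumerate(zip(header, row)):
            if i != id_col:
                tidy.append({"country": country, "variable": column,
                             "value": clean_number(cell)})
    return tidy
```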

Week 6: Loops

Task: Using a loop to batch download different series as JSON files and another loop to embed multiple charts.

I batch-downloaded temperature data for multiple Southeast Asian cities, using a Python loop to call Open-Meteo's API. I then used a JavaScript loop to embed all the charts, demonstrating a scalable visualization workflow for multi-location climate data.

Source: Open-Meteo
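
Here is a minimal sketch of the batch-download loop. Manila's coordinates come from the Week 5 request. The other two cities and their approximate coordinates are illustrative placeholders, not necessarily the cities used in the charts. `timezone=auto` asks Open-Meteo to resolve each location's local time zone. The helper names and the `data/` output folder are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://archive-api.open-meteo.com/v1/archive"

# Manila is taken from the Week 5 request; the others are illustrative.
CITIES = {
    "manila": (14.6042, 120.9822),
    "bangkok": (13.7563, 100.5018),
    "jakarta": (-6.2088, 106.8456),
}


def build_requests(cities, start="2014-01-01", end="2024-12-31",
                   daily="temperature_2m_max"):
    """Return (filename, url) pairs, one per city."""
    pairs = []
    for name, (lat, lon) in cities.items():
        query = urlencode({"latitude": lat, "longitude": lon,
                           "start_date": start, "end_date": end,
                           "daily": daily, "timezone": "auto"})
        pairs.append((f"{name}.json", f"{BASE}?{query}"))
    return pairs


def download_all(cities, out_dir="data"):
    """Fetch each city's series and save it as its own JSON file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, url in build_requests(cities):
        with urlopen(url, timeout=30) as response:
            payload = json.load(response)
        (out / filename).write_text(json.dumps(payload), encoding="utf-8")
```

On the page side, the saved file names can feed a JavaScript `for...of` loop that calls `vegaEmbed` once per chart container.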

Week 7: Maps

Task: Creating coordinate and choropleth maps of Scotland and Wales.

Wales Choropleth: I mapped PM2.5 air pollution across Welsh local authorities using a choropleth, revealing that urban centers such as Cardiff face the highest concentrations while rural areas have the lowest.

Source: StatsWales
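
A Vega-Lite choropleth typically joins a table of values to boundary shapes with a `lookup` transform. This sketch assumes a TopoJSON file of Welsh local authorities and a table of records like `{"area": ..., "pm25": ...}`. The file URL, the TopoJSON object name, the property holding the authority name and the helper name are all hypothetical.

```python
def choropleth_spec(topojson_url, feature, name_field, values):
    """Vega-Lite choropleth joining PM2.5 values to boundaries by area name."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        # Hypothetical boundary file; `feature` names the TopoJSON object.
        "data": {"url": topojson_url,
                 "format": {"type": "topojson", "feature": feature}},
        # Attach pm25 from the value table where its "area" matches the shape's name.
        "transform": [{
            "lookup": f"properties.{name_field}",
            "from": {"data": {"values": values},
                     "key": "area", "fields": ["pm25"]},
        }],
        "projection": {"type": "mercator"},
        "mark": {"type": "geoshape", "stroke": "white"},
        "encoding": {
            "color": {"field": "pm25", "type": "quantitative",
                      "title": "PM2.5 (µg/m³)"},
            "tooltip": [
                {"field": f"properties.{name_field}", "type": "nominal", "title": "Area"},
                {"field": "pm25", "type": "quantitative"},
            ],
        },
    }
```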

Scotland Coordinate Map: I mapped Scotland's operational renewable facilities using coordinate data, with interactive filtering by technology type revealing wind power's geographic dominance across the country.

Source: UK Government Open Data
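
Here is a sketch of a coordinate map with a technology dropdown. It assumes records with `lon`, `lat` and `technology` fields; these are hypothetical names, not the UK Government dataset's columns. A variable parameter bound to a `select` input drives a filter expression. An "All" option shows every facility.

```python
def facilities_spec(values, technologies):
    """Vega-Lite point map filtered by a technology dropdown."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "params": [{
            "name": "tech",
            "value": "All",
            # Variable parameter bound to a dropdown; "All" disables the filter.
            "bind": {"input": "select", "options": ["All", *technologies],
                     "name": "Technology: "},
        }],
        "transform": [{"filter": "tech == 'All' || datum.technology == tech"}],
        "projection": {"type": "mercator"},
        "mark": {"type": "circle", "size": 20, "opacity": 0.7},
        "encoding": {
            "longitude": {"field": "lon", "type": "quantitative"},
            "latitude": {"field": "lat", "type": "quantitative"},
            "color": {"field": "technology", "type": "nominal"},
            "tooltip": [{"field": "technology", "type": "nominal"}],
        },
    }
```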

Week 8: Big Data

Task: Producing two charts from a UK supermarket price dataset, simplifying millions of observations for visualization.

I analyzed UK protein prices from the Long Run Prices Database (1988-2022), selecting meat items priced per kilogram and tinned beans to allow a fair comparison. After normalizing all prices to a price per 100g, I aggregated quarterly averages and calculated meat-to-bean price ratios. Beef costs about five times as much as beans, supporting the case that sustainable protein is economically accessible.

Source: Long Run Prices Database
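
The normalization, quarterly aggregation and ratio steps can be sketched with the standard library. The tuple layout `(date, item, price, grams)` and the item labels are assumptions; the database's actual columns are not described here. Tinned beans would also need their pack weight, for example taken from the item description, before they can be converted to a per-100g price.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def price_per_100g(price, grams):
    """Normalize the price of a pack weighing `grams` to a price per 100 g."""
    return price * 100 / grams


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2020Q1."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def quarterly_means(observations):
    """Average per-100g prices by (quarter, item).

    observations: iterable of (date, item, price, grams) tuples (assumed layout).
    """
    buckets = defaultdict(list)
    for d, item, price, grams in observations:
        buckets[(quarter(d), item)].append(price_per_100g(price, grams))
    return {key: mean(vals) for key, vals in buckets.items()}


def price_ratios(means, numerator, denominator):
    """Per-quarter ratio of one item's mean price to another's."""
    return {
        q: means[(q, numerator)] / means[(q, denominator)]
        for (q, item) in means
        if item == numerator and (q, denominator) in means
    }
```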

Week 9: Interactivity

Task: Producing two charts that include more advanced interactivity (sliders, dropdowns, clickable legends, etc.).

I implemented a year slider to visualize how marine protection evolved geographically across the Coral Triangle from 1977 to 2019, showing spatial patterns in conservation progress.

Source: World Database on Protected Areas
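
A year slider in Vega-Lite is a parameter bound to a `range` input and referenced in a filter expression. This sketch assumes one record per MPA, plotted as a point, with hypothetical fields `lon`, `lat`, `area_km2` and `status_yr`. `status_yr` is the designation year, which WDPA records in its `STATUS_YR` attribute.

```python
def slider_map_spec(values, first_year=1977, last_year=2019):
    """Vega-Lite MPA map whose slider shows areas designated by the chosen year."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "params": [{
            "name": "year",
            "value": last_year,
            "bind": {"input": "range", "min": first_year, "max": last_year,
                     "step": 1, "name": "Year: "},
        }],
        # Keep only areas designated on or before the selected year.
        "transform": [{"filter": "datum.status_yr <= year"}],
        "projection": {"type": "mercator"},
        "mark": {"type": "circle", "opacity": 0.6},
        "encoding": {
            "longitude": {"field": "lon", "type": "quantitative"},
            "latitude": {"field": "lat", "type": "quantitative"},
            "size": {"field": "area_km2", "type": "quantitative"},
        },
    }
```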

I added a year slider and hover interaction to highlight individual country trajectories against the global average, enabling temporal comparison of protection performance.

Source: World Database on Protected Areas
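
Hover highlighting can be sketched as a point selection on the `country` field that greys out the other lines while one is hovered. The global average is drawn on top as a dashed line. The field names (`year`, `pct_protected`, `country`) are assumptions. So is the convention of storing the average as rows labeled "Global average". The year slider is omitted from this sketch.

```python
def trajectory_spec(values, average_label="Global average"):
    """Country lines highlighted on hover, plus a dashed global-average line."""
    x = {"field": "year", "type": "quantitative", "axis": {"format": "d"}}
    y = {"field": "pct_protected", "type": "quantitative",
         "title": "Marine area protected (%)"}
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "layer": [
            {
                "params": [{
                    "name": "hover",
                    "select": {"type": "point", "fields": ["country"],
                               "on": "mouseover", "clear": "mouseout"},
                }],
                "transform": [{"filter": f"datum.country != '{average_label}'"}],
                "mark": {"type": "line", "strokeWidth": 2},
                "encoding": {
                    "x": x,
                    "y": y,
                    # While a country is hovered it keeps its color; others turn grey.
                    "color": {"condition": {"param": "hover", "field": "country",
                                            "type": "nominal"},
                              "value": "lightgray"},
                    "detail": {"field": "country", "type": "nominal"},
                },
            },
            {
                # Assumed convention: the average is stored as its own "country".
                "transform": [{"filter": f"datum.country == '{average_label}'"}],
                "mark": {"type": "line", "strokeDash": [6, 3], "color": "black"},
                "encoding": {"x": x, "y": y},
            },
        ],
    }
```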

Week 10: Advanced Analytics & Machine Learning

Task (Advanced Analytics): Producing a chart using more advanced analytics like regression, shock analysis, or heat maps.

I applied linear regression to each country's marine protection trajectory, projecting to 2030 with 95% confidence intervals. The charts reveal substantial gaps between current progress and international targets, quantifying how much progress would need to accelerate.

Source: World Database on Protected Areas
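
The projection step can be sketched as ordinary least squares on (year, percent protected) pairs, evaluated at 2030 with an interval for the fitted mean. This is a standard textbook formulation, not necessarily the exact method behind the charts. `t_crit` defaults to the large-sample value 1.96. With only a few years of data, the t quantile with n - 2 degrees of freedom (e.g. `scipy.stats.t.ppf(0.975, n - 2)`) is more appropriate. A prediction interval would add 1 under the square root.

```python
from math import sqrt


def fit_and_project(xs, ys, x_new, t_crit=1.96):
    """Fit y = a + b*x by ordinary least squares and predict at x_new.

    Returns (estimate, lower, upper); the bounds are a confidence interval
    for the fitted mean at x_new.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points to estimate the error")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(rss / (n - 2))
    # Standard error of the fitted mean at x_new.
    se = s * sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    estimate = intercept + slope * x_new
    return estimate, estimate - t_crit * se, estimate + t_crit * se
```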

Task (Machine Learning): Conducting an applied data analysis using machine learning (supervised or unsupervised) and visualizing it.

Hypothesis: Marine protected areas (MPAs) will cluster by national governance approach, with countries choosing either larger, multi-use areas with little no-take coverage or smaller, strictly protected reserves.

Method: I applied K-means clustering to Coral Triangle MPAs using five standardized features: log area, age, protection intensity, IUCN strictness, and management plans. The elbow method indicated four clusters as optimal.

Finding: Four management groups emerged based on size and strictness rather than geography. Malaysia was the only country with a consistent national strategy, favoring medium-sized, high-intensity MPAs.

Source: World Database on Protected Areas
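
Here is a dependency-free sketch of the clustering pipeline. It covers z-score standardization, Lloyd's k-means with random restarts, and an inertia curve for the elbow method. It assumes each MPA has already been encoded as a numeric row of the five features, for example:

- `math.log(area)`
- age in years
- protection intensity
- an ordinal IUCN strictness score
- a 0/1 management-plan flag

These encoding choices and the function names are illustrative. scikit-learn's `StandardScaler` and `KMeans` (with its `inertia_` attribute) are the usual library equivalents.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column; a constant column is only centered."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda j: _sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (labels, inertia)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            labels = _assign(points, centers)
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # An empty cluster keeps its previous center.
                new_centers.append([mean(c) for c in zip(*members)] if members else centers[j])
            if new_centers == centers:
                break
            centers = new_centers
        labels = _assign(points, centers)
        inertia = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best


def elbow(points, k_max=8, seed=0):
    """Inertia for k = 1..k_max; the elbow is where the decrease levels off."""
    return {k: kmeans(points, k, seed=seed)[1] for k in range(1, k_max + 1)}
```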

Data Sources & Code

Data Sources

Code & Analysis


AI Disclosure: I used Claude AI as a coding assistant for data processing and visualization code development.