05) Data Scraping – [PRINT/JOURNALISM] News

Since news scraping was one of the first tasks I took on, I needed to carefully consider both where the data would come from and how it would be visualized. My initial plan was to script within Toon Boom Harmony, the software I was using for the animation, since handling everything within a single platform seemed most efficient. The obvious alternative, Blender, came with its own issue: I had a clear vision of a 2D animated project, but Blender’s scripting operates within a 3D environment.

After some attempts to get a working script in Toon Boom Harmony, I ran into difficulties. Its scripting interface was more complex than I had anticipated, and I struggled to find my way around it. With the project on a tight deadline, I didn’t want to sink time into learning an unfamiliar environment, so I decided to pivot to something I was more comfortable with.

I decided to continue using Blender for this project, mainly due to my familiarity with it. I had previously used Blender’s scripting functionality in an earlier project, so it made sense to build on that experience. With the help of online tutorials—and quite a bit of trial and error—I managed to install the necessary packages and set up the environment for data scraping. This allowed the collected data to be read and processed using Blender’s built-in Python scripting module. A particularly helpful YouTube tutorial I followed for this setup was this one.
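For reference, installing packages into Blender generally means pointing pip at Blender’s bundled Python interpreter. The snippet below is a minimal sketch of that step (assuming Blender 2.91+, where sys.executable resolves to the bundled interpreter), run once from the Scripting tab:

```python
import sys
import subprocess

# Blender 2.91+ exposes its bundled Python interpreter via sys.executable
python_exe = sys.executable

# Make sure pip exists inside Blender's Python, then install the scraping libraries
subprocess.check_call([python_exe, "-m", "ensurepip", "--upgrade"])
subprocess.check_call([python_exe, "-m", "pip", "install", "--upgrade", "pip"])
subprocess.check_call([python_exe, "-m", "pip", "install", "beautifulsoup4", "selenium", "requests"])
```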

At the same time, I wanted to ensure that the final animation maintained a 2D aesthetic. To test this, I imported an MP4 file onto a flat plane in the scene, placed some dummy-generated data on top of it, and positioned the camera directly above the plane. This successfully created the flat, 2D look I was aiming for. Based on this result, I decided to place all data visualizations on a single flat plane, with animations imported onto the same plane underneath, and keep the overhead camera setup to preserve the 2D effect.
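The camera side of that test is simple to reproduce. Below is a rough sketch of the top-down, orthographic setup (the video footage itself would be applied to the plane as a texture, which I’ve left out here):

```python
import bpy

# Flat "canvas" plane that the footage and data visualisations sit on
bpy.ops.mesh.primitive_plane_add(size=10, location=(0, 0, 0))

# Camera placed directly above the plane, looking straight down
bpy.ops.object.camera_add(location=(0, 0, 12), rotation=(0, 0, 0))
cam = bpy.context.object
cam.data.type = 'ORTHO'        # orthographic projection flattens out any sense of depth
cam.data.ortho_scale = 10      # match the plane size so it fills the frame
bpy.context.scene.camera = cam
```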

I then ran some scraping tests on static pages, where the HTML is easier to parse and the content is not dynamically loaded, just to check whether Blender could import and use the installed libraries such as BeautifulSoup and Selenium. I was able to successfully scrape the data and visualise it on a 3D plane in Blender.
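A stripped-down version of one of those static tests looked something like the sketch below (the URL is a placeholder, and the scraped string is simply written into a 3D text object sitting on the plane):

```python
import bpy
import requests
from bs4 import BeautifulSoup

# Placeholder static page; the real tests used simple pages with no dynamic loading
URL = "https://example.com"

html = requests.get(URL, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h1")
headline = heading.get_text(strip=True) if heading else "No headline found"

# Write the scraped string into a new 3D text object lying just above the plane
bpy.ops.object.text_add(location=(0, 0, 0.01))
text_obj = bpy.context.object
text_obj.data.body = headline
```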

I concluded that this approach would be a viable method for both scraping and visualizing data, as it effectively combined the animation and data functionalities into a single workflow. With the technical setup in place, I began scraping the data I needed for the project and started experimenting with different ways to visualize it—focusing on making the output timely, dynamic, and visually engaging.

I began with headlines from the BBC homepage. While the initial scraping worked well, integrating live update functionality caused the software to crash repeatedly, highlighting the need for a more stable way to handle real-time data within Blender.

To troubleshoot further, I ran the code in isolated blocks using a Jupyter Notebook to identify anything that might be causing Blender to struggle. After fixing a few issues and confirming that the code worked as expected in VSCode, I integrated it back into Blender. However, when I ran it again, I encountered the same freezing issue, which suggested the problem wasn’t with the code itself, but rather with how Blender was handling it internally.

I realized the freezing was likely caused by running long operations directly on Blender’s main thread. Since Blender can’t perform any other tasks until the current operation completes—and in this case, that operation was continuous—it effectively locked up the entire application.

Eventually, I managed to get the live update functionality working alongside some basic visualisation. I used Selenium to scrape live BBC headlines every minute and then updated a 3D text object in the Blender scene with the new data. To prevent Blender from freezing, I ran the update process asynchronously, with the aim of letting the headlines refresh every 60 seconds without interrupting the animation or user interaction.
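A simplified sketch of that kind of scheduled update is below, using bpy.app.timers to re-run a function on an interval; the Selenium code is reduced to a hypothetical fetch_bbc_headlines() helper, and “Headlines” is an assumed name for the text object:

```python
import bpy

def fetch_bbc_headlines():
    """Hypothetical stand-in for the Selenium code that loads the BBC homepage."""
    return ["Placeholder headline"]

def update_headlines():
    # Overwrite the body of the 3D text object with the freshly scraped headlines
    bpy.data.objects["Headlines"].data.body = "\n".join(fetch_bbc_headlines())
    return 60.0  # ask Blender to call this function again in 60 seconds

bpy.app.timers.register(update_headlines)
```

Note that a bpy.app.timers callback still runs on Blender’s main thread, which matters for what happened next.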

Sure enough, I ran into an issue: although the data was refreshing, Blender still froze every 60 seconds while the update ran. This clearly wasn’t viable, especially since the animation needed to loop smoothly without interruptions. To resolve this, I shifted to sockets, handling the dynamic scraping entirely outside Blender while letting Blender focus solely on visualizing the incoming data and running the animation.

I coded the scraper in Visual Studio Code, and set up the client side in Blender to simply receive the data and render it visually. This separation worked much better—it ran smoothly, avoided crashes, and allowed for fast, dynamic updates without interrupting the animation flow.
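The server half, run from VS Code, looked roughly like the sketch below. The scraping itself is reduced to a hypothetical fetch_headlines() helper, and the address, port, and interval are placeholder values:

```python
# scraper_server.py -- runs outside Blender, e.g. from the VS Code terminal
import json
import socket
import time

HOST, PORT = "127.0.0.1", 9999   # placeholder local address and port
SCRAPE_INTERVAL = 60             # seconds between scrapes

def fetch_headlines():
    """Hypothetical stand-in for the Selenium code that pulls the live BBC headlines."""
    return ["Placeholder headline 1", "Placeholder headline 2"]

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(1)
    conn, _ = server.accept()    # wait for Blender to connect
    with conn:
        while True:
            # Newline-delimited JSON keeps each batch of headlines as one message
            conn.sendall((json.dumps(fetch_headlines()) + "\n").encode("utf-8"))
            time.sleep(SCRAPE_INTERVAL)
```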

I ended up learning how sockets function: they enable real-time, two-way communication between a client (Blender) and a server (the scraper script running in VSCode) by keeping a continuous TCP connection open. This meant Blender no longer had to make repeated HTTP requests and wait for responses itself, which I suspect was contributing to the earlier crashes. With sockets, data transfer and interaction became far more efficient.
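The matching client half inside Blender is sketched below. It keeps the socket non-blocking and only touches the scene from a bpy.app.timers callback, so the main thread is never stuck waiting on the network (again, “Headlines” is an assumed object name and the address is a placeholder):

```python
# Runs inside Blender's Scripting tab: receives headline batches from the scraper
import bpy
import json
import socket

sock = socket.create_connection(("127.0.0.1", 9999))
sock.setblocking(False)   # never let a slow read stall Blender's main thread
buffer = b""

def poll_socket():
    global buffer
    try:
        buffer += sock.recv(4096)
    except BlockingIOError:
        pass  # no new data yet
    while b"\n" in buffer:
        line, buffer = buffer.split(b"\n", 1)
        headlines = json.loads(line.decode("utf-8"))
        bpy.data.objects["Headlines"].data.body = "\n".join(headlines)
    return 0.5  # poll the socket again in half a second

bpy.app.timers.register(poll_socket)
```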

On the left, you can see the two processes finally communicating and successfully outputting text into Blender. To confirm that the data was still updating dynamically, I left a camera recording for a minute—and it was! The headlines were indeed changing in real time.

I also attempted to create a newspaper object onto which the data could be projected, with the intention of modeling a dynamically updating newspaper within the scene. However, this significantly slowed down the data loading process. It wasn’t viable—I couldn’t afford to have an empty frame in the animation while waiting for the newspaper to load. So, I chose a simpler and more stable solution: letting the text appear independently and building the animation around it, rather than binding it to a specific object.

The next thing I did was set the text colour to black. At first, I was confused because the change didn’t seem to take effect—but it turned out to be a simple issue with lighting. Once I checked the Shading tab, I realised the scene lighting was affecting how the colours appeared.
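One way to sidestep this entirely (not necessarily what I settled on) is to give the text an emission-based material, which reads as flat black no matter how the scene is lit. A minimal sketch, again assuming the text object is called “Headlines”:

```python
import bpy

# Flat black material built from an emission node, so scene lights have no effect on it
mat = bpy.data.materials.new(name="FlatBlackText")
mat.use_nodes = True
nodes = mat.node_tree.nodes
nodes.clear()

output = nodes.new("ShaderNodeOutputMaterial")
emission = nodes.new("ShaderNodeEmission")
emission.inputs["Color"].default_value = (0.0, 0.0, 0.0, 1.0)  # pure black
mat.node_tree.links.new(emission.outputs["Emission"], output.inputs["Surface"])

bpy.data.objects["Headlines"].data.materials.append(mat)
```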

I also tried to add line spacing and implement a scroll function. However, this didn’t work as expected, so I decided to leave it for later. At this stage, it was more important that the data behaved the way I wanted; I could always focus on the aesthetics once the functionality was solid.

I then simplified the code to make it more manageable and added logic to scrape data every five minutes. This approach was suggested by our supervisor, Ken, as a way to imply that the data is live without needing to stream headlines in literal real time, which had repeatedly caused crashes and blocking issues. Collecting data at slightly longer intervals also gives Blender the flexibility to visualize and interpret the headlines in a more intentional and stylized manner.

All the scraped headlines are sent to Blender, which then displays a few at a time, cycling through them in batches until all have been shown. This method keeps the visual experience dynamic and engaging—displaying everything at once would not only overwhelm the viewer, but also reduce the impact of individual headlines.
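The batching logic itself is straightforward; a sketch of the kind of cycling I mean is below, with the batch size and display time as assumed placeholder values:

```python
import bpy

BATCH_SIZE = 3        # headlines shown at once (placeholder value)
DISPLAY_TIME = 8.0    # seconds each batch stays on screen (placeholder value)

headlines = []        # refilled whenever a new scrape arrives over the socket
batch_index = 0

def show_next_batch():
    """Cycle through the scraped headlines a few at a time, wrapping after the last batch."""
    global batch_index
    if headlines:
        start = batch_index * BATCH_SIZE
        bpy.data.objects["Headlines"].data.body = "\n".join(headlines[start:start + BATCH_SIZE])
        batch_index = (batch_index + 1) % -(-len(headlines) // BATCH_SIZE)  # ceiling division
    return DISPLAY_TIME

bpy.app.timers.register(show_next_batch)
```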

Next, I went through the data and realized I needed to filter out irrelevant content to ensure that only meaningful information would appear in the visualization. Since the data would be presented as part of a newspaper, elements like “Video, 00:01:26”, “LIVE”, or “Watch Now” were not useful in this context. Additionally, there were glitches in how each batch of data was appearing and disappearing from the screen.
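For the filtering step, a minimal sketch of the kind of clean-up pass I mean is below; the marker strings are the ones quoted above, and the helper name is hypothetical:

```python
import re

BLOCKLIST = ("LIVE", "Watch Now")                            # labels that mark non-headline items
VIDEO_DURATION = re.compile(r"Video,\s*\d{2}:\d{2}:\d{2}")   # e.g. "Video, 00:01:26"

def clean_headlines(raw_items):
    """Drop media labels and duplicates so only real headlines reach the visualisation."""
    cleaned = []
    for item in raw_items:
        item = VIDEO_DURATION.sub("", item).strip(" ,")
        if not item or any(tag in item for tag in BLOCKLIST):
            continue
        if item not in cleaned:
            cleaned.append(item)
    return cleaned
```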