Houston Open Challenge Winner: Council Records Scraper

By Jeff Reichman, in Bits

This is the second post in a three-part series profiling the winners of the recent City of Houston Open Challenge. You can read the first post here.

Rakshak Talwar and Micah Thomas are both Computer Science majors at the University of Houston. I’ve gotten to know them over time through various activities in the Houston startup community and RedLabs, UH’s startup accelerator.

Over the course of a few days, they scraped PDFs of City Council minutes, organized the data, and made it searchable.

Team: Parabyte Data
Links: demo, GitHub

Jeff: Can you each tell me a little bit about your backgrounds?

Rakshak: My name is Rakshak and I’m 21. I’m a computer engineering student at the University of Houston. I like working with data, and I like working with embedded hardware, especially robotics. Now I’m connecting the two and focusing more on artificial intelligence.

Micah: I’m Micah. I’m also a computer engineering student at U of H, though I’m deciding whether to switch to computer science, and I probably will. I mostly did robotics in high school, and that’s how I got into programming and computer science. Now I build websites and backend servers. That’s what I do.

Jeff: Why did you guys choose this challenge?

Rakshak: We chose this challenge because there’s a lot of data on spending, and in the future we could draw interesting trends or patterns from it. Visualizing spending also makes it easier for everyone in the city, citizens and politicians alike, to track it and allocate it toward what actually matters. You can objectively quantify where spending should go and where it is going, see the difference between the two, and adjust accordingly.

Jeff: What are some of the hurdles that you faced developing this prototype?

Micah: Yeah, so the hardest part was getting the data off the NOVUS agenda site. It was all hardcoded PDFs, buried behind an obfuscated HTML and JavaScript interface built on ASP.NET, so there was no easy way to crawl it. We ended up using a headless browser, scripting it to click each link, and pulling the data off the site page by page. Building that crawler was the hardest part.

Jeff: So what are some of the things you would like to do to improve the application going forward?

Rakshak: Right now we’re targeting procurement information, but we would also like to expand to ordinances and get a better idea of who will be using it: who the primary users are, what they are looking for, and what information is important to them.

About the Author

Jeff Reichman

Jeff is passionate about using data to make better decisions and reveal new insights. He founded January Advisors and Sketch City, and serves on the board of the League of Women Voters of the Houston Area. Read his full bio on LinkedIn.