Published on March 13, 2014
gigHUB is a local gigs listing website.
Who are the beautiful people who created gigHUB?
@inkysplat Main Developer and DevOps engineer
@t_pk Frontend Developer & Creator of gigHUB
@albinjindu SEO “guru” and Social Media “wizard” * he hates being called a guru/wizard/rockstar/executive
@ripetungi Design & UX
Why do it? ● Because there is no one comprehensive place to find all the gigs going on in Bristol. ● Sites that do show gigs going on in Bristol are ticket sites, which are more interested in conversion and sales for big venues like O2 Academy. ● Bristol has an amazing music scene that isn’t always celebrated to its fullest. ● Other sites that show gigs in Bristol are of a magazine format, so editorial content overpowers the ability to find gigs.
So where do you begin making a comprehensive gig listings website?
We can scrape the venue web pages using the Simple HTML DOM Parser - http://simplehtmldom.sourceforge.net/ What we’ve discovered
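To make the scraping idea concrete, here is a minimal sketch of pulling gig dates and titles out of a venue page. It is written in Python with the standard-library `html.parser` rather than the PHP Simple HTML DOM Parser the talk uses, and the markup, class names and band names are made up for illustration; every real venue page will need its own selectors.

```python
from html.parser import HTMLParser

# Hypothetical venue markup; real venue pages differ and change over time.
SAMPLE_PAGE = """
<ul class="listings">
  <li class="gig"><span class="date">2014-03-12</span> <span class="title">The Examples</span></li>
  <li class="gig"><span class="date">2014-03-14</span> <span class="title">Support Band</span></li>
</ul>
"""


class GigParser(HTMLParser):
    """Collects the date and title spans found inside each <li class="gig">."""

    def __init__(self):
        super().__init__()
        self.gigs = []        # finished gigs, one dict per listing
        self._current = None  # gig being built while inside <li class="gig">
        self._field = None    # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "gig":
            self._current = {}
        elif tag == "span" and self._current is not None and css_class in ("date", "title"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.gigs.append(self._current)
            self._current = None


def scrape_gigs(html):
    """Return a list of {"date": ..., "title": ...} dicts found in the page."""
    parser = GigParser()
    parser.feed(html)
    return parser.gigs
```

Calling `scrape_gigs(SAMPLE_PAGE)` returns one dict per listing. The fragility is visible straight away: rename one CSS class on the venue's site and the scraper silently returns nothing.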
Web scraping?! Are you insane? YES
Let me explain... The Good It’s simple and effective. The DOM is an API!!! All the data is readily available publicly. There are no API restrictions - rate limits or throttling. It means we can target specific venues and be more reactive to new venues coming to the scene.
Let me explain... The Bad Venues change their site design. Big maintenance overhead. Unnormalised data.
What about in production? We need it to be: Fault tolerant Audited Scalable Flexible enough to add more venues
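One way to read "fault tolerant" and "audited" together is sketched below: each venue scraper runs in isolation, so one broken site cannot stop the rest, and every run leaves an audit record. This is a Python illustration under assumed names (`run_scrapers` and the audit fields are hypothetical), not gigHUB's actual code.

```python
def run_scrapers(scrapers):
    """Run every venue scraper and return one audit record per venue.

    `scrapers` maps a venue name to a zero-argument callable that returns
    a list of gigs. A failure in one scraper is recorded, not re-raised.
    """
    audit = []
    for venue, scrape in scrapers.items():
        try:
            gigs = scrape()
            audit.append({"venue": venue, "ok": True, "gigs": len(gigs), "error": None})
        except Exception as exc:  # deliberately broad: any scraper may break
            audit.append({"venue": venue, "ok": False, "gigs": 0, "error": str(exc)})
    return audit
```

Adding a venue then means adding one more entry to the mapping, which covers the "flexible enough to add more venues" requirement in the simplest possible way.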
We owe this man a pint... @taylorotwell
Who? The Laravel Bloke.
So why Laravel? Artisan.
So how does this all work?
We utilize Laravel’s Artisan command line utility for our cron job. We then use Laravel’s Eloquent ORM for creating our data entities: Artist > Lineup > Gig > Venue > City. Our cron job runs every 6 hours and scrapes a few of the major venues’ web pages.
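The entity chain can be pictured roughly as follows. This is a Python dataclass sketch, not the Eloquent models themselves; the field names and the direction of each relationship are assumptions, since the slides only give the entity names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Scheduling, as a crontab line: every 6 hours, on the hour.
# The command name "gigs:scrape" is hypothetical.
#   0 */6 * * * php artisan gigs:scrape


@dataclass
class City:
    name: str


@dataclass
class Venue:
    name: str
    city: City


@dataclass
class Artist:
    name: str


@dataclass
class Lineup:
    artist: Artist
    position: int  # assumed ordering field: 0 = headliner


@dataclass
class Gig:
    title: str
    day: date
    venue: Venue
    lineup: List[Lineup] = field(default_factory=list)


def headliner(gig):
    """Return the artist with the lowest lineup position, or None if empty."""
    if not gig.lineup:
        return None
    return min(gig.lineup, key=lambda slot: slot.position).artist
```

Walking from a gig to its city is then `gig.venue.city.name`, which mirrors the Artist > Lineup > Gig > Venue > City chain on the slide.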
This doesn’t seem robust or maintainable? You’re right, it’s not!! We need a backup!
What we use Last.fm for: ● Artist’s data ○ Tags ○ Top Tracks ○ Biography ○ Photos ○ Who’s attending ● Metro data - aka city events ● Venue Data
Aren’t there more APIs out there? Yes. But they have strict usage policies.
But we can use: Facebook’s OpenGraph API
How we use Facebook: Usage We fetch event data for each venue. Parse the event name for the bands playing.
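Parsing band names out of an event name might look something like this. It is a Python sketch with an assumed separator list (`+`, `,`, ` / `, `ft`, `feat.`); it deliberately does not split on `&`, because plenty of band names contain one.

```python
import re

# Assumed separators between acts in an event name.
_SEPARATORS = re.compile(r"\s*(?:\+|,|\s/\s|\s(?:ft|feat)\.?\s)\s*", re.IGNORECASE)


def split_event_name(name):
    """Split an event name into candidate band names, dropping empty parts."""
    parts = _SEPARATORS.split(name)
    return [part.strip() for part in parts if part.strip()]
```

For example, `split_event_name("Headliner + Support One, Support Two")` gives three candidates. Each candidate still has to be matched against known artists before it can be trusted.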
How we use Facebook: Problem Event names are unnormalised. Less detail than webscraping. Not every venue is using Facebook’s event calendar.
Let’s recap... We’re using: Webscraping, Last.fm & Facebook
Webscraping + Last.fm + Facebook = Data Normalisation Hell.
Normalisation :( Side Effects: ● duplicate events appearing from different sources. ● variations in band/venue name spellings cause more duplication. ● wrong artists found and matched in Last.fm. ● events discovered outside of Bristol.
Mitigation :) Some precautions when importing data: ● Create word lists to strip out of titles: ○ Plus Special Guests, Support Act ● Trimming based on key words: ○ Acme Presents, @ Acme 2014.03.12 ● Ensure the same band isn’t playing twice on the same day. ● Ensure venues don’t have 2 gigs on the same day - difficult to determine - Motion/In:Motion, Thekla/Thekla TopDeck, and festivals over long weekends.
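The first three precautions can be sketched as a title cleaner plus a same-band-same-day check. This is a Python illustration; the phrase list and trimming rules come from the slide's examples, while the function names and the gig dict shape are assumptions.

```python
import re

# Word list from the slide; extend as new noise phrases turn up.
STRIP_PHRASES = ["plus special guests", "support act"]

# "Acme Presents ..." style prefixes and "... @ Acme 2014.03.12" style suffixes.
PRESENTS_PREFIX = re.compile(r"^.*?\bpresents:?\s+", re.IGNORECASE)
AT_SUFFIX = re.compile(r"\s*@.*$")


def clean_title(title):
    """Trim promoter prefixes, venue/date suffixes and known noise phrases."""
    cleaned = PRESENTS_PREFIX.sub("", title)
    cleaned = AT_SUFFIX.sub("", cleaned)
    for phrase in STRIP_PHRASES:
        cleaned = re.sub(re.escape(phrase), "", cleaned, flags=re.IGNORECASE)
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned.strip(" +-,&")


def dedupe_gigs(gigs):
    """Keep the first gig seen for each (cleaned title, date) pair."""
    seen = set()
    unique = []
    for gig in gigs:
        key = (clean_title(gig["title"]).lower(), gig["date"])
        if key not in seen:
            seen.add(key)
            unique.append(gig)
    return unique
```

This only catches duplicates whose cleaned titles match exactly. Spelling variations such as Motion versus In:Motion would need an alias table or fuzzy matching on top.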
API vs Webscraping Webscraping Pros We can capture prices of the gigs. We can capture sold out gigs. We can capture cancelled gigs. We can get ticket links and gig descriptions.
API vs Webscraping API Pros Accurate time and date for events. We can get event photos/pictures. Long term more reliable / less overhead. Social data… I’ll explain next...
Social Data? Tweets, Likes, Attending, Going, Listens, Tips, Check-ins, Digg’d, Reddit’d... * maybe not the last two
What can we do with it: ● Popular artists - based on listens. ● Hot artists - based on increases in listens over time. ● Popular venues - based on check-ins & likes. ● Popular gigs - based on who’s attending/going. ● Who went to the gig - based on check-ins at venues while gigs were on. ● What people are saying about this venue - based on tips. ● Gigs based on your location - co-ordinates.
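As an illustration of the "hot artists" idea, the sketch below ranks artists by how much their listen count grew between two snapshots. It is a Python example with an assumed input shape (a dict of artist name to listen count per snapshot); where the counts come from and how often they are sampled is left open.

```python
def hot_artists(previous, current, top=3):
    """Rank artists by the increase in listens between two snapshots.

    Artists missing from `previous` are treated as starting from zero.
    Artists whose listens did not grow are left out.
    """
    growth = {}
    for artist, listens_now in current.items():
        listens_before = previous.get(artist, 0)
        growth[artist] = listens_now - listens_before
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [artist for artist, delta in ranked[:top] if delta > 0]
```

"Popular" would simply sort by `current` instead. Ranking by absolute growth favours big artists, so a ratio may be fairer for a local scene.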
All very interesting but what’s next? We need to ensure we’re scalable first!
How we’ve scaled: ● Artist/Venue images are pre-processed and stored in the Rackspace CDN. ● Memcache used for caching database queries. ● Memcache used for caching rendered pages. ● Varnish used for caching static content. ● Pound used to route SSL connections through to Varnish. And there’s more we can do going forward...
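Caching database queries usually follows the cache-aside pattern: look in the cache first, and only run the query on a miss. The Python sketch below uses a small in-memory class as a stand-in for Memcache; `TTLCache` and `cached_query` are hypothetical names, not part of any Memcache client.

```python
import time


class TTLCache:
    """In-memory stand-in for Memcache: values expire after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def cached_query(cache, key, ttl, run_query):
    """Return the cached result for `key`, running `run_query` only on a miss.

    A result of None is never cached, so such queries run every time.
    """
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, ttl)
    return value
```

With a scraper that only runs every 6 hours, listing queries can tolerate a fairly long time-to-live before the data goes stale.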
The future is coming
Kickass Search You’ll be able to search: Venue name Artist name Gig name Time & Date Tag Artist description Venue description Gig description
Our own API To help us scale the backend better we will create an API. In turn this will: Open our data to the public. Allow us (me) to create a PhoneGap app (another pet project of mine for later on). Stop @ripetungi moaning about running migrations every time he wants to edit the SASS.
Why go to all these lengths?
So far: ● Average 1,000 hits a month. ○ 75:25 - New Visits:Returning. ● Currently for Bristol we have: ○ 186 venues - including some duplicates :( ○ over 5,700 artists added since June 2013. ○ 679 gigs coming up in the future.
So far: ● Positive feedback on Facebook & Twitter from local venues and music lovers. ● Boost our own portfolios - my GitHub account is dire and my Stack Overflow rating is zilch. ● Would love to roll this out to other nearby cities, if not nationally! ● Someone might give us lots of monies (please?).
Instead of this talk you could’ve seen tonight: Charlee Draw (Thekla), Tailfeathers (Start the Bus), The Telescreen ft Frankie Cocozza (Fleece), A Wilhelm Scream (Exchange), Bombay Bicycle Club (O2 Academy)
Like us, Follow us, tell your friends and go and see some bands!! Thanks for listening. Any questions?
Composer, FPM & gigHUB Night March 2014 ... If you want to speak at PHPSW, drop Steve a line at steve at stevelacey dot net.