A few times per year ZoomInfo holds TechFest, a two-day period for the engineering team to collaborate on mini projects with few constraints. During this time the engineers are encouraged to have fun and experiment. In fact, the only rule is that there are no good or bad ideas.
ZoomInfo’s TechFest is a two-day time-out from day-to-day project work where we allow ourselves to think outside the box and brainstorm improvements to ZoomInfo offerings.
As a technology-driven company we of course rely on innovation for all aspects of our product offerings. There is a consistent need for us both to improve our established products and processes and to look into the future in order to take our technology to the next level. Undeniably, the driving force of ZoomInfo is our technology, and the innovations and improvements we develop now will be what propels us into the future.
Our engineers have a responsibility not only to complete their projects on time, but also to stay focused on what they can do to improve and achieve the ultimate philosophical goal of our development: making the unknown known.
CQL3 (Cassandra Query Language) is an API for interacting with Cassandra that is syntactically similar to the commonly used SQL. CQL3 was introduced as a beta in Cassandra 1.1 and became final in Cassandra 1.2. Prior to CQL3, the typical API used to interact with Cassandra was Thrift. DataStax addresses some of the motivation for introducing CQL3 as an alternative to Thrift:
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Hadoop framework allows for distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Since its initial release in late 2007, Hadoop has become the leading way to do Data Mining and Distributed Computing. The project enjoys support from major backers such as Yahoo! and Cloudera and a very broad adoption rate by both large and small companies. Right now, there are over 4100 Hadoop-related jobs posted on Indeed. That’s 3x the number of Django jobs listed, and 5x more than the Node.js framework.
Here at Zoom, we employ Hadoop for a wide variety of data processing and data mining tasks. As one example, we’ve got 12 years of crawler data archived, comprising some 50 TB of information. That’s a fantastically rich corpus primed for data mining. This is what I’d refer to as a “traditional” use of Hadoop. That is, we store a massive amount of data in HDFS, and then run MapReduce jobs against it, looking for interesting information. This use case is right in Hadoop’s sweet spot – if you bring the computation to where the data lives, you can achieve massive parallelism without worrying about things like network latency and network throughput.
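To make the "traditional" pattern concrete, here is a minimal sketch of the map, shuffle, and reduce phases in plain Python. It is not Hadoop code and not our actual job: the record format (whitespace-separated URLs) and the function names are assumptions made for illustration, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit (domain, 1) for every URL in a record.

    Assumed record format: whitespace-separated URLs.
    """
    for url in record.split():
        # "http://a.com/x" splits into ["http:", "", "a.com", "x"].
        parts = url.split("/")
        domain = parts[2] if "://" in url else parts[0]
        yield domain, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts emitted for one key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on in-memory records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map calls run on the nodes that already hold each block of data, which is where the parallelism comes from; this sketch only shows the shape of the computation.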
But not all of our uses of Hadoop are so traditional. Given the variety of data collection and data processing tasks Zoom performs, not all of them lend themselves to a MapReduce model. For example, some of them query databases or Solr servers. Some make RESTful API requests to Google. Some run IMAP commands. Some crawl websites. But, in our opinion anyway, many of these use cases still lend themselves well to the Hadoop framework. What we generally end up doing is defining a work queue (e.g., a crawl schedule) in HDFS and storing the results back into HDFS for use by other jobs.
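The work-queue pattern can be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: local files stand in for HDFS paths, the queue is assumed to hold one task per line, and `run_work_queue` and the `handler` callback are hypothetical names.

```python
import json
from pathlib import Path
from typing import Callable


def run_work_queue(
    queue_path: Path,
    results_path: Path,
    handler: Callable[[str], dict],
) -> int:
    """Process each queued task and write one JSON result per line.

    Local files stand in for HDFS here. A failing task is recorded
    rather than aborting the run, so downstream jobs can see what
    succeeded and what did not.
    """
    processed = 0
    with queue_path.open() as queue, results_path.open("w") as results:
        for line in queue:
            task = line.strip()
            if not task:
                continue  # skip blank lines in the queue file
            try:
                record = {"task": task, "ok": True, "result": handler(task)}
            except Exception as exc:
                record = {"task": task, "ok": False, "error": str(exc)}
            results.write(json.dumps(record) + "\n")
            processed += 1
    return processed
```

The handler is where the non-MapReduce work would go (a database query, an API request, a crawl), while the queue and results files give other jobs a uniform input and output to build on.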
Zoom isn’t in the business of building platforms, and you probably shouldn’t be either. It’s usually a much better use of resources to focus on your core competencies and do the things that make your company the best widget maker on the planet. Ready-made platforms generally reduce development costs and shrink time to market. And with today’s robust Open Source ecosystem, there are few reasons not to use off-the-shelf platforms like Hadoop. If you need a platform that’s:
- Horizontally and vertically scalable (ideally, with process isolation)
- Highly available
- Complete with a simple reporting framework
- Complete with a simple management/administration framework
and you need it done quickly and cheaply, Hadoop is definitely worth checking out.
Here at ZoomInfo we are always referencing our automated system of gathering and assembling public business information available on the World Wide Web. Currently ZoomInfo holds six patents that allow our technology to function and gather the most up-to-date business information possible. The patents range from our integration of a crawler with Natural Language Processing to the technology that automatically identifies sites and pages according to their function. Click here to read more about the patented technology that drives ZoomInfo.
As Senior Software Engineer Roger Alix-Gaudreau described in a previous post, ZoomInfo uses Agile development processes to grow and improve our products. We do this because it lets us develop software rapidly in short sprints, usually two to four weeks long, and quickly adjust to the growing and changing needs of our clients. Because we work in cross-functional teams drawn from different departments, and often have employees involved in development projects who are new to the Agile process, here’s a simple overview of Agile and how ZoomInfo implements it!
Continuous development activities
- Managing and grooming the backlog. Dedicated ZoomInfo product managers act as resident experts and are well versed in what functionality needs to be fixed, updated, or revised. To that end, product managers enter all work to be done on our products into the “backlog” as “user stories.” Written from the perspective of the relevant stakeholders, user stories describe the needs of end users and read like, “As a user of ZoomInfo Pro I want to be able to sort companies by annual revenue so that I can target larger companies.” Or “As a marketing manager, I need to change the title tags on PR pages to optimize them for search engines.” Product managers are also responsible for keeping the backlog manageable, prioritized, and up to date.
- Planning poker. Every few weeks the stakeholders, who include software developers, marketing managers and client service reps, meet for “planning poker,” where we discuss the items in the backlog and do high-level estimations of the effort needed to complete them. Each “player” is given a deck of cards with points on each card. Easy tasks are given 3 to 5 points; items that are super complicated get upwards of 100 points! Setting points like this allows the product manager to come up with a prioritized iteration, a group of stories that can be completed in a set time period. From here the schedule is set!
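As a rough illustration of how point estimates turn into a prioritized iteration, here is a minimal Python sketch. The `UserStory` fields, the "lower number means higher priority" convention, and the greedy selection rule are all assumptions for the example; in practice the product manager makes this call by hand, weighing things a script can't see.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserStory:
    title: str
    priority: int  # lower number = higher priority (assumed convention)
    points: int    # planning-poker estimate


def plan_iteration(backlog: List[UserStory], capacity: int) -> List[UserStory]:
    """Greedily pick the highest-priority stories that fit in the capacity."""
    chosen: List[UserStory] = []
    remaining = capacity
    for story in sorted(backlog, key=lambda s: s.priority):
        if story.points <= remaining:
            chosen.append(story)
            remaining -= story.points
    return chosen
```

A story that is too large for the remaining capacity is skipped rather than ending the loop, so smaller, lower-priority stories can still fill the gap.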
Time for a sprint
All of this up-front work allows us to be ready for the sprint, a two to four week time period of rapid development. The chain of events is as follows:
- The product manager goes through the backlog and chooses high priority and high ROI user stories that match the capacity of the development team. Usually the product manager has any copy or designs needed for each task already in place as well.
- Next we workshop the included user stories. They are tasked out to our developers and given time estimates so that we can track the progress of the sprint.
- Daily we hold “stand-ups” where we each answer three questions:
What did I do yesterday?
What am I doing today?
What is blocking my progress?
Identifying the roadblocks is incredibly important to keep the sprint moving along.
- As user stories are completed and accepted, our project management software shows us a nice visualization of our velocity: how quickly we are completing user stories and whether we are staying on track.
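For readers new to the term, velocity is commonly computed as the points of accepted stories per sprint. The sketch below shows that arithmetic in Python; the `SprintStory` type and function names are hypothetical, and our project management software may calculate it differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SprintStory:
    title: str
    points: int
    accepted: bool  # only accepted stories count toward velocity


def sprint_velocity(stories: List[SprintStory]) -> int:
    """Total points of the stories accepted in one sprint."""
    return sum(s.points for s in stories if s.accepted)


def average_velocity(sprints: List[List[SprintStory]]) -> float:
    """Mean velocity across sprints; 0.0 when there is no history yet."""
    if not sprints:
        return 0.0
    return sum(sprint_velocity(s) for s in sprints) / len(sprints)
```

Comparing the current sprint's running total against the average of past sprints is one simple way to judge whether the team is staying on track.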
And that’s it!
Oh, and who is telling IT it’s time to deploy?
The reward? Sending out an email to the company about what has been accomplished in the last month of development. Nothing better.