DWH Data Model

When you are starting to build a data warehouse (DWH), you probably ask yourself: what data modeling approach should I choose? Should I go with a star schema or a snowflake schema? The snowflake schema seems like the more natural approach – every data entity gets its own table. However, if you have a lot of fact and dimension tables in your data warehouse, queries against a snowflake schema can be cumbersome. A star schema, or some kind of snowflake/star hybrid, is more common and user-friendly.

The star and snowflake data modeling approaches were developed when companies mostly dealt with data like orders, purchases, shipments, etc. This type of data obviously still flows into most data warehouses. These days, however, many companies also collect data that reflects users’ activity on their websites: page views, clicks, searches, etc. It’s not uncommon to have tens or even hundreds of millions of such events per day, and each event might have tens or even hundreds of attributes. Should we create a dimension for every attribute? Why don’t we just load all this data into a flat fact table in the DWH?

With the introduction of columnar databases like Vertica, Redshift, and Greenplum, where every column is stored in a separate file, querying tables with very many columns became more efficient. However, storage utilization and query performance can still be an issue if tables are not designed properly. If you have hundreds of millions or billions of rows reflecting page views or clicks, it can be very costly to store the actual user agent or URL value in every row. For example: let’s say you have 3B page visits stored in your database and ~1M unique user agents corresponding to them. If you store the user agents in a dimension table, it will consume ~500MB of space. However, if you store a user agent in your fact table, attached to every page visit, it will require ~1.7TB of space. Also, if you need to search within the list of user agents or URLs, it’s more efficient to do so over the dimension table rather than the fact table.
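A quick back-of-the-envelope calculation shows where those numbers come from. The ~500-byte average user-agent length is an assumption for illustration, and real columnar engines compress repeated values, so actual sizes will vary:

```python
# Storage comparison for user-agent strings: a dimension table holding each
# distinct value once, vs. repeating the raw value in every fact-table row.
# The ~500-byte average string length is an assumed figure for illustration.

PAGE_VISITS = 3_000_000_000      # 3B rows in the fact table
UNIQUE_USERAGENTS = 1_000_000    # ~1M distinct user agents
AVG_USERAGENT_BYTES = 500        # assumed average string length

dimension_bytes = UNIQUE_USERAGENTS * AVG_USERAGENT_BYTES
fact_bytes = PAGE_VISITS * AVG_USERAGENT_BYTES

print(f"dimension table: ~{dimension_bytes / 1e6:.0f} MB")  # ~500 MB
print(f"fact table:      ~{fact_bytes / 1e12:.1f} TB")      # ~1.5 TB of raw strings alone
```

The raw-string figure already lands in the same ballpark as the ~1.7TB above; per-row overhead and slightly longer strings account for the rest.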

On the other hand, if your fact table contains a column like “payment_method” or “page_type” that hardly has more than 5-10 unique and fairly short values, creating a dimension table might not be needed – the space saving wouldn’t be significant. We usually just standardize the value by trimming trailing spaces and bringing all characters to lowercase. Such a design will also accommodate the majority of reporting tools. For example, tools like Microstrategy “like” to have dimension tables so that a list of unique values for a given data entity can be easily extracted. A columnar database will usually provide a very fast response when you do a “select distinct …” over a column with very few unique values, like “payment_method” or “page_type”, so these dimension tables can be created on the fly.
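The standardization step is simple enough to sketch in a few lines (the function name here is ours, not part of any particular ETL framework):

```python
def standardize(value: str) -> str:
    """Normalize a low-cardinality attribute such as payment_method or
    page_type: strip surrounding whitespace and lowercase it, so that
    'Visa ', 'VISA' and 'visa' all collapse into one value."""
    return value.strip().lower()

# With values standardized this way, "select distinct payment_method" on the
# fact table is cheap in a columnar store, and a reporting tool can build its
# dimension list on the fly.
print(standardize("  Credit Card "))  # -> credit card
```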

The Best Ideas Are All Around You

No one knows everything. It’s a fact of life. Being an overachiever, I used to struggle with this until one day I realized: this could actually be fun! That is why I decided years ago to always work with people who are smarter than me. That is also why I decided to work at ModCloth (other than the A-line dresses… and Einstein socks… and cat shirts… and…). I came to ModCloth to learn from other people while doing what I normally do as a Project Manager.

Then one day, something amazing happened: Hackathon.

A little over a year ago, I was presented with an opportunity to help a small team with a side project: ModCloth’s first Hackathon. I had read about such a thing before, but I never thought it was something that companies did in “real life,” so I never gave it much thought. With so many companies subscribing to the philosophy that profitability means finagling the lowest possible dollars per hour for everything, it is rare (even in today’s increasingly entrepreneurial society) for a company to invest in the ideas of people.

I should rephrase that. I don’t mean people; I mean every person.

Every person you work with probably has an idea. An idea about how to make the office brighter in the middle of winter when you’re tired of the sun setting at 4:47pm. An idea about how to more easily collaborate with coworkers (including, but not limited to, chat clients that support cat gifs). An idea about how to save energy by turning off the lights when you leave conference rooms. It doesn’t matter what the idea is; the important thing is that it’s there and you should know what it is.

This brings me back to Hackathon. The small team that I worked with to make it happen did some research about what made Hackathons so great. We were trying to find the secret behind getting great ideas from people. We found out that it only takes three things: smart people, dedicated time, and brain food (which usually means any food). We knew we had the first one, and the second two were surprisingly easy to get.

What we ended up with was a 3-day event that allowed what we call our Experience team (Engineers, QA, Product Managers, Project Managers, User Experience Designers, User Researchers, Data Analysts) to work on a project of their choice using any skill they felt passionate about. Engineers who liked to design could create mocks. QA folks who knew how to code could create working prototypes. No one was constrained by their job title, roadmap, stakeholders, or anything else. Anyone could work with anyone and anyone could work on their great idea.

Since then, we have had 3 Hackathons and have so much to show for them. We have a feature in our native iOS app called Fit for Me which allows users to find recommended products based on reviews from people with similar measurements. We also created a game featuring our lovable mascot, Winston, in 8-bit form saving prezzies from a castle in Winston the Prezzie-Loving Pug. We have some handy internal tools, and more features on their way to you as well.

So, what ideas do you have that you should share with others? What ideas do others have they should share with you? You never know what the smartest idea in the room is until you get everyone in the room. The best ideas are all around you.


Kara F. is a Senior Project Manager at ModCloth

Travis Pro at ModCloth

We use Travis CI at ModCloth. More importantly, we use it for a significant number of our private GitHub repositories by way of Travis Pro.

One of the areas in which Travis has helped us a ton is the testing of a really big Rails application. The rest of this post will cover some basics about Travis in the context of our very own legacy “monorail app.”

Testing a really big Rails application on Travis

As for a really big Rails application, I should first mention a bit about ModCloth’s history in terms of Engineering. Once upon a time, not so many years ago, modcloth.com was built on the osCommerce framework. This worked for a while until that platform was replaced by a greenfield Rails application. Fast forward a few years and that Rails application is still around, albeit with many new neighbors.

The test suite for this really big Rails application comprises RSpec specs, Cucumber features, and Jasmine tests. This is hardly a snappy feedback loop, and when we first started migrating to Travis Pro, it was dangerously close to the 50-minute cap in place at the time.

Thankfully, the team at Travis is eager to help and they make themselves very available, so we followed their advice and broke our RSpec suite into multiple jobs in the build matrix. There are some great docs available here, but this is roughly what our .travis.yml looked like:

language: ruby
# ... lots of stuff ...
env:
  - RAILS_ENV=cucumber CI_CATEGORY=cucumber
  - RAILS_ENV=test CI_CATEGORY=model_specs
  - RAILS_ENV=test CI_CATEGORY=other_specs
# ... lots more stuff ...

For those not familiar with the build matrix, the result of this configuration is having three separate jobs run for any given GitHub payload sent to Travis Pro. Further opportunities to speed things up include the use of the parallel_tests gem and caching the results of bundle install --deployment in S3, although each of these methods also has tradeoffs.

In our current setup, we vendor all of our gem dependencies into vendor/cache and install like this on Travis:

language: ruby
# ... lots of stuff ...
install: bundle install --local --without development

This is an acceptable setup for us as the speedup is roughly equivalent to that given by Travis’ cache: bundler capability, which is explained more here.

The takeaway

Using Travis Pro has been a great experience for us at ModCloth. We can test all of our new shiny things and don’t have to maintain our own infrastructure for the legacy projects. The Travis team is highly available, super responsive, and they’re nice people!

Dan B. is a Senior Software Engineer at ModCloth

Disclaimer: We give money to Travis CI Gmbh, not the other way around. Any views expressed should be considered genuine. No thinly-veiled advertising here, folks.
At ModCloth, we believe in using the best tools (commercial and open source) to help us scale the business. If you’re interested in learning more about these opportunities, please send us a message at modcode@modcloth.com.

Xcode bots for pull requests

The iOS team at ModCloth is pleased to open source a tool which manages Xcode’s CI “bots” for pull requests on github.
Xcode CI
Our team was very excited when Apple announced Xcode Server and bots at WWDC earlier this year. We wanted to take advantage of the integration with Xcode, and hoped that the configuration would be easier to manage than what we were using at the time: Jenkins. While Xcode Server is not perfect, we have found that for us, it has proven to be much easier to configure than Jenkins. Also, both the web UI and the Xcode integration make it easier for us to identify and fix issues when they are found.
Our workflow
We develop all new features on feature branches, and issue an internal pull request when we think it’s ready to merge in.  The pull request gives us a place to add simple documentation or notes about our feature, and creates an opportunity for code review.  Additionally, we test and accept our work on the branch – whoever accepts our stories (typically QA or our Product Manager) merges and closes the pull request.
The “catch”
There was one aspect of this workflow that Xcode server couldn’t handle.  With Jenkins we were building and testing on each pull request.  We couldn’t find a tool that allowed us to build and test each pull request automatically with Xcode Server — so we built one.
It’s on RubyGems, so you can run “gem install github-xcode-bot-builder” to get it. You can find the source code and more information on GitHub here: https://github.com/modcloth-labs/github-xcode-bot-builder
This is a command line tool, which can be scheduled and run periodically (e.g., from cron). It bridges the gap between Xcode Server and GitHub, and automates our process.  It creates bots when pull requests are open. After bots run, it communicates build status back to GitHub.  And it deletes bots when the pull request is closed.
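The tool itself is written in Ruby, but the reconciliation step at the heart of each scheduled run can be sketched like this (the function and branch names here are ours, purely for illustration):

```python
# Illustrative sketch of the github-xcode-bot-builder cycle (the real tool is
# written in Ruby). Given the branches with open pull requests and the
# branches that already have bots, decide which bots to create and delete.

def plan_bots(open_pr_branches, existing_bot_branches):
    """Return (branches needing a new bot, branches whose bot is stale)."""
    to_create = sorted(set(open_pr_branches) - set(existing_bot_branches))
    to_delete = sorted(set(existing_bot_branches) - set(open_pr_branches))
    return to_create, to_delete

# One scheduled run (e.g. from cron) would then:
#   1. list open pull requests via the GitHub API
#   2. diff against the bots on the Xcode Server
#   3. create/delete bots, and report build status back to GitHub
create, delete = plan_bots(
    ["feature/wishlist", "feature/sort"],   # open PRs
    ["feature/sort", "feature/old"],        # existing bots
)
print(create)  # ['feature/wishlist']
print(delete)  # ['feature/old']
```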
Working with GitHub via their APIs was straightforward. However, there isn’t a publicly documented API for Xcode Server. (Hopefully Apple adds one in the future.) So we had to get a little bit creative. We reverse engineered the protocol that Xcode was using to communicate with the Xcode Server and found a JSON-based interface that we were able to use. We think this interface should be relatively stable – as long as the current versions of Xcode can work with Xcode Server, our tool should too.
Let us know what you think!
Thanks to Two Bit Labs for their help.
Geoff N. is a Tech Lead at ModCloth

Database Optimization for Fun and Profit

The Problem

During a ModCloth sale, our peak traffic can spike to be many times what it is during an average day!
These high load spikes can cause database contention, which limits the number of concurrent users we can serve with excellent response times. In order to keep our site scaling with the business, the platform group was called on to identify these problematic queries.


To understand what queries are running on a MySQL database, you can enable the slow query log. You can tune what is considered ‘slow’ via the global variable ‘long_query_time’. In our case, we wanted deep insight into what the database was doing during the sampling window, so we enabled logging for *all* queries.

mysql> set global log_slow_queries='ON';
mysql> set global slow_query_log_file='/tmp/queries.log';
mysql> set global long_query_time=0;
... wait
mysql> set global log_slow_queries='OFF';


Percona offers a free toolkit that is an excellent resource for those working with MySQL databases. You can grab a copy here: http://www.percona.com/doc/percona-toolkit/2.2/ . One of the tools available in this kit is ‘pt-query-digest’.

pt-query-digest analyzes MySQL queries from slow, general, and binary log files. It can also analyze queries from SHOW PROCESSLIST and MySQL protocol data from tcpdump. By default, queries are grouped by fingerprint and reported in descending order of query time (i.e. the slowest queries first).

We can run this against the query log we captured with the steps above:

$ pt-query-digest /tmp/queries.log > queryreport.txt

The results can be quite verbose, but the crux of it looks something like this:

# Profile
# Rank Query ID           Response time  Calls  R/Call Apdx  V/M  Item
# ==== ================== ============== ====== ====== ==== ===== =======
#    1 0x4819612E5A9FA29B 36.9210  6.3%  11124  0.0033 1.00  0.02 SELECT inventory_units
#    2 0x040BCDCC69AF844D 23.7635  4.1%  33584  0.0007 1.00  0.00 SELECT custom_values custom_fields
#    3 0x9C7801F0076868BE 23.1986  4.0%  87192  0.0003 1.00  0.00 SELECT productshots
#    4 0x868B8EB49826DAB2 17.5816  3.0%     25  0.7033 0.90  0.46 SELECT products productshots product_sales_pricings categories_products categories


Once the most expensive queries are identified, we can optimize them, one at a time. This process includes making sure we are using the appropriate indexes, only selecting the columns we need, and altering database columns to use correctly sized data types (int vs. bigint, and so on).

Queries were improved during the normal release cycle, and once a release was pushed, we would compare the results in the query report. We watched as the work paid off, with some of the top queries vanishing from the list entirely, and the impact of others greatly reduced.
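That before/after comparison can be automated with a small script like the one below. The parsing is deliberately simplified and assumes the “# Profile” column layout shown above; pt-query-digest’s full report has more sections and its format can vary between versions:

```python
# Compare two pt-query-digest "# Profile" sections and report which query
# fingerprints disappeared between releases. Simplified: assumes the query ID
# (the 0x... fingerprint) is the third whitespace-separated field.

def profile_ids(report_text):
    """Extract query ID fingerprints from profile lines."""
    ids = []
    for line in report_text.splitlines():
        parts = line.split()
        if len(parts) > 2 and parts[0] == "#" and parts[2].startswith("0x"):
            ids.append(parts[2])
    return ids

before = """# 1 0x4819612E5A9FA29B 36.9210 6.3% 11124 0.0033 1.00 0.02 SELECT inventory_units
# 2 0x040BCDCC69AF844D 23.7635 4.1% 33584 0.0007 1.00 0.00 SELECT custom_values custom_fields"""
after = """# 1 0x040BCDCC69AF844D 20.1000 3.9% 33584 0.0006 1.00 0.00 SELECT custom_values custom_fields"""

gone = set(profile_ids(before)) - set(profile_ids(after))
print(gone)  # {'0x4819612E5A9FA29B'}
```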

At ModCloth, we’re tackling bigger and bigger challenges as the business continues to scale. If you’re interested in learning more about these opportunities, please send us a message at modcode@modcloth.com.

Alex S. is a Software Engineer, Platform at ModCloth

The ModCloth Android app

The Birth of the ModCloth Android App

ModCloth has always tried to make it easy for our customers to find the “perfect” item. Part of doing this is allowing users to shop in the most convenient way possible, including the increasing use case of shopping from a mobile device. Over the past year, we’ve seen more and more of our customers shopping from their tablets and phones, and we now see about 50% of visits to ModCloth coming from mobile devices. To meet the demand for on-the-go shopping, we decided to make ModCloth on mobile the best experience possible.

In the past year, we made a commitment to be a truly “mobile first” company. This means a lot of things across our entire Experience organization; from User Experience Research to Design to Development. It includes “Responsive Design,” mobile web best practices and most definitely, native iOS and Android apps.



We launched our iOS universal app in early 2013, and quickly determined that we wanted to have an Android app available in time for the Holiday season. In late June, we kicked off our initial design sessions. Development started at the end of July and 13 weeks later we launched into the Play Store. And wow, did we learn a lot along the way.

Everything from “Should we support Gingerbread, and if so, how?” to “The proper way to NinePatch” to “Android Studio vs. Eclipse” to “Gradle” – and much, much more (enough to fill several blog posts). But most importantly, we learned that our customers using Android devices had been eager to get a first-class ModCloth shopping experience on Android.


Beta distribution, staged rollouts and rapid iteration
The Play Store is unique in many respects. Perhaps most notably for developers is the deployment process. Prior to launch, we leveraged the distribution features of the Play Store to provide the app to a few ModCloth customers who had expressed interest in early access to an Android app. These users provided great feedback and informed several decisions in the lead-up to our launch.

Once available to the public, we received a lot of feedback, and a few themes consistently came through. Users really wanted the ability to sort / filter products and to add items to their wishlists (especially timely as we approach the holiday shopping season). Thanks to feedback from our beta users, and responsive development and QA teams, we were happy to announce our v1.1 release supporting these features, just 2 weeks after our initial launch.

What’s next
We have plenty more updates in the works and we look forward to announcing some very cool features for Android, as well as sharing some specific things we learned along the way. Android visits on phone and tablet are accounting for an increasing portion of our customers’ shopping, and we plan to build the best experience possible for those customers.

Welcome to the ModCloth Experience

Here at ModCloth, we are all about the Experience. The Experience of our Girl when she visits our website or opens the box that carries her amazing new clothes. The Experience of our fulfillment employees who use the systems we build to guide their daily work. The Experience of our buyers who leverage our data to make smart purchasing decisions. And of course, the daily Experience of our product development teams who create all these other great Experiences.

At ModCloth, those product development teams make up The Experience team. We are Product Managers and Software Engineers. We are User Experience Researchers and Test Engineers. We are Visual Designers and Project Managers. And we all work together, collaborating to create the best products for our users.

We are organized into cross-functional Spheres, with each Sphere responsible for its own domain. From the Social Experience on our website to the Supply Chain that delivers the best styles to our girl, we form multidisciplinary teams that are independent and empowered. Each one of these Spheres operates like a mini-startup, empowered to do whatever it takes to delight our customers.

This blog will tell the stories of our teams and team members. These stories will cover our Spheres, as well as the individual innovators and craftspeople who excel at their domain. It will include the topics that keep them up at night. The philosophies that guide their thinking. The learnings that they have had. The ideas that excite them. The technologies, tools, and processes they use. And much more!

Enjoy reading it. We hope it will be as fun and useful for you as it will be for us.

Sarah Rose, SVP Product

Udi Nir, CTO