DevOps Kat_pw Posted December 19, 2017

The past 48 hours have been fun and at times completely stressful, but overall I think it has been a great success. That being said, I would like to apologize to our players and to 10gbps.io for some miscalculations on my part that resulted in quite slow downloads between 09:00 UTC and 16:30 UTC.

To players: you don’t deserve slow download speeds for a special event such as wintermod.

To 10gbps.io: it wasn’t fair for us to have boasted that your services could handle the mod downloads when we had forecast much lower loads. Your server fought well in our losing battle.

I want to clear up a few things and make sure that everyone knows:
1. The story of how this happened in the first place.
2. What measures we attempted to take during the slow speeds and what we learned from them.
3. How truly badass the server was that 10gbps.io provided to us for distributing wintermod.

So, let’s look into things and get nerdy for a bit. Buckle up. It’s gonna be a ride.

To start on our journey, planning for wintermod begins in October / November for me. I must look around for capable services (servers) for the CDN (Content Delivery Network).

In 2015 we used a fleet of ~20-30 servers @ 100mbps each. A cool 2-3 gbps (or so we thought) of bandwidth was at our fingertips. While this worked, we soon found out that some of the servers would be capped at 24mbps… and just randomly go offline. No good. We struggled through for the next few days but eventually(:tm:) everyone was able to download the mod. One of the important lessons we learned from that year was.. SPLIT UP THE FILES! We wasted an ENORMOUS amount of bandwidth because players got frustrated with the ~600MB filesize. Players would start the download, get to some number like 75%.. cancel and try again later. This meant we sent much more data out than we needed to.
In 2016 we had some CDN creation under our belt, knew we had to split up the wintermod and had a brand new launcher that would help facilitate downloading and installing the mod for players. Win-win, right? No, not so fast. We did have more bandwidth that year: if memory serves, around 15 servers @ 800mbps, or a cool 12gbps. Nice! What went wrong? Checksums. We split the files up into smaller chunks, and it turns out that when you’re seeding data to all 15 CDN nodes, some of those files can end up a tad bit corrupt. Great. We ended up needing to write some seeding scripts to ensure the CDN servers got all of the files players needed for the wintermod, and that the md5 checksums of what they downloaded matched the master CDN. After a few tense hours of wrestling with the servers, we had a nice little fleet. The release went “ok” but we made a mistake.. We launched during high load and kicked off all 5,000 players online. We saw an enormous peak of usage, maxing out the 12gbps we had due to .. well. 5,000 people trying to download all at once. Oops.

So, 2017 comes along. We got this, right? Well, sorta. The provider from 2016 has been experiencing issues with servers being out of stock. We had the option of ordering 20+ servers as they showed up in stock from October until now and paying an incredible rate for what would be idle servers, finding another solution elsewhere (with very limited options), or not releasing wintermod at all. That last option was to be avoided at all costs. While comparing services, I reached out to 10gbps.io to let them know what our mod is about and to see if they’d be able to help us with a possible sponsorship deal. To my surprise they agreed and provided us a 10gbps capable server. This was incredible. They quite literally saved -christmas- wintermod!
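The checksum check behind the 2016 seeding scripts can be sketched roughly like this. It is an illustrative sketch, not the original script: the manifest format (a dict mapping relative file paths to md5 hex digests taken from the master CDN) and the names `md5_of` and `find_bad_files` are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_bad_files(manifest, root):
    """List files under `root` that are missing or do not match the manifest.

    `manifest` is a hypothetical dict: relative path -> expected md5 hex
    digest as computed on the master server.
    """
    bad = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file() or md5_of(target) != expected:
            bad.append(rel_path)
    return sorted(bad)
```

A seeding script built on this idea would re-copy every path returned by `find_bad_files` and only put a node into rotation once the list comes back empty.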
Once the wintermod was updated to support ETS2 1.30, our plan was to release during a low player population time, so the late night truckers who were on wouldn’t be stuck with long download times and we could fix any issues that came up without disrupting too many people. After all, we DID learn from last year, right?! Right.

Let’s figure we’re seeing a peak of 10,000 players; looking at stats.truckersmp.com we can see that on a given day, the number of players online goes up by ~1,000 per hour. One would think this means we should expect ~1,000 downloads per hour. Wintermod is 700MB, so that’s ~700GB per hour. Some napkin math: 700GB is about 5,600 gigabits, so even in the worst case where all 1,000 players start at once, a 10gbps connection to the internet should get them all done in roughly 10 minutes, if everyone is downloading at their max speed and the server is sending at a full 10gbps. Awesome!

This is where the fun begins.

01:42 UTC. Let’s release! Only 2900 truckers were online. The download server was tuned up with all engines running. MWL4 was sleepy and I was overly excited. We started to watch the download server and ensured that players were getting the mod correctly.. And damn fast might I add! Between 01:42 and 02:08, if you downloaded the mod, you would not be allowed into a server due to version mismatch. This was planned.

02:08 UTC. Servers are restarted. All ~1700 truckers online are kicked off. As you can see on the graph, we BARELY touched 9gbps, which slowly decreased and started our true downward slope at 02:40. Awesome!

05:33 UTC. Time for bed and a tweet. Things are looking good, traffic is steady and low, well within ranges.

05:34 UTC. This is where things get hilarious. Remember that low traffic I just tweeted about? Yeah. It turns out 05:34 UTC is the exact time when virtual truckers start to wake up. Let’s look at a graph of 05:30 -> 09:00. Yikes… what’s next? Oh yeah… truckers cap the poor server at its max 10gbps… FOR HOURS.

13:53 UTC. Time to wake up.
Always in a good mood early in the morning. :) Yep. Solid hours of 10gbps. Something isn’t right here.. Yeah, that number was supposed to be like.. under 10,000. As you can see from my frantic typing, it wasn’t. SO~! How do we fix this? ADD MORE SERVERS! Right?! Yeah. Let’s throw 10 servers at the problem, at 800mbps each. Let’s fire up csshx and get to work.

14:52 UTC. Servers ready to go. Throw em’ in rotation. A few minutes pass.. DNS propagates. We’re truckin’ now! ~17gbps! Yeee buddy. Take that, you silly bandwidth problem.

15:24 UTC. I’ve saved the day, time to go home. Wait… So.. did the CDN get everyone caught up? And now the load issue is resolved? That’s odd.. Why isn’t our 10gbps server being utilized anymore? This is either really good, or really bad. Hey, that scrollbar on the side of your browser still goes down, this blogpost can’t end here, right? Right.

15:30 UTC. The Dawning. I just added 10 800mbps servers into a Round Robin cluster with a secondary static A record pointing to our 10gbps server. Why doesn’t.. Oh.. right.. We’re now sending our beefy 10gbps server only ~1/10th of the traffic.. Heh.. oops. (Seriously, looking back now I feel absolutely foolish that I didn’t foresee this, but there’s no better way to learn/realize new things than trial by fire!) Now, at this point, by deduction, we can “safely” assume that our traffic pull EASILY could have been in the range of 20-30gbps, since we just cut its load to 1/10th and our server is STILL pushing out ~3gbps. Yikes. Our maximum is now below what the 10gbps server was doing on its own.. Let’s undo this silly idea.

15:32 UTC. DNS changes take effect again. Our 10gbps server is now taking most of the new requests, and the CDN servers that were in rotation are finishing up the downloads for the users who connected to them.

15:45 UTC. Stress at an all-time high. Options for what could work to prevent this in the future are discussed.
TL;DR: have the server taking the brunt of the requests hand out 302 redirects, with some fancy HAProxy load balancing, to a fleet of weighted CDN servers (always send requests to the least utilized server). But that’ll have to wait until after testing and research. Maybe next year.

15:46 UTC. After reading this and laughing, I realized it’s time to step away from the computer. This is a computer game where you drive trucks; I should not be letting this get me so worked up. I will let things even out on the CDN fiasco, get some food, try to relax and most importantly clear my mind.

15:49 UTC. While pouring water into my oatmeal, it strikes me. I can do the 302 trick, but it doesn’t have to be ‘high tech’ with weighted rotation. Let’s just let cloudflare do that under a different subdomain. It can’t be that easy, can it? Let’s try to offload the core_ets2mp.dll file to a single CDN server...

16:01 UTC. Eat and research.

16:05 UTC. WHAT THE HELL?

16:05 UTC. KAT YOU IDIOT! Some CDN server snuck back into the DNS rotation. GET OUTTA HERE!

16:06 UTC. Traffic begins recovering.

16:09 UTC. BEHOLD! THE POWER TO SAVE THE SNOW! (To anyone requesting the file /files/data/core_ets2mp.dll, tell em’ to go down the road to http://downloads.ets2mp.com and find the file there.)

16:20 UTC. Reload nginx with the new config… Mother of god.. 404mbps for one 13MB dll file..

16:?? Add a few other files to the redirects to offload them to our fleet.

We’ve recovered. Sometime around 16:40 our load took a downward trend. We’re heading into the peak of player numbers (~19:00 UTC), and I think we can call this done. Disable the 302’s, let the 10gbps server do its work; shut down the 10 CDN servers.

And there we have it. Lessons have been learned. My takeaway is this:
1. There will be things that you try to plan for that will not go as expected. This is ok. The important thing is to remain as calm as possible and work towards a solution.
2. Take a break once in a while.
Clearing your head by doing something as silly-sounding as pouring water to make oatmeal can be surprisingly beneficial.
3. Don’t be an idiot and decide to load balance uneven servers when sleepy and stressed out.
4. Simple solutions can do wonders.

I again want to thank 10gbps.io for providing us with a server that did its part so very well, even though our peak load seems to have been 3x what it could handle. I also want to thank you, our players, for holding out with us as we wrestled with balancing what ended up being ~7x the load we were expecting for a 24 hour period.

As of writing this, we have served out around 108,000 wintermod downloads. That’s ~82TB of data in 48 hours. Not too bad for a free mod!

If you’d REALLY like to see what the download server can do now that it’s not being abused with more than we planned, press F1 with the launcher open to clear out your local files and try to download it again! :)
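For anyone who wants to redo the napkin math from the post, here is a minimal Python sketch. The function names are made up for illustration, decimal units are assumed (1MB = 8,000,000 bits), and the round robin model assumes every DNS record receives an equal share of requests, which real resolvers and caches only approximate.

```python
def minutes_to_serve(downloads, size_mb, link_gbps):
    """Minutes a fully saturated link needs to push `downloads` copies of a file."""
    total_bits = downloads * size_mb * 8e6       # decimal MB -> bits
    return total_bits / (link_gbps * 1e9) / 60


def saturation_point_gbps(capacities_gbps):
    """Total demand an equal-share pool absorbs before its slowest server is full.

    With plain round robin each of the N servers sees demand / N, so the
    slowest server saturates once demand reaches N x (slowest capacity).
    """
    return len(capacities_gbps) * min(capacities_gbps)


def implied_demand_gbps(observed_gbps, share):
    """Total demand if a server holding `share` of the requests pushes `observed_gbps`."""
    return observed_gbps / share


if __name__ == "__main__":
    # Assumed fleet: the 10gbps server plus ten 800mbps servers.
    fleet = [10.0] + [0.8] * 10
    print(f"1,000 x 700MB over 10gbps: {minutes_to_serve(1000, 700, 10):.1f} min")
    print(f"fleet total on paper: {sum(fleet):.1f} gbps")
    print(f"equal-share saturation point: {saturation_point_gbps(fleet):.1f} gbps")
    print(f"implied demand at a 1/10th share: {implied_demand_gbps(3, 1 / 10):.0f} gbps")
```

Under these assumptions the 1,000 downloads take about 9.3 minutes, and the fleet adds up to 18gbps on paper, but the 800mbps servers start filling up at roughly 8.8gbps of total demand, which is less than the single 10gbps server was handling alone. That is the uneven load balancing trap from lesson 3, and it is why weighting requests by server capacity matters.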
DevOps Kat_pw Posted December 19, 2017
Formatting looks better on the blog post here: https://truckersmp.com/blog/70 <3
JeffSFC Posted December 19, 2017
I love the detailed and technical write-up. Great work as always Kat, and everyone else involved. This place wouldn't be in the state it's in without the hard work and dedication from you all.
MrHarv98 Posted December 19, 2017
Thanks for the detailed writeup Kat. I mean, who doesn't love to see what the servers go through to keep truckers running all the time?
doorgapmonsterTTV Posted December 19, 2017
Seems like everybody wanted it at once, geez.
Gun Powder Posted December 19, 2017
Your work is awesome, thanks for everything.
Carrera18 Posted December 19, 2017
I have not had a chance to use it yet. My computer will be ready in a few days. I'm looking forward to it. Your work is awesome.
RainbowDragon Posted December 19, 2017
What an awesome post, I didn't realize there would be THAT many downloads. While on topic, though: I'm still downloading the Wintermod at a rate lower than 1mb/s. It is definitely not my Internet speed; I think there is a routing problem if you are not from Europe, which causes slow download speeds.
Guest Posted December 19, 2017
Driving a truck in the winter mod is fun. Actually, I think there should be a mod for the summer too. What is your opinion?
Guest Posted December 19, 2017
How about updating Multi-player to support SCS ATS version 1.29.1.17, which came out like over THREE WEEKS AGO, now that y'all are done playing with yourselves, getting Multi-player updated to support the two version updates SCS put out for it just in the past four days and playing with this useless winter mod. It sucks that it takes like three to four weeks for y'all to patch MP for one ATS version update and only like four days for y'all to get ETS2 patched up with the two version updates it had and also get this winter mod working on it, and that was all like a week after SCS put the one version update out for ATS.
MaxiTuX Posted December 19, 2017
TL;DR: Make the plan. Execute the plan. Expect the plan to go off the rails. Throw away the plan.

1 hour ago, Kat_pw said:
"If you’d REALLY like to see what the download server can do now that it’s not being abused with more than we planned, press F1 with the launcher open to clear out your local files and try to download it again!"

And now, you'll break their servers again by having 108k people downloading it from scratch once more.
Guest Posted December 19, 2017
13 minutes ago, Nutty Wolf said:
"How about updating Multi-player to support SCS ATS version 1.29.1.17, which came out like over THREE WEEKS AGO, now that y'all are done playing with yourselves, getting Multi-player updated to support the two version updates SCS put out for it just in the past four days and playing with this useless winter mod. It sucks that it takes like three to four weeks for y'all to patch MP for one ATS version update and only like four days for y'all to get ETS2 patched up with the two version updates it had and also get this winter mod working on it, and that was all like a week after SCS put the one version update out for ATS."

I recommend you read the news and be polite, because if you think it's just a simple patch to support the new ATS version, you might as well do it yourself. The devs are having a busy time and can't always promise a new update; they have been focusing on getting the winter mod and special cargo supported. Being a jerk about it will just put you in a worse position.
A Simple Cheeseburger Posted December 19, 2017
1 hour ago, Kat_pw said:
"15:49 UTC. While pouring water into my oatmeal"

Who has water and not milk in oatmeal?
LUIG Posted December 19, 2017
Your work is always good, Kat. Thanks for bringing us this great Winter mod.
novice Posted December 19, 2017
2 hours ago, Kat_pw said:
"If you’d REALLY like to see what the download server can do now that it’s not being abused with more than we planned, press F1 with the launcher open to clear out your local files and try to download it again!"
pleox Posted December 19, 2017
Just an idea: why not use some p2p for updates? It can really decrease the load on the servers, just by making everyone upload too. I know some games like World of Tanks (and all games from that publisher) use this.
Guest Posted December 19, 2017
1 hour ago, explocraft said:
"Just an idea: why not use some p2p for updates? It can really decrease the load on the servers, just by making everyone upload too. I know some games like World of Tanks (and all games from that publisher) use this."

You still need the brute force to do the initial seed with p2p. It also makes the updater significantly more complex and error prone.
ALTUNSOY. Posted December 19, 2017
thanks for everything
Martin. Posted December 19, 2017
That was quite interesting to read. Thanks for everything you've done so far!
[MIB] Agent "F" Posted December 19, 2017
My friend Kat, thanks for the apology but it is not needed. ALL the time, aggravation, no sleep, and devotion that you go through constantly, along with others on the Team, is VERY much appreciated!!!! At least by me, and I know several others that feel the same way. Like you said, trial & error: not EVERY roll out is going to be perfect (at first), and that should be understood by anyone that has ever owned a computer & had load issues.

My internet actually sux, but I was still able to process the update. But I also had the forethought to get in early!! I was # 5 on the server, 3 hours before the Express was to depart. I already knew there were going to be issues, especially with this large of an Event. I do know others that notified me they had a very slow update process, but that was because they waited until the last minute or later, when the servers were popping like popcorn. Lol

Don't worry about it, the Snow Mod is absolutely amazing except for the headlight 10 Second rule, which by the way has carried over to ATS as well, but I assume that is for the future update. I also realize this is from 10gbps as a rule & not completely in your control.

Thank you for taking the time to educate those unaware so that they may somehow understand just what you ALL go through & the sleepless nights that you ALL devote to TMP as a whole. There will be those who still do not understand and think that the internet is supposed to work perfectly at all times, but at one time people thought the world was FLAT & it was hard to get them to wrap their heads around the fact that it's round. So don't worry, we are here backing you bro. And the ATS issues that people keep complaining about will, I know, come around soon as well. Great job, & again your work is by no means unappreciated!! :<)
Davina Posted December 19, 2017
7 hours ago, A Simple Cheeseburger said:
"Who has water and not milk in oatmeal?"

It depends what sort it is - some porridge uses water I think. Anyway Kat, thank you for the detailed story, very enlightening. I will be able to experience it fully from Thu (although I'd rather do it in-sync with cold weather IRL but we'll have to see if we get any more!) -Davina-