paul
Posted: Wednesday, November 19, 2014 10:45:29 AM (UTC)
Remember that comment you posted to Twitter eight years ago? No? Well, now you can look it up, along with every other public tweet made by anyone. In a blog post this week, Twitter announced that it now indexes every public tweet since 2006, something the microblogging service has wanted to do for quite some time.

"Since that first simple Tweet over eight years ago, hundreds of billions of Tweets have captured everyday human experiences and major historical events. Our search engine excelled at surfacing breaking news and events in real time, and our search index infrastructure reflected this strong emphasis on recency. But our long-standing goal has been to let people search through every Tweet ever published," Twitter said.

Twitter provided a few examples of when this expanded search capability might prove useful, such as providing comprehensive results for entire TV and sports seasons or digging through long-lived hashtag conversations like #JapanEarthquake and #Election2012.

While on the surface this may seem like no big deal, it took quite a bit of engineering savvy to make it happen. Twitter points out that its full index is more than 100 times larger than its real-time index and grows by several billion tweets a week. The real-time index is stored entirely in RAM for fast updates, but keeping the full index in RAM as well would have been cost-prohibitive. Twitter turned to SSDs instead, though it wasn't as simple as swapping in a different storage medium.

"SSDs were still orders of magnitude slower than RAM. Switching from RAM to SSD, our Earlybird QPS capacity took a major hit. To increase serving capacity, we made multiple optimizations such as tuning kernel parameters to optimize SSD performance, packing multiple DocValues fields together to reduce SSD random access, loading frequently accessed fields directly in-process and more," Twitter explains.
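
For anyone curious what "packing multiple fields together to reduce random access" can look like in general, here is a minimal Python sketch. It is not Twitter's or Lucene's actual code: the field names (tweet_id, timestamp, lang_id), the record layout, and the helper functions are all made up for illustration. The idea is simply that if a document's fields sit next to each other in one fixed-width record, a lookup needs one seek and one read instead of one per field.

```python
import io
import struct

# Hypothetical record layout (an assumption for illustration only):
# tweet_id as unsigned 64-bit, timestamp as unsigned 32-bit,
# lang_id as unsigned 16-bit, little-endian with no padding.
RECORD = struct.Struct("<QIH")


def write_packed(docs):
    """Serialize each document's fields side by side in one fixed-width record."""
    buf = io.BytesIO()
    for tweet_id, timestamp, lang_id in docs:
        buf.write(RECORD.pack(tweet_id, timestamp, lang_id))
    return buf.getvalue()


def read_packed(storage, doc_index):
    """Fetch all fields of one document with a single seek and a single read."""
    storage.seek(doc_index * RECORD.size)
    return RECORD.unpack(storage.read(RECORD.size))


# Usage: an in-memory buffer stands in for a file on an SSD.
docs = [(1, 100, 2), (2, 200, 3)]
data = write_packed(docs)
storage = io.BytesIO(data)
second_doc = read_packed(storage, 1)
```

If each field lived in its own separate column file instead, the same lookup would cost three separate seeks, which is the kind of random access the quote says was reduced.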

The blog post is actually rather interesting if you're into geeky details, as there's a lot more to digest than the storage medium alone.