Parsing Tweets for Hashtags, Usernames and URLs in Java
Hi,
As I am working on Twitter integration in my current project, I needed to display tweets returned by a Twitter API search on my view layer. When we query the Twitter API, it returns each tweet's text as a plain string containing hashtags, Twitter usernames and links to external resources.
While displaying them on the UI layer, I wanted proper links for every element:
#. Hashtags should link to a Twitter search for tweets containing the same tag.
#. Usernames should link to that user's profile on Twitter.
#. External URLs should link to the resource they point to.
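The three link targets listed above can be sketched as small helper methods. This is only a minimal illustration, not the code used in this post: the class name `TweetLinkTargets` and its method names are hypothetical, and the search and profile URL prefixes are simply the ones that appear in the method further down. It assumes the tag and username contain only letters, digits, `_` and `-`, so no URL encoding is applied.

```java
// Hypothetical helper (not from the original post): builds the anchor
// markup for each kind of tweet element.
class TweetLinkTargets {

    // Hashtag -> Twitter search for that tag (tag passed without the leading '#').
    static String hashtagLink(String tag) {
        return "<a href='http://search.twitter.com/search?q=" + tag + "'>#" + tag + "</a>";
    }

    // Username -> profile page (name passed without the leading '@').
    static String userLink(String name) {
        return "<a href='http://twitter.com/" + name + "'>@" + name + "</a>";
    }

    // External URL -> the resource itself, opened in a new tab or window.
    static String urlLink(String url) {
        return "<a href='" + url + "' target='_blank'>" + url + "</a>";
    }
}
```

For example, `TweetLinkTargets.userLink("bob")` returns `<a href='http://twitter.com/bob'>@bob</a>`.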
I searched a lot and found various approaches that are really useful if you want to parse the tweet on the server side, so I thought it was worth sharing.
Below is the method I used to parse a tweet; it returns the final HTML, which I can use directly on my UI.
[java]
import java.util.regex.Matcher
import java.util.regex.Pattern

// Note: the ${...} interpolation and the optional semicolons are Groovy syntax.
// In plain Java, build the strings with concatenation instead.
String parse(String tweetText) {
    // Nothing to parse for null or empty input
    if (!tweetText) {
        return tweetText
    }
    // Search for URLs (only the first http: link is handled)
    if (tweetText.contains('http:')) {
        int indexOfHttp = tweetText.indexOf('http:')
        int spaceIndex = tweetText.indexOf(' ', indexOfHttp)
        int endPoint = (spaceIndex != -1) ? spaceIndex : tweetText.length()
        String url = tweetText.substring(indexOfHttp, endPoint)
        String targetUrlHtml = "<a href='${url}' target='_blank'>${url}</a>"
        tweetText = tweetText.replace(url, targetUrlHtml)
    }
    // Search for Hashtags: a '#' at the start of the text or after whitespace
    String patternStr = "(?:\\s|\\A)#+([A-Za-z0-9_-]+)"
    Pattern pattern = Pattern.compile(patternStr)
    Matcher matcher = pattern.matcher(tweetText)
    String result = ""
    while (matcher.find()) {
        // Drop the leading whitespace captured by the pattern
        result = matcher.group().trim()
        // group(1) is the tag without the leading '#'
        String search = matcher.group(1)
        String searchHTML = "<a href='http://search.twitter.com/search?q=" + search + "'>" + result + "</a>"
        tweetText = tweetText.replace(result, searchHTML)
    }
    // Search for Users: an '@' at the start of the text or after whitespace
    patternStr = "(?:\\s|\\A)@+([A-Za-z0-9_-]+)"
    pattern = Pattern.compile(patternStr)
    matcher = pattern.matcher(tweetText)
    while (matcher.find()) {
        result = matcher.group().trim()
        // group(1) is the username without the leading '@'
        String rawName = matcher.group(1)
        String userHTML = "<a href='http://twitter.com/${rawName}'>" + result + "</a>"
        tweetText = tweetText.replace(result, userHTML)
    }
    return tweetText
}
[/java]
The above code will return the tweet text wrapped in HTML elements to make it more UI friendly.
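One limitation of the replace-based approach above is that it links only the first `http:` URL, and `String.replace` rewrites every occurrence of a match, so linking `#java` also rewrites the start of `#javascript`. As a hedged alternative, here is a plain-Java sketch that does everything in one pass with `Matcher.appendReplacement`. The class name `TweetLinkifier` and method name `linkify` are hypothetical; the character class and URL prefixes are copied from the method above. It is a sketch only: it does not HTML-escape the tweet text, and a URL match runs up to the next whitespace, so trailing punctuation ends up inside the link.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass variant (not the original author's method).
class TweetLinkifier {

    // Group 1: URL, group 2: hashtag text, group 3: username.
    // (?<!\S) means "at the start of the text or right after whitespace".
    private static final Pattern TOKEN = Pattern.compile(
            "(https?://\\S+)"
            + "|(?<!\\S)#([A-Za-z0-9_-]+)"
            + "|(?<!\\S)@([A-Za-z0-9_-]+)");

    static String linkify(String tweetText) {
        if (tweetText == null || tweetText.isEmpty()) {
            return tweetText;
        }
        Matcher m = TOKEN.matcher(tweetText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String html;
            if (m.group(1) != null) {
                String url = m.group(1);
                html = "<a href='" + url + "' target='_blank'>" + url + "</a>";
            } else if (m.group(2) != null) {
                String tag = m.group(2);
                html = "<a href='http://search.twitter.com/search?q=" + tag + "'>#" + tag + "</a>";
            } else {
                String name = m.group(3);
                html = "<a href='http://twitter.com/" + name + "'>@" + name + "</a>";
            }
            // quoteReplacement keeps any '$' or '\' in the link text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(html));
        }
        // Copy whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `TweetLinkifier.linkify("hi @bob")` returns `hi <a href='http://twitter.com/bob'>@bob</a>`. Because each match is replaced at its own position, a `#` inside a URL fragment stays part of the URL link, and text such as `a@b.com` is left untouched.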
This worked for me.
Hope it helps.
Useful Links:-
https://dev.twitter.com/docs/tco-url-wrapper
http://stackoverflow.com/questions/8451846/actual-twitter-format-for-hashtags-not-your-regex-not-his-code-the-actual
http://www.simonwhatley.co.uk/parsing-twitter-usernames-hashtags-and-urls-with-javascript
I've done a different implementation. Here is my code 🙂
[java]
private String elaborateTwitterText(String text) {
    String newText = text;
    for (String key : new String[]{"#", "@", "http://", "https://"}) {
        int findIndex = 0;
        int lastIndex = 0;
        while (findIndex != -1) {
            findIndex = text.indexOf(key, lastIndex);
            lastIndex = findIndex;
            if (findIndex != -1 && lastIndex < text.length()) {
                String tag = null;
                try {
                    tag = text.substring(findIndex, text.indexOf(' ', lastIndex));
                } catch (StringIndexOutOfBoundsException e) {
                    // No ' ' found, so take the substring up to the end of the string
                    tag = text.substring(findIndex);
                }
                // NOTE: the HTML that wrapped each tag appears to have been stripped
                // when this comment was posted, leaving empty strings around the tag.
                switch (key) {
                    case "#":
                        newText = newText.replace(tag, "" + tag + "");
                        break;
                    case "@":
                        newText = newText.replace(tag, "" + tag + "");
                        break;
                    default:
                        newText = newText.replace(tag, "" + tag + "");
                        break;
                }
            }
            lastIndex++;
        }
    }
    log.info("Elaborated text: " + newText);
    return newText;
}
[/java]
Hi, what is the status of this code? Can I use it in my GNU side project? Can I use it in my company code? Thanks.
Thanks, works great. I only changed the ${rawName} to " + rawName + " and did the same for the URL.