Twitter API Integration with AEM Using Talend
Recently, I came across an interesting use case: fetching tweets from Twitter and posting them to an AEM instance. Since the volume of tweet content can be huge, and the use case resembles a migration involving extraction, transformation, and loading of content, I decided to use an ETL tool called Talend.
Talend is a leading open-source integration software provider. It offers a Twitter components pack, based on the twitter4j library, that can connect to both the Twitter API (useful for gathering past tweets to build a large dataset, although you can fetch tweets from the last week only, as mentioned in the Twitter Developer Documentation) and the Twitter streaming API (which streams live tweets).
We just need to define the queries in the “tTwitterStreamInput” or “tTwitterInput” component, depending on which tweet source we need, and then post the results to AEM using an HTTP POST request.
Below are a few simple steps you can follow to fetch the tweets and post them to the AEM instance:
Step 1: Create your own Twitter application, as you cannot use the APIs anonymously. Follow the instructions from this link to do so.
Step 2: Install the Twitter components pack in your Talend instance. These components are not available in Talend by default, so you will have to download and install them manually. Follow the instructions from this link to install the components in Talend.
Step 3: Restart your instance and create a new job. Drag tTwitterOAuth, tTwitterStreamInput, tJavaRow and tTwitterOAuthClose from the components palette. Connect them as per the below diagram:
Select the tTwitterOAuth component, which provides the connector for authenticating against a Twitter app using Twitter's OAuth authentication system, and fill in the fields with the strings from your Twitter app's API keys page. In this context, API and Consumer are synonyms. You can choose the connection type here based on your requirement.
I have used tTwitterStreamInput, which outputs structured data only. You can also use the tTwitterInput component, which provides the whole JSON response from the Twitter API.
Write your query in the tTwitterStreamInput component, create a schema and map the output columns.
In this component, you can also limit the number of tweets fetched in one job.
Connect the output to the tJavaRow component, in which you can write custom code to post the data to the AEM instance.
I created an nt:unstructured node in the JCR for each tweet through an HTTP POST request.
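The post does not show the custom code itself, so here is a minimal sketch of what such an HTTP POST could look like in plain Java. It assumes the AEM instance exposes the default Sling POST servlet, that basic authentication is acceptable, and that each tweet is stored with two illustrative properties, user and text. The class name TweetNodePoster, the property names, and the target path are all hypothetical and not taken from the original job.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: posts one tweet as an nt:unstructured node.
public class TweetNodePoster {

    // Builds the node properties; "user" and "text" are illustrative names.
    public static Map<String, String> tweetProperties(String user, String text) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "nt:unstructured");
        props.put("user", user);
        props.put("text", text);
        return props;
    }

    // Encodes the properties as an application/x-www-form-urlencoded body.
    public static String buildFormBody(Map<String, String> props) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return body.toString();
    }

    // Builds the value of the Authorization header for basic authentication.
    public static String basicAuthHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    // Sends the POST request and returns the HTTP status code.
    // aemBaseUrl and nodePath are assumptions, e.g. "http://localhost:4502"
    // and "/content/tweets/12345".
    public static int postTweet(String aemBaseUrl, String nodePath, String user,
                                String password, Map<String, String> props) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(aemBaseUrl + nodePath).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(buildFormBody(props).getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Inside tJavaRow, a call such as TweetNodePoster.postTweet(...) could be made once per row, passing values from the input row. The column names depend on the schema you defined, and the class would first have to be made available to the job, for example as a Talend routine.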
Alternatively, you can use the tHttpRequest component to post the tweets to your AEM instance.
Select the connection to be closed in the tTwitterOAuthClose component. It will close the connection when the sub-job completes.
Hope you find the blog helpful!
Hi,
The job keeps on running.
What could be the problem?
Please help me.
Thank You
What limit have you set in the Advanced settings tab of tTwitterStreamInput?
The job should stop once it reaches that limit.