Getting data, especially huge amounts of it, out of some systems can be quite a hassle. Often you are required to use techniques such as FTP or SSH for transferring files. But with RESTful APIs getting more attention in the last few years, there is another way to get your data.
The charm of REST APIs is that they are stateless and use HTTP methods explicitly. This makes getting data pretty straightforward:
- Use POST to create a resource on the server.
- Use GET to retrieve a resource.
- Use PUT to change a resource.
- Use DELETE to remove a resource.
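The mapping above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a definitive client: the endpoint `http://api.example.com/reports`, the resource id `42`, and the helper `build_request` are all hypothetical names chosen for the example. The requests are only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://api.example.com/reports"  # hypothetical endpoint


def build_request(method, url, payload=None):
    """Build an HTTP request whose method states the intended operation."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # Send the resource representation as a JSON body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


create = build_request("POST", BASE_URL, {"name": "sales"})          # create a resource
read = build_request("GET", BASE_URL + "/42")                        # retrieve it
update = build_request("PUT", BASE_URL + "/42", {"name": "sales-v2"})  # change it
delete = build_request("DELETE", BASE_URL + "/42")                   # remove it

# Sending a request would look like this (not executed here):
# with urllib.request.urlopen(read) as response:
#     report = json.load(response)
```

Because each request carries everything the server needs (method, URL, body), no session state has to be kept between calls.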
The result can be returned in any defined format, but mostly it is XML or JSON. Access can also be secured if you integrate authentication methods like OAuth or LDAP.
This gives you new possibilities to integrate your data into web-based reporting systems, since you only have to use the HTTP protocol to get your data and can work on the results as they stream in.
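To illustrate working on results as they stream in, here is a small sketch. It assumes the API returns newline-delimited JSON (one record per line), which is an assumption for this example and not something every REST API does. The helper `iter_records` and the field names `id` and `value` are hypothetical, and an in-memory buffer stands in for the real HTTP response.

```python
import io
import json


def iter_records(response):
    """Yield one parsed record per line as the lines arrive."""
    for raw_line in response:
        line = raw_line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Stand-in for a real HTTP response body (assumed newline-delimited JSON).
fake_response = io.BytesIO(b'{"id": 1, "value": 10}\n{"id": 2, "value": 32}\n')

# Aggregate while reading, without holding the whole result in memory.
total = 0
for record in iter_records(fake_response):
    total += record["value"]
```

The object returned by `urllib.request.urlopen` can also be iterated line by line, so the same function could be fed a live response instead of the buffer.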
Since most REST APIs can store (cache) the result of a request, you can retrieve the same result again later without having to process it on the source system again.
Hadoop even provides a REST API called WebHDFS, developed by Hortonworks, which supports the complete filesystem interface of HDFS. This is a great help if you are running applications against your Hadoop cluster that are not written in Java. So you can manipulate and access your data from just about anywhere.
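As a rough sketch of what this looks like from a non-Java client, the snippet below builds WebHDFS URLs in Python. WebHDFS paths live under `/webhdfs/v1/` and the operation is passed as the `op` query parameter. The host name `namenode.example.com`, the port `50070`, the HDFS paths, and the helper `webhdfs_url` are assumptions for illustration; check your cluster's NameNode HTTP address before using them.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL for an absolute HDFS path and an operation."""
    query = urlencode({"op": op, **params})
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, quote(path), query)


# Host and port are assumptions; adjust them to your cluster.
HOST, PORT = "namenode.example.com", 50070

# List a directory (sent with HTTP GET).
list_url = webhdfs_url(HOST, PORT, "/user/alice", "LISTSTATUS")

# Read the first 1024 bytes of a file (sent with HTTP GET).
open_url = webhdfs_url(HOST, PORT, "/user/alice/data.csv", "OPEN",
                       offset=0, length=1024)

# Fetching the listing would look like this (not executed here):
# import json, urllib.request
# with urllib.request.urlopen(list_url) as response:
#     listing = json.load(response)
```

Any language with an HTTP client can issue the same requests, which is what makes the interface useful outside the Java ecosystem.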