In the previous tutorial I showed how to parse and read XML files using R. In this tutorial we will see how to parse a website's source code and extract useful information from it using the R programming language. Parsing an HTML page lets you pull specific pieces of information, such as links and headings, out of a website. Here I am going to parse http://www.tutorialspoint.com/ and find all the courses available on the website.
How to Parse Website Source Code
To parse website source code, you need to install and load the 'XML' package. To do so, use the following code:
install.packages('XML')
library(XML)
Once the package is loaded, you can use the htmlTreeParse() function to read the source code of the webpage into an R object. An example is shown below:
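The original example is not reproduced here, so the following is only a minimal sketch of the idea. It assumes a small inline HTML string (html_text, a made-up stand-in for the downloaded page; the real site's markup will differ) so that it runs without a network connection. It parses the string with htmlTreeParse() and then uses an XPath query to collect the text and target of every link.

```r
library(XML)

# Hypothetical stand-in for the page source; not the real tutorialspoint markup.
html_text <- '<html><body>
  <ul>
    <li><a href="/python/index.htm">Python</a></li>
    <li><a href="/r/index.htm">R Programming</a></li>
  </ul>
</body></html>'

# asText = TRUE parses the string itself rather than treating it as a file or URL.
# useInternalNodes = TRUE returns a document that the XPath helpers can query.
doc <- htmlTreeParse(html_text, asText = TRUE, useInternalNodes = TRUE)

# Collect the visible text and the href attribute of every <a> element.
course_names <- xpathSApply(doc, "//a", xmlValue)
course_links <- xpathSApply(doc, "//a", xmlGetAttr, "href")

print(course_names)
print(course_links)
```

To try this on a live page, pass the page address as the first argument and drop asText = TRUE. Pages served over HTTPS may need to be downloaded to a local file first, since the parser may not be able to fetch them directly. The XPath expression "//a" is only a starting point; the right expression depends on how the target page marks up its course list.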
Output is shown below:
I hope you liked the article. Keep reading Technokarak.com for more tutorials on R programming.
Comment below if you face any problems while practicing this tutorial on how to parse website source code.