Introduction to ParseHub
ParseHub is a web scraping tool that can extract data from even the most outdated websites. Many web scrapers cannot handle interactive websites, but ParseHub is a low-cost solution that can interact with AJAX and JavaScript pages.
In addition, ParseHub is easy to use because it requires no coding. Open the website, select the data you need, and export it as JSON or Excel or retrieve it through the API. ParseHub is a flexible and powerful web scraping tool that uses machine learning to understand the elements on a web page and extract the data in seconds.
However, you need to integrate proxies into ParseHub so that IP rotation can significantly reduce the incidence of IP bans. Several free proxy providers are available, but choosing a reputable provider gives you more reliable, uninterrupted web crawling.
This guide therefore explains how to configure NetNut proxies on ParseHub for optimized web scraping.
Configuring NetNut Proxies on ParseHub
Step 1: Go to the ParseHub website and select Download to install the app on your device.
Step 2: Select Signup on the ParseHub homepage to register.
Step 3: Fill in the required details: Name, Email, and Password.
Step 4: Once the download has completed, install the app and launch it on your device. Click on New Project.
Step 5: Enter the URL of the website you want to extract data from. For example, if you want to collect data from NetNut’s website, type in https://netnut.io/. Then click on Start Project on this URL to start web scraping.
Step 6: Wait for the project to load and switch to Browser mode.
Step 7: Click on the Advanced tab and select Network. Then click the Settings button next to the Connection option.
Step 8: Select Manual proxy configuration. Fill in the required fields and click OK when you are done.
Because there are several proxy types, choose the HTTP or SOCKS5 protocol when integrating NetNut proxies.
This is an example of a proxy string for a browser:
USERNAME-stc-uk-sid-123456789:PASSWORD@gw-am.ntnt.io:5959
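The manual proxy form asks for the host, port, username, and password separately, so it can help to see how the proxy string above breaks down. The following is a minimal Python sketch, not part of ParseHub or NetNut; the helper name split_proxy_string is hypothetical, and it assumes the string has the user:password@host:port shape shown above.

```python
from urllib.parse import urlsplit


def split_proxy_string(proxy: str, scheme: str = "http") -> dict:
    """Split 'user:password@host:port' into the fields of a manual proxy form.

    Hypothetical helper for illustration. A scheme is prepended only so that
    the standard URL parser can recognise the credentials and the port.
    """
    parts = urlsplit(f"{scheme}://{proxy}")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
    }


fields = split_proxy_string(
    "USERNAME-stc-uk-sid-123456789:PASSWORD@gw-am.ntnt.io:5959"
)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Each printed value corresponds to one field in the Manual proxy configuration dialog. Passwords that contain characters such as "@" or ":" may not split cleanly with this simple approach.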
-
Hostname Configuration
Copy the hostname/server address provided by NetNut.
Example: Type gw-am.ntnt.io into the host field if you are using the HTTP protocol. Alternatively, type gw-socks-am.ntnt.io for the SOCKS5 protocol.
-
Port number Configuration
The port number is 5959 for NetNut HTTP proxies and 9595 for SOCKS5.
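The host and port always go together with the protocol you picked, so a small lookup table can prevent mixing them up. This is a sketch in Python using only the example values quoted in this guide; the names GATEWAYS and gateway_for are hypothetical, and the values for your own account may differ.

```python
# Example gateway values taken from this guide; check your NetNut account
# for the values that apply to you.
GATEWAYS = {
    "http": ("gw-am.ntnt.io", 5959),
    "socks5": ("gw-socks-am.ntnt.io", 9595),
}


def gateway_for(protocol: str) -> tuple:
    """Return the (host, port) pair for a protocol name (hypothetical helper)."""
    key = protocol.strip().lower()
    if key not in GATEWAYS:
        raise ValueError(f"Unsupported protocol: {protocol!r}")
    return GATEWAYS[key]


host, port = gateway_for("SOCKS5")
print(host, port)
```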
-
Username Configuration
Your username has three components: your user ID, the proxy type, and the target country.
User ID is your login, which you can find in your NetNut account under Settings -> Billing.
Proxy type is the type of proxy you use. NetNut provides three proxy types, depending on your subscription plan:
- dc — datacenter;
- res — rotating residential proxy;
- stc — static residential proxy.
Country is the country whose IP addresses will be used for the connection. You can choose “Any,” in which case any available country is used, or you can provide the ISO code of a specific country from the list of NetNut Available Countries, e.g., jp (Japan) or fr (France).
Example: ticketing123-res-us
You can get the proxy username and password from the customer portal. You can also get in touch with your account manager if you’d like additional assistance.
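The username format described above can be sketched as a small Python helper. The function name build_username and its validation rules are assumptions made for illustration, not a NetNut API. The sketch covers only two-letter country codes, because this guide does not show how the “Any” option is written in the username.

```python
# Proxy type codes as listed in this guide.
PROXY_TYPES = {
    "dc": "datacenter",
    "res": "rotating residential",
    "stc": "static residential",
}


def build_username(user_id: str, proxy_type: str, country: str) -> str:
    """Join user ID, proxy type, and country code with hyphens (hypothetical helper)."""
    if proxy_type not in PROXY_TYPES:
        raise ValueError(f"Unknown proxy type: {proxy_type!r}")
    code = country.strip().lower()
    # Assumption: the country is given as a two-letter ISO code such as "us".
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"Expected a two-letter country code, got {country!r}")
    return f"{user_id}-{proxy_type}-{code}"


print(build_username("ticketing123", "res", "US"))
```

With the inputs shown, the helper reproduces the example username from this section.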
-
Consistent IP session
While NetNut provides rotating IP addresses, you may want to keep the same IP address across requests, for example to maintain a session. To do so, add a session ID (SID) to your username.
How do you choose a SID?
- Choose a number with 4 to 8 digits
- Ensure the digits are random and non-sequential to protect your IP address
For example: ticketing123-stc-us-SID-435765
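The two rules above can be sketched in Python as follows. The helper names are hypothetical, and "non-sequential" is interpreted here as "not a straight ascending or descending run such as 1234 or 8765", which is an assumption about what the rule means.

```python
import secrets


def is_sequential(digits: str) -> bool:
    """True when each digit is exactly one above (or one below) the previous one."""
    steps = {int(b) - int(a) for a, b in zip(digits, digits[1:])}
    return steps == {1} or steps == {-1}


def new_session_id() -> str:
    """Generate a random, non-sequential session ID of 4 to 8 digits."""
    while True:
        length = 4 + secrets.randbelow(5)  # 4, 5, 6, 7, or 8 digits
        first = str(1 + secrets.randbelow(9))  # avoid a leading zero
        rest = "".join(str(secrets.randbelow(10)) for _ in range(length - 1))
        sid = first + rest
        if not is_sequential(sid):
            return sid


def with_session(username: str, sid: str) -> str:
    """Append the session ID in the form used by the example above."""
    return f"{username}-SID-{sid}"


print(with_session("ticketing123-stc-us", new_session_id()))
```

Reusing the same SID keeps the session on the same IP address, so store the generated value if you need to continue a session later.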
-
Proxy password
Enter your confidential NetNut proxy password.
Step 9: Open your target website in a new browser tab. You will be prompted for your username and password. However, you can skip this step if you use IP whitelisting as your authentication method.
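If the login prompt keeps rejecting your details, it can help to test the same credentials outside ParseHub. The following Python sketch uses only the standard library and covers HTTP proxies only (urllib has no SOCKS5 support). The function names are hypothetical, and the default host and port are the example values from this guide.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Build an HTTP proxy URL, percent-encoding the credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def make_proxy_opener(username: str, password: str,
                      host: str = "gw-am.ntnt.io", port: int = 5959):
    """Return a urllib opener that sends HTTP and HTTPS requests via the proxy."""
    url = proxy_url(username, password, host, port)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


def check_proxy(opener, target: str = "https://netnut.io/", timeout: int = 15) -> int:
    """Fetch the target through the proxy and return the HTTP status code."""
    with opener.open(target, timeout=timeout) as response:
        return response.status
```

To run the check, call check_proxy(make_proxy_opener("ticketing123-res-us", "YOUR_PASSWORD")) with your own username and password. An authentication error at this point suggests that the credentials, rather than the ParseHub settings, are the problem.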
Conclusion
Integrating NetNut proxies with ParseHub allows businesses to scrape data without worrying about IP bans or geographical restrictions. Data has become increasingly important to businesses, so integrating NetNut proxy servers with a tool like ParseHub is crucial for effective web data extraction. In addition, NetNut has a large IP pool, which ensures automated IP rotation for optimized anonymity and efficiency.
Check out our other “how-to” articles to learn how to integrate NetNut proxies with other tools.
Feel free to contact us if you have questions about choosing the best proxy solution for your needs.