Hi!
I want to open a link and copy the contents of the website. I do not care if it is HTML code or if I just copy the contents with right click, the only thing I need is automation. I want to copy everything between String1 and String2 which are present on the website.
The structure of the website looks like:
[RANDOM CONTENT]
[STRING-1]
[WANTED CONTENT]
[STRING-2]
[RANDOM CONTENT]
Now I want to copy the WANTED CONTENT, save it into a string, and, after some modifications, add it to the clipboard.
How would I go about doing this? I know there is a way to create a browser with Processing / Java (allowing you to open HTML files) but yeah.
It sounds like you’re looking for something like Beautiful Soup? Except that’s for Python, so maybe a Java alternative, like jsoup. With these solutions, you don’t have to copy anything to the clipboard – the data you retrieve can feed directly into your program.
However, if you specifically want to operate your web browser as normal, but have the option to extract some specific text quickly, then you might use JavaScript. For example, say you want to extract the passage between two phrases in the first block quote of the Wikipedia article on Lorem ipsum.
To do this using JavaScript:
- Open a browser tab (using Firefox/Chrome) and head to https://en.wikipedia.org/wiki/Lorem_ipsum.
- Press Ctrl+Shift+I to open the developer panel, and select the Console tab.
- Paste in the following code:
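// grab the HTML of the first <blockquote> on the page
var container_element = document.querySelectorAll('blockquote')[0].innerHTML;
// starting phrase: "sed do eiusmod "
// ending phrase: " non proident, sunt"
var regex = /(?<=sed do eiusmod )(.*)(?= non proident, sunt)/;
// match() returns null if the phrases are not found, so this line throws in that case
var copy_this = container_element.match(regex)[0];
console.log(copy_this);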
Defining container_element requires some understanding of the DOM. You can look up the querySelectorAll() method for more info. Selecting text spanning several HTML elements could get tricky …
This will print everything between the starting and ending phrases in the console, which you can select and copy. To make this more efficient, so that you can use it across multiple web-pages whenever it’s convenient, you might use something like Greasemonkey to execute the code via a button you insert into the website interface.
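If the regex gets awkward (the . in it does not match line breaks unless you add the s flag, and lookbehind is missing in some older browsers), plain string searching is an alternative. Below is a minimal sketch, not a definitive implementation: extractBetween is a name I made up, and the sample string is invented to mirror the structure from the question. Note that it stops at the first ending phrase after the starting phrase, whereas the greedy regex above runs to the last one.

```javascript
// Hypothetical helper (not from the thread): returns the text between the
// first occurrence of `start` and the next occurrence of `end`, or null if
// either marker is missing.
function extractBetween(text, start, end) {
  const from = text.indexOf(start);
  if (from === -1) return null;
  const contentStart = from + start.length;
  const to = text.indexOf(end, contentStart);
  if (to === -1) return null;
  return text.slice(contentStart, to);
}

// Invented sample standing in for a page's HTML or text
const sample = "random\nSTRING-1\nwanted\ncontent\nSTRING-2\nrandom";
console.log(extractBetween(sample, "STRING-1\n", "\nSTRING-2"));
```

In the browser console you could pass it document.body.innerHTML (or the innerHTML of a narrower element) instead of the sample string.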
The thing I am trying to do for myself is like this:
1, user triggers the program, together with an argument (to generate a link)
2, the program generates links for the websites
3, computer opens all of the links that have been generated
4, the computer collects the wanted text from the website
5, the computer combines the text gathered from all links and fuses them together, along with some minor edits
6, the computer saves the gathered text into the clipboard, where the user is free to do whatever with it.
I want all of the steps except 1 and 6 to happen without user help.
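Steps 2 to 5 could be sketched roughly like this in JavaScript, assuming an environment with a global fetch() (Node 18+ or a browser). Everything specific here is a placeholder: the example.com URL patterns, the function names buildLinks, fetchText and gather, and the trim() standing in for the "minor edits". The same shape should carry over to Java with an HTTP client or jsoup.

```javascript
// Sketch only: URL patterns, names and markers are hypothetical placeholders.

// Step 2: turn the user's argument into a list of links.
function buildLinks(arg) {
  const bases = ["https://example.com/a?q=", "https://example.org/b?q="];
  return bases.map((base) => base + encodeURIComponent(arg));
}

// Step 4: text between the first `start` and the next `end`, or null.
function extractBetween(text, start, end) {
  const from = text.indexOf(start);
  if (from === -1) return null;
  const contentStart = from + start.length;
  const to = text.indexOf(end, contentStart);
  return to === -1 ? null : text.slice(contentStart, to);
}

// Step 3: download one page as text using the global fetch().
async function fetchText(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

// Steps 3 to 5: fetch every link, keep the wanted part, join the pieces.
// `getText` can be swapped out, e.g. for a fake fetcher while experimenting.
async function gather(arg, start, end, getText = fetchText) {
  const pages = await Promise.all(buildLinks(arg).map((url) => getText(url)));
  return pages
    .map((page) => extractBetween(page, start, end))
    .filter((part) => part !== null) // skip pages without both markers
    .map((part) => part.trim())      // stand-in for the "minor edits"
    .join("\n");
}

// Example call (needs network access):
// gather("lorem", "STRING-1", "STRING-2").then(console.log);
```

Step 6 depends on where this runs. Node has no built-in clipboard, so one option is to print the result and pipe it into a system tool such as clip (Windows), pbcopy (macOS) or xclip (Linux); in a browser, navigator.clipboard.writeText() can do it, though it usually needs a user gesture.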
I hope this makes it clearer. Thank you for the response!