Finding and Fixing "Duplicate" Resource Entries

Submit ideas and suggestions on how we display, catalogue and export the resources.

Moderator: Forum Moderator

Sobuno
Developer
Posts: 2589
Joined: Sun Mar 25, 2007 2:17 am

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Sobuno » Thu Jul 22, 2010 11:34 am

Added the class to the page with checkboxes.

I don't understand your other request :?

Zimoon
Forum Moderator
Posts: 4817
Joined: Mon May 14, 2007 6:55 am
Location: Stockholm, SE

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Zimoon » Thu Jul 22, 2010 4:26 pm

Great ;)

Now they are sorted by name.

The optimal order is to sort by resource class, in the order the classes appear in resourcetree.xml, and then by name; the resource-class order happens to be the same as in the 30k resource deed (thanks to Rommel). That order makes it easier to walk through them class by class.
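A minimal sketch of that ordering, assuming each entry is a (name, resource_class) pair and the class order from resourcetree.xml is already available as a list. The function name and sample class names are illustrative only, not the site's actual code:

```python
# Hypothetical sketch: order duplicate entries for review by resource class
# (in resourcetree.xml order), then by name. The (name, resource_class)
# tuple layout and the sample class names are illustrative assumptions.

def sort_for_review(resources, class_order):
    """Return resources sorted by class position, then by case-insensitive name.

    resources   -- iterable of (name, resource_class) tuples
    class_order -- list of class names in resourcetree.xml order
    Classes missing from class_order sort after all known classes.
    """
    position = {cls: i for i, cls in enumerate(class_order)}
    unknown = len(class_order)
    return sorted(
        resources,
        key=lambda r: (position.get(r[1], unknown), r[0].lower()),
    )


if __name__ == "__main__":
    order = ["Copper", "Iron", "Wind Energy"]
    rows = [("Zexo", "Iron"), ("Abbe", "Iron"), ("Mora", "Copper")]
    for name, cls in sort_for_review(rows, order):
        print(cls, name)
```

In a database-backed site, the same effect would more likely come from an ORDER BY on a stored class-position column followed by the name. That is an assumption about the schema, not something stated here.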

I am quite sure you have something for this in the DB, because on several pages resources or classes are sorted this way by default.

The page as such is awesome, I will do what I can over time.



Oooh, question:
Does it update, so that a resource disappears from the list once it is done? I would guess "yes", knowing how thorough you are :)



Please add the server=ID to the page where we are supposed to edit the link; my memory is in bad shape at times. And no, it's not related to beverages, but perhaps to a man's inherent right to be absentminded 8)



There is also an error if you edit the link but rashly click the submit button, which I di....tried.



This is a great addition to the site, I like it a lot :D


/Zimoon

Sobuno
Developer
Posts: 2589
Joined: Sun Mar 25, 2007 2:17 am

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Sobuno » Thu Jul 22, 2010 8:02 pm

Actually, I meant this request:
And add the ID and names to the index page too, perhaps.
And nope, they don't disappear or anything; pressing submit does nothing but give you the list of IDs, which you will then need to give to me. I'll later feed them into a script that ensures they are properly removed (properly removed = removed in the same way as if you had clicked "Remove" on each of them, ensuring proper tracking of when and how they were removed, as well as the ability to bring them back).
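The "proper removal" described above amounts to a soft delete with an audit trail. Here is a rough sketch under assumed names (Resource, remove, restore, parse_id_list and remove_batch are hypothetical, not the site's script). It also parses ID lists in the comma-separated form used in this thread, including the trailing comma:

```python
# Hypothetical sketch of "proper" removal: entries are flagged, not deleted,
# and every remove/restore is logged so it can be tracked and undone.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Resource:
    resource_id: int
    name: str
    removed: bool = False
    # Audit trail of (timestamp, user, action, reason); never cleared.
    history: list = field(default_factory=list)


def _log(res, user, action, reason):
    res.history.append((datetime.now(timezone.utc), user, action, reason))


def remove(res, user, reason):
    res.removed = True
    _log(res, user, "remove", reason)


def restore(res, user, reason):
    res.removed = False
    _log(res, user, "restore", reason)


def parse_id_list(text):
    """Parse '463036,463112,' style lists; empty entries (trailing comma) are skipped."""
    return [int(part) for part in text.split(",") if part.strip()]


def remove_batch(resources_by_id, id_text, user, reason="duplicate"):
    """Soft-remove every listed ID; return the IDs that were not found."""
    missing = []
    for rid in parse_id_list(id_text):
        res = resources_by_id.get(rid)
        if res is None:
            missing.append(rid)
        else:
            remove(res, user, reason)
    return missing
```

Keeping the history list append-only means a restore does not erase the record of the earlier removal.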

It is not a permanent addition to the site either; the query took 10 hours to complete while using 100% of a 2.66 GHz processor the whole time. It would be possible to make it permanent by having each resource addition or edit look up other resources with the same class and stats on the same server in the database. That would be fairly straightforward, but it's not what's happening right now :)
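The per-submission lookup mentioned above avoids the expensive all-against-all query: each new or edited entry only probes entries sharing its (server, class, stats) key. Below is an in-memory sketch of that idea. DuplicateIndex and its methods are hypothetical; a real site would more likely use a composite database index on those columns.

```python
# Hypothetical sketch: exact-duplicate check at submit time via a keyed index,
# instead of a full pairwise scan of the whole database.
from collections import defaultdict


class DuplicateIndex:
    """In-memory index from (server, class, stats) to resource IDs."""

    def __init__(self):
        self._index = defaultdict(list)

    @staticmethod
    def key(server_id, resource_class, stats):
        # Sort the stat pairs so {"OQ": 1, "CD": 2} and {"CD": 2, "OQ": 1} match.
        return (server_id, resource_class, tuple(sorted(stats.items())))

    def add(self, resource_id, server_id, resource_class, stats):
        """Index a new or edited resource and return the IDs it duplicates."""
        k = self.key(server_id, resource_class, stats)
        existing = [rid for rid in self._index[k] if rid != resource_id]
        if resource_id not in self._index[k]:
            self._index[k].append(resource_id)
        return existing

    def discard(self, resource_id, server_id, resource_class, stats):
        """Drop a resource from its old key, e.g. before re-adding it after an edit."""
        k = self.key(server_id, resource_class, stats)
        if resource_id in self._index[k]:
            self._index[k].remove(resource_id)
```

This only catches exact matches. Misspelled names or swapped digits need fuzzier checks.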

Zimoon
Forum Moderator
Posts: 4817
Joined: Mon May 14, 2007 6:55 am
Location: Stockholm, SE

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Zimoon » Thu Jul 22, 2010 10:10 pm

OK, and it seems I missed one of your posts there, the one with the "manual".

Hmm, there is a risk of double work if they are not removed, disabled, marked or whatever on the page of names. But I understand that this kind of temporary feature is not worth spending a lot of development time on.
Sobuno wrote:It would be possible to make it permanent
Why?
Only for old resources do you need to scan the whole galaxy.

For new resources I guess it is a good idea, though, plus a scan of those that despawned very recently. Maybe two ways: a name-lookalike filter, and same stats. ISDroid reports do not need name validation, but how do we alert the community when somebody posted a misspelled name just before the report was submitted?

My experience is that the following errors were most common at the old site:
- Spelling mistakes in resource name
- Selected wrong resource class at drop down
- Swapped digits in some stats
- Forgotten stat

None of these are easily addressed, and relying on a combination of checks may shrink the result set badly. Perhaps just the name.
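As a rough illustration of two of the error types listed above (swapped digits and a forgotten stat), here is a hypothetical heuristic. It treats two stat sets as look-alikes when they differ in at most one stat, and that one difference is either an adjacent digit swap or a stat present in only one entry. It assumes both entries are already in the same class, so it does not cover the wrong-class case or misspelled names:

```python
# Hypothetical heuristic for near-duplicate stats; names and thresholds are
# illustrative assumptions, not the site's method.

def is_adjacent_transposition(x: int, y: int) -> bool:
    """True if y's digits equal x's digits with exactly one adjacent pair swapped."""
    sx, sy = str(x), str(y)
    if len(sx) != len(sy) or sx == sy:
        return False
    diffs = [i for i in range(len(sx)) if sx[i] != sy[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and sx[diffs[0]] == sy[diffs[1]]
        and sx[diffs[1]] == sy[diffs[0]]
    )


def stats_look_alike(a: dict, b: dict) -> bool:
    """Stats match except for one swapped-digit value or one forgotten stat."""
    keys = set(a) | set(b)
    mismatches = [k for k in keys if a.get(k) != b.get(k)]
    if not mismatches:
        return True  # identical stats
    if len(mismatches) != 1:
        return False
    k = mismatches[0]
    if k not in a or k not in b:
        return True  # one entry is missing this stat
    return is_adjacent_transposition(a[k], b[k])
```

Requiring exactly one mismatch keeps the check narrow. That narrowness is also why combining it with other filters would shrink the result set quickly.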

NN all, sweet dreams
Zimoon

Sobuno
Developer
Posts: 2589
Joined: Sun Mar 25, 2007 2:17 am

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Sobuno » Thu Jul 22, 2010 11:10 pm

It would be useful for old resources if someone inputted an old resource and it turned out that the resource that had been present in the database all along was named incorrectly. Other than that, no use for old resources, yes.

But it's not something I'll be implementing right now. I am also still considering the spelling mistake filter through the use of that levenshtein function I have previously mentioned (http://php.net/manual/en/function.levenshtein.php).
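PHP's levenshtein() computes the standard edit distance (by default, insertions, deletions and substitutions each cost 1). The site's PHP code isn't shown, so here is a Python sketch of the same unit-cost distance. It is paired with a hypothetical find_lookalikes filter that compares a submitted name against the current resources. The 2-edit threshold is an assumption.

```python
# Sketch of unit-cost Levenshtein distance plus a hypothetical name-lookalike
# filter over current resource names.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def find_lookalikes(new_name, current_names, max_distance=2):
    """Current names within max_distance edits of new_name (case-insensitive),
    closest first; an exact match (distance 0) is not reported."""
    target = new_name.lower()
    hits = []
    for name in current_names:
        d = levenshtein(target, name.lower())
        if 0 < d <= max_distance:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]
```

Since short names can legitimately be one or two edits apart, such hits are probably better shown as a "did you mean…?" hint than used to reject the submission.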

Monty Burns
Master Crafter
Posts: 549
Joined: Sat Mar 08, 2008 9:26 am
Location: New Zealand

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Monty Burns » Fri Jul 23, 2010 8:35 am

So what the hell am I meant to do with this, do I email it to you or post it here?


463036,463112,463205,463375,463751,464386,463632,463669,463697,799390,463787,477149,463923,464108,464316,464666,464808,464873,464928,691419,686802,789397,465296,465315,465504,465678,465862,465741,740266,697322,466144,942770,466283,477532,466361,466366,466396,956734,466534,924155,466579,466582,466586,466605,466646,37413,466708,466745,466828,466923,467038,468999,467300,914353,775290,467579,

I messed around with the Sunrunner list and this is everything up to "D" but I am not sure whether it can be done in parts or whether I have to do the lot at once.
The biggest issue has been spelling mistakes, but there are also a few where it appears someone has copied and pasted the previous spawn's stats into the next spawn's entry, so they may have to be tidied up separately.

Sobuno
Developer
Posts: 2589
Joined: Sun Mar 25, 2007 2:17 am

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Sobuno » Fri Jul 23, 2010 12:06 pm

Posting it here is fine, it's just a list of ID numbers for resources that are going to be deleted anyway :)

You can do it in parts, I have deleted the ones you mentioned from the list (Not deleted them from the database yet though, it requires a bit more work) as well as any duplicate groups that only contained one resource.
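The list maintenance described above (dropping handled IDs, then dropping any group left with a single resource) can be sketched like this. It assumes duplicate groups are plain lists of resource IDs; prune_groups is a hypothetical name:

```python
# Hypothetical sketch: remove already-handled IDs from the duplicate groups
# and drop groups that no longer contain at least two resources.

def prune_groups(groups, handled_ids):
    handled = set(handled_ids)
    pruned = []
    for group in groups:
        remaining = [rid for rid in group if rid not in handled]
        if len(remaining) > 1:  # a "group" of one is no longer a duplicate
            pruned.append(remaining)
    return pruned
```

Because this only edits the review list, the listed IDs still have to go through the separate removal step in the database.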

Zimoon
Forum Moderator
Posts: 4817
Joined: Mon May 14, 2007 6:55 am
Location: Stockholm, SE

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Zimoon » Fri Jul 23, 2010 2:54 pm

Sobuno wrote:... I have deleted the ones you mentioned from the list (Not deleted them from the database yet though, it requires a bit more work) as well as any duplicate groups that only contained one resource.
Great job, S !!!

I will post a link at the RSG forum, let's see if we have some nerds there :)

/Zimoon

Monty Burns
Master Crafter
Posts: 549
Joined: Sat Mar 08, 2008 9:26 am
Location: New Zealand

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Monty Burns » Fri Jul 23, 2010 5:51 pm

My next question is what do we do about resources not in the 30k Crate (Wind, Geothermal etc.) as there is no real way of checking their validity.

Zimoon
Forum Moderator
Posts: 4817
Joined: Mon May 14, 2007 6:55 am
Location: Stockholm, SE

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Zimoon » Fri Jul 23, 2010 10:49 pm

Monty Burns wrote:My next question is what do we do about resources not in the 30k Crate (Wind, Geothermal etc.) as there is no real way of checking their validity.
Then my suggestion is to just let them stay. Eventually a new resource may spawn with the misspelled name, and then we know :twisted:


/Zimoon

Monty Burns
Master Crafter
Posts: 549
Joined: Sat Mar 08, 2008 9:26 am
Location: New Zealand

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Monty Burns » Sat Jul 24, 2010 12:04 am

Yes but perhaps they should not show up in the list at all to prevent confusion?

Zimoon
Forum Moderator
Posts: 4817
Joined: Mon May 14, 2007 6:55 am
Location: Stockholm, SE

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Zimoon » Sat Jul 24, 2010 9:15 am

Sobuno wrote:It would be useful for old resources if someone inputted an old resource and it turned out that the resource that had been present in the database all along was named incorrectly. Other than that, no use for old resources, yes.

I agree. What impact on the server and the submit time would it have?
I assume there is no way to let the submission return quickly while a background check raises an alert dialog somewhat later? Or are web browsers that smart these days? --- I haven't done much web coding lately; nowadays it is chiefly C/C++, Python, and Java.


Sobuno wrote:But it's not something I'll be implementing right now. I am also still considering the spelling mistake filter through the use of that levenshtein function I have previously mentioned (http://php.net/manual/en/function.levenshtein.php).
That would be useful, and because it only has to cover the current resources, it is also quick. One question nags me, though: what about false positives, and how should these be treated?



Another approach, also quite dynamic, would be to run a daily cron job over everything reported that day that generates a public text file of doubles per galaxy. Let doubles from previous days stay until they are fixed. Link (or include) it at Current Resources, and also link to this forum, where players can report what is correct and what is wrong about "old" resources; current resources they can edit themselves if they need to.

There is a risk that less knowledgeable players do not remove the resource they mistakenly added but rather mark it unavailable. A nightly scan would cover that, because such an entry does not go away from the list.
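A rough sketch of what the body of such a nightly job could compute: group resources per galaxy by (class, stats) and produce one plain-text report per galaxy. The dict layout and function name are assumptions. Availability is deliberately not filtered, so an entry that was only marked unavailable keeps showing up. Scheduling (cron) and writing each string to a public file are left to a wrapper.

```python
# Hypothetical nightly-report sketch: one plain-text list of exact doubles per
# galaxy. Input is an iterable of dicts with keys id, galaxy, name, cls, stats.
from collections import defaultdict


def nightly_duplicate_report(resources):
    buckets = defaultdict(lambda: defaultdict(list))
    for r in resources:
        # No availability filter: unavailable-but-wrong entries stay listed.
        key = (r["cls"], tuple(sorted(r["stats"].items())))
        buckets[r["galaxy"]][key].append(r)

    reports = {}
    for galaxy, groups in buckets.items():
        lines = []
        for (cls, _stats), members in sorted(groups.items()):
            if len(members) > 1:
                names = ", ".join(f'{m["name"]} ({m["id"]})' for m in members)
                lines.append(f"{cls}: {names}")
        reports[galaxy] = "\n".join(lines)  # empty string if no doubles
    return reports
```

If each run is fed the current resources plus the doubles already listed, the older doubles keep appearing until someone fixes them.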

It would also be possible to email players (real email) who have added a double and ask them to correct or remove the resource. Why email? Because I know from experience that many submitters never visit the forum, and hence never realize they have a PM in their inbox.



For general information to those who follow this thread but do not realize the problems with misspelled names:
  • An entry in the database with a misspelled name blocks future submissions when a resource spawns with that name. Does that happen? Yes! Every now and then.
  • A misspelled entry makes a find/search for the real name fail.
  • A misspelled name may lead to doubles, which this thread is about, and months or years from now it is harder to tell which is the right one and which is the wrong one. Some resource classes cannot be read in the 30k resource crate.
The only way to submit error-free names is by using ISDroids. Unless a player edits the mails before submission (and why should he?), the names and the resource classes are correct. This is true both when submitting via the file upload feature here at SWGCraft and when using SWGAide's support for ISDroids.

If you made a mistake and realize that it caused a double, do not mark the erroneous resource unavailable. If you can, use the Remove button; otherwise, post in this Resources forum and the moderators will help you out.

/Zimoon

Zimoon
Forum Moderator
Posts: 4817
Joined: Mon May 14, 2007 6:55 am
Location: Stockholm, SE

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Zimoon » Sat Jul 24, 2010 9:24 am

Monty Burns wrote:Yes but perhaps they should not show up in the list at all to prevent confusion?
That is one way: take away all entries under Energy. That is the only resource class that is excluded, is it not?

Another way is to group them bottommost and add a note that "these cannot be checked in the 30k crate". Only players with existing stock may correct these entries.
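The "group them bottommost" option could look roughly like this. It assumes duplicate groups arrive as (resource_class, [ids]) pairs and that crate-readability is a caller-supplied predicate. The thread does not settle whether Energy is the only excluded class, so the predicate in the usage line is only a guess.

```python
# Hypothetical sketch: list crate-checkable groups first, then the groups
# whose class cannot be read in the 30k crate, under a separating note.

def render_review_list(groups, is_crate_readable):
    """groups -- list of (resource_class, [ids]); order within each part is kept."""
    def fmt(cls, ids):
        return f"{cls}: " + ", ".join(str(i) for i in ids)

    checkable = [g for g in groups if is_crate_readable(g[0])]
    unverifiable = [g for g in groups if not is_crate_readable(g[0])]
    lines = [fmt(cls, ids) for cls, ids in checkable]
    if unverifiable:
        lines.append("-- These cannot be checked in the 30k crate --")
        lines.extend(fmt(cls, ids) for cls, ids in unverifiable)
    return lines


if __name__ == "__main__":
    demo = [("Wind Energy", [5, 6]), ("Iron", [1, 2])]
    # Guess at the predicate: treat any class whose name contains "Energy" as unreadable.
    print("\n".join(render_review_list(demo, lambda c: "Energy" not in c)))
```

Passing the predicate in keeps the list of excluded classes in one place if it turns out to be longer than just Energy.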



I have an item in my SWGAide backlog for the inventory but am uncertain whether it would really help:
"make feature to have inventory submit resource data that is unknown at SWGCraft"
These entries are definitely entered by hand into the inventory and may be in less than great shape spelling-wise, possibly adding yet more doubles, so... But could it help? Not yet caffeinated, so... 8)

/Zimoon

Monty Burns
Master Crafter
Posts: 549
Joined: Sat Mar 08, 2008 9:26 am
Location: New Zealand

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Monty Burns » Sun Jul 25, 2010 12:39 am

Sobuno wrote:Posting it here is fine, it's just a list of ID numbers for resources that are going to be deleted anyway :)

You can do it in parts, I have deleted the ones you mentioned from the list (Not deleted them from the database yet though, it requires a bit more work) as well as any duplicate groups that only contained one resource.
Oky doky then, here is everything up to H done for Sunrunner...


471710,467508,467527,467595,482652,467740,467770,467848,467890,467893,467911,468019,468131,468472,468164,468326,704727,468538,789475,28444,468692,469041,468796,996336,468929,704326,469808,470000,801498,470283,470769,471579,470510,470595,470729,470739,470754,470788,471229,471448,1002930,773349,794090,

It is taking more time than I thought and would be a lot easier if things were ordered by resource type. :)

I have also gone back and fixed a couple that were copy-and-paste jobs of the previous resource's stats with the new resource's name.

Sobuno
Developer
Posts: 2589
Joined: Sun Mar 25, 2007 2:17 am

Re: Finding and Fixing "Duplicate" Resource Entries

Post by Sobuno » Mon Jul 26, 2010 5:54 am

They should now be sorted like the 30k resource deed.

Never underestimate how confusing our sorting system is.... Took me 2 hours to implement this seemingly small change.
